For journalists

v1.1.0

This page bundles the claims, context, and attribution that trade media and general press can safely cite.

What is InsureBench?

InsureBench is an independent benchmark initiative measuring how well AI models perform on Dutch Wft-Basis knowledge questions. The project publishes aggregated outcomes and methodology, not the private question bank itself.

What the benchmark measures

Phase 1 measures Wft-Basis knowledge under strict exam-style conditions using private practice questions and aggregated scoring.

What the benchmark does not measure

The current score is not evidence that a model is suitable for giving advice, meets compliance requirements, or can be safely deployed as a standalone AI adviser.

Safe claims to quote

InsureBench measures Wft-Basis knowledge of AI models in phase 1.
The benchmark uses 80 private practice questions based on CDFD learning objectives.
The score is not an official CDFD exam result.
The prompt benchmark for advice quality is still in development.
InsureBench publishes aggregated scores and keeps the question bank private to limit contamination.

Claims to avoid for now

This model gives the best insurance advice.
This model is suitable as an AI adviser.
This model complies with Wft requirements.
This benchmark is 100% reproducible.
This score proves a model can advise safely.

Top 3 findings

Claude Opus 4.7 ranks #1 with 34/40 (67/80 raw).
18 of 21 models (86%) clear the 68% CDFD pass threshold.
Spread remains meaningful: Mistral Nemo sits at 22/40, highlighting exam-set sensitivity across models.
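
For readers who want to check the percentages before quoting them, the arithmetic behind the findings above can be sketched as follows. This is an illustrative check only, not InsureBench code: the helper name percent is hypothetical, and the only inputs are the figures stated on this page.

```python
def percent(part: int, whole: int) -> int:
    """Return part/whole as a whole-number percentage, rounded to nearest."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(part * 100 / whole)


# Figures taken from the findings on this page.
pass_rate = percent(18, 21)   # share of models clearing the threshold
top_score = percent(34, 40)   # top-ranked model on the 40-point scale
low_score = percent(22, 40)   # lowest score cited on this page

print(pass_rate, top_score, low_score)  # prints: 86 85 55
```

On these numbers the top score sits well above the 68% pass threshold and the 22/40 score sits below it.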

Methodology in 5 lines

InsureBench tests publicly available text models on Wft-Basis knowledge.
Phase 1 uses 80 private practice questions distributed by CDFD task, question type, and difficulty.
Each published Wft-Basis score is based on multiple attempts and converted to a 40-point scale with a 68% pass threshold.
Question texts, answer options, and correct answers remain private to limit contamination and copyright risk.
Only aggregated scores, methodology, model metadata, and run metadata are published.
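
The conversion to the 40-point scale described above can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's actual scoring code: it assumes a linear rescale from 80 raw points, rounding half up, and a threshold applied to the scaled score as a fraction of 40. How multiple attempts are combined is not specified on this page, so the sketch takes a single raw score as input. The function names are hypothetical.

```python
import math

RAW_QUESTIONS = 80      # phase 1 question count stated on this page
SCALE_POINTS = 40       # published score scale
PASS_THRESHOLD = 0.68   # pass threshold stated on this page


def to_scaled_score(raw_correct: float, raw_total: int = RAW_QUESTIONS) -> int:
    """Linearly rescale a raw score to the 40-point scale.

    Assumption: halves round up (33.5 becomes 34).
    """
    if not 0 <= raw_correct <= raw_total:
        raise ValueError("raw_correct must be between 0 and raw_total")
    return math.floor(raw_correct * SCALE_POINTS / raw_total + 0.5)


def clears_threshold(scaled: int) -> bool:
    """Assumption: the threshold is checked against the scaled score."""
    return scaled / SCALE_POINTS >= PASS_THRESHOLD


scaled = to_scaled_score(67)             # 67/80 raw, as reported for the top model
print(scaled, clears_threshold(scaled))  # prints: 34 True
```

Under these assumptions the reported 67/80 raw score maps to the published 34/40, which is consistent with the figures on this page but does not confirm the exact rounding rule used.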

Press image and logo

Open press image Open logo

Recommended attribution

Use this source line as the default attribution in articles, presentations, and link posts.

Diks, M. (2026). InsureBench v1.1.0. https://www.insurebench.nl/en/voor-journalisten
