InsureBench
For journalists
v1.1.0
This page bundles the claims, context, and attribution that are safe to cite for trade media and general press.
What is InsureBench?
InsureBench is an independent benchmark initiative measuring how well AI models perform on Dutch Wft-Basis knowledge questions. The project publishes aggregated outcomes and methodology, not the private question bank itself.
What the benchmark measures
Phase 1 measures Wft-Basis knowledge under strict exam-style conditions using private practice questions and aggregated scoring.
What the benchmark does not measure
The current score is not evidence of advice suitability, compliance suitability, or safe deployment as a standalone AI adviser.
Safe claims to quote
- InsureBench measures Wft-Basis knowledge of AI models in phase 1.
- The benchmark uses 80 private practice questions based on CDFD learning objectives.
- The score is not an official CDFD exam result.
- The prompt benchmark for advice quality is still in development.
- InsureBench publishes aggregated scores and keeps the question bank private to limit contamination.
Claims to avoid for now
- This model gives the best insurance advice.
- This model is suitable as an AI adviser.
- This model complies with Wft requirements.
- This benchmark is 100% reproducible.
- This score proves a model can advise safely.
Top 3 findings
- Claude Opus 4.7 ranks #1 with 34/40 (67/80 raw).
- 18 of 21 models (86%) clear the 68% CDFD pass threshold.
- Spread remains meaningful: Mistral Nemo sits at 22/40 (55%), below the 68% pass threshold, highlighting exam-set sensitivity across models.
Methodology in 5 lines
- InsureBench tests publicly available text models on Wft-Basis knowledge.
- Phase 1 uses 80 private practice questions distributed by CDFD task, question type, and difficulty.
- Each published Wft-Basis score is based on multiple attempts and converted to a 40-point scale with a 68% pass threshold.
- Question texts, answer options, and correct answers remain private to limit contamination and copyright risk.
- Only aggregated scores, methodology, model metadata, and run metadata are published.
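The score conversion described above can be illustrated with a short sketch. It is not InsureBench's actual scoring code: the function names are hypothetical, averaging the attempts is an assumption (the page only says scores are "based on multiple attempts"), and the rounding rule behind the published figures is not specified, so the sketch leaves values unrounded.

```python
# Illustrative sketch only; names and the averaging step are assumptions,
# not taken from the InsureBench codebase.

RAW_MAX = 80           # private practice questions in phase 1
SCALE_MAX = 40         # published scale
PASS_THRESHOLD = 0.68  # CDFD pass threshold cited on this page


def mean_raw(attempts: list[int]) -> float:
    """Average raw correct answers over several attempts (assumed aggregation)."""
    return sum(attempts) / len(attempts)


def to_scaled(raw_correct: float) -> float:
    """Convert a raw score out of 80 to the 40-point scale (no rounding)."""
    return raw_correct * SCALE_MAX / RAW_MAX


def passes(raw_correct: float) -> bool:
    """Check whether the share of correct answers meets the 68% threshold."""
    return raw_correct / RAW_MAX >= PASS_THRESHOLD


# Hypothetical attempt counts, chosen only to average to 67.
raw = mean_raw([66, 67, 68])
print(to_scaled(raw), passes(raw))  # 67/80 raw is 33.5 on the 40-point scale
print(to_scaled(44), passes(44))    # 22/40 corresponds to 44/80 raw, i.e. 55%
```

On this reading, the 68% threshold corresponds to 27.2 points on the 40-point scale, so a 22/40 result falls below it while 67/80 raw clears it comfortably.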
Press image and logo
Recommended attribution
Use this source line as the default attribution in articles, presentations, and link posts.
Diks, M. (2026). InsureBench v1.1.0. https://www.insurebench.nl/en/voor-journalisten