InsureBench — Wft-Basis AI benchmark

Leaderboard: AI models ranked by Wft-Basis score (40-point scale)
#	Model	Provider	Open source	Score (40)	WFT	Prompt	Price / M tokens	Result	Last tested
1	Anthropic: Claude Opus 4.8 (Fast)	Anthropic	—	35 / 40 (66/80 raw, Group A)	33 / 40	36 / 40	€9.20 in / €46.00 out	Pass +7.8	01 Jun 2026
2	Gemini 3.1 Pro Preview	Google	—	34 / 40 (68/80 raw, Group B)
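
As a rough illustration of how the Price / M tokens column can be read, the sketch below estimates the cost of one request from per-million-token input and output prices. The function name `estimate_cost_eur` and the token counts are hypothetical and chosen only for the example; the €9.20 and €46.00 figures are taken from the first leaderboard row.

```python
def estimate_cost_eur(input_tokens: int, output_tokens: int,
                      price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate the cost of one request in euros.

    Prices are quoted per one million tokens, separately for input and output.
    """
    per_million = 1_000_000
    total = input_tokens * price_in_per_m + output_tokens * price_out_per_m
    return total / per_million


# Prices from the first leaderboard row; the token counts are made up.
cost = estimate_cost_eur(2_000, 500, price_in_per_m=9.20, price_out_per_m=46.00)
print(f"{cost:.4f}")  # 0.0414
```

With these assumed token counts, the output tokens account for more of the cost than the input tokens even though there are four times fewer of them, because the listed output price is five times the input price.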

Data & citation

Download aggregated run data as CSV or JSON. Use the BibTeX entry below for attribution.

@online{insurebench_wft_basis_1_1_0,
  title        = {InsureBench: Wft-Basis AI Benchmark},
  author       = {InsureBench},
  year         = {2026},
  version      = {1.1.0},
  url          = {https://www.insurebench.nl/nl/wft-basis},
  urldate      = {2026-04-23},
  note         = {Public leaderboard, 80 questions, 3 runs per model}
}

Which AI models understand Dutch Wft-Basis knowledge?

Key finding

Main limitation

Supported now

Not supported yet

Only testable later

Leaderboard

Release notes

What this round says

What changed since the previous round

Plausibly explainable outliers

What you still must not conclude

Quick datapoints from the current leaderboard

Data & citation