InsureBench — Wft-Basis AI benchmark

Leaderboard: AI models ranked by Wft-Basis score (40-point scale)
#	Model	Provider	Open source	Score (40)	WFT	Prompt	Price / M tokens	Result	Last tested
1	Mistral: Mistral Nemo (Combined)	Mistralai	Open source	22 / 40 (43 / 80 raw, Group A)	22 / 40	19 / 40	€0.02 in / €0.03 out	Fail -5.2	01 May 2026
2	DeepSeek-R1
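
The first row reports a raw score of 43 / 80 next to a scaled score of 22 / 40. A minimal sketch of one conversion that reproduces that pair is shown here, assuming a linear rescale with half-up rounding. The page does not state the actual rule, and the function name `scale_score` and its parameters are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def scale_score(raw: int, raw_max: int = 80, scale_max: int = 40) -> int:
    """Rescale a raw score to the 40-point scale.

    Assumption: linear rescale, then round half up. The leaderboard
    does not document its rounding rule; this is only one rule that
    is consistent with the 43 / 80 -> 22 / 40 row.
    """
    if not 0 <= raw <= raw_max:
        raise ValueError("raw score must lie between 0 and raw_max")
    # Decimal avoids binary floating-point surprises at the .5 boundary.
    scaled = Decimal(raw) * scale_max / raw_max
    return int(scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    print(scale_score(43))  # 43 / 80 raw is 21.5 on the 40-point scale, rounded up to 22
```

Under this assumption, a raw score of 43 falls exactly on a rounding boundary, so a different rounding rule could give 21 or 22 for other half-point cases.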

Data & citation

Download aggregated run data as CSV or JSON. Use the BibTeX entry below for attribution.

@online{insurebench_wft_basis_1_1_0,
  title        = {InsureBench: Wft-Basis AI Benchmark},
  author       = {InsureBench},
  year         = {2026},
  version      = {1.1.0},
  url          = {https://www.insurebench.nl/nl/wft-basis},
  urldate      = {2026-04-23},
  note         = {Public leaderboard, 80 questions, 3 runs per model}
}

Which AI models understand Dutch Wft-Basis knowledge?

Key finding

Main limitation

Supported now

Not supported yet

Only testable later

Leaderboard

Release notes

What this round says

What changed since the previous round

Plausibly explainable outliers

What you still must not conclude

Quick datapoints from the current leaderboard

Data & citation