InsureBench
External validation of the Wft-Basis benchmark
This page explains how domain experts can review the benchmark without making the private question bank public.
Status
Public evidence layer live
External validation status
External review in preparation
Purpose of this page
External validation and scope boundaries
Why external validation matters
The Wft-Basis benchmark has been made publicly explainable, but because its question bank remains private, the content still requires external substantive review. External validation should show whether coverage, substantive correctness, and difficulty hold up under scrutiny from outside the internal team.
What experts can review
- Whether the question bank covers the relevant Wft-Basis learning objectives in a balanced way.
- Whether question content is substantively correct within the chosen legal reference date.
- Whether difficulty is reasonably aligned with the official CDFD exam.
- Whether public claims remain aligned with the actual evidence layer.
- Whether the boundary between knowledge benchmarking and advice suitability is sharp enough.
What experts cannot review here
- The full private question bank is not publicly shared.
- Raw model answers and internal review notes remain private.
- This page does not claim official CDFD approval or certification.
- This page does not claim official exam equivalence.
- This page does not prove that models can safely provide standalone insurance advice.
Intended external validation scope
- Review of question-bank coverage and learning-objective distribution.
- Review of substantive correctness within the chosen legal reference date.
- Review of difficulty and reasonableness relative to the official exam.
- Review of public interpretation, limitations, and claim boundaries.
Intended reviewer roles: not yet publicly specified
Explicitly out of scope
- No request to formally approve the benchmark as a whole.
- No public release of the private question bank as a precondition for review.
- No endorsement of commercial suitability or advice safety.
Evidence available to reviewers
- Methodology — Full dataset card, run protocol, and error statistics.
- Methodology JSON — Machine-readable methodology export.
- Runs CSV — Public aggregated run data per benchmark round.
- Runs JSON — Machine-readable public run export by benchmark round.
- Data dictionary — Definitions of public fields and interpretation boundaries.
- Changelog — Traceability of release changes, limitations, and findings.
- Validation — Existing explanation of review, contamination, and governance.
- Roadmap — Outstanding validation steps and phase boundaries.
- Conflict-of-interest statement — Context on ownership and public accountability.
Outstanding validation steps
- Formally select the external reviewer or partner.
- Deliver a review pack with scoped review questions.
- Publish review date, reviewer role, and assessment scope.
- Summarize what was and was not externally reviewed.
- Reflect outcomes traceably in the changelog, methodology, and this page.
How experts can contribute
Contributions are welcome from Wft experts, compliance specialists, and exam-design experts. Input is most relevant for learning-objective coverage, substantive correctness, difficulty, and claim boundaries.
To register or get in touch, use the public contact point. Relevant findings should then be reflected publicly and traceably in the methodology, the changelog, or this page.
Contributions are meant as scrutiny, not endorsement.
Boundaries of the current benchmark
- Closed models can change internally without full public visibility.
- A private set limits contamination, but does not eliminate it entirely.
- External validation does not constitute legal or official certification.
- Wft-Basis phase 1 remains a knowledge benchmark, not proof of safe advice deployment.