InsureBench
Roadmap
v1.1.0 · 23 Apr 2026
This page makes explicit what is already live, what is a near-term priority, and which items remain in design or later phases.
What this roadmap is
A public expectation-setting framework. The roadmap shows how InsureBench builds toward more transparency, statistical sharpness, and advice relevance in phases, without making broader claims than the current evidence status supports.
| Phase | Goal | Status | Publication meaning |
|---|---|---|---|
| Phase 1 | Wft-Basis knowledge benchmark | Live | Publicly claimable now as a knowledge benchmark. |
| Phase 1.1 | Dataset card and stronger model cards | Priority | Increases transparency, without adding a new headline claim. |
| Phase 1.2 | Held-out set and statistical uncertainty | Priority | Improves evidence quality, without creating a new advice claim. |
| Phase 2 | Open advice cases for simple private-risk products | Design | No public evidence for advice quality yet. |
| Phase 2.1 | Expert panel and blind scoring | Design | Strengthens future validation of advice cases. |
| Phase 2.2 | Publication of example cases and rubrics | Design | Improves explainability, not equivalent to completed validation. |
| Phase 3 | Expansion to Wft Non-Life Personal and Health | Later | Domain expansion, not a current claim. |
| Phase 4 | External audit | Later | Future reinforcement of trust and reviewability. |
| Phase 5 | Quarterly reports for media and insurers | Later | Publication format, not extra validation by itself. |
What you can conclude now
- Phase 1 currently supports only a Wft-Basis knowledge claim.
- Roadmap items after Phase 1 describe direction, not completed validation.
- Advice quality and external audit should only carry more weight after separate validation and publication.
What you should not conclude yet
- Phase 2 is already benchmark-ready.
- The roadmap proves AI models can already advise safely.
- The roadmap is a hard publication guarantee.
Live
Publicly visible and substantively claimable within the current benchmark boundaries.
Priority
Planned for the near term, but not yet a new public headline claim.
Design
Directionally important, but not mature enough yet for public evidence claims.
Later
Intentionally deferred to a later phase, with no hard publication date.