The dashboard is a static site rebuilt from two sources of truth in the repository:
- scores/scores.db: SQLite DB rebuilt from scores/seed_scores.json.
- submissions/<id>/: one folder per run, holding code, results, conceptual model, and per-run metadata.

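As a rough illustration of rebuilding scores/scores.db from the seed file, here is a minimal sketch. The seed layout (a list of objects with `submission_id` and `total_score`), the two-column schema, and the function name `rebuild_scores_db` are assumptions made for the example; only the `scores` table and its `total_score` column are named in this document, and the real harness may do this differently.

```python
import json
import sqlite3
from pathlib import Path


def rebuild_scores_db(seed_path: Path, db_path: Path) -> int:
    """Recreate the scores table from a seed JSON file; return the row count.

    Assumed seed shape (hypothetical): a list of objects like
    {"submission_id": "run-001", "total_score": 87.5}.
    """
    records = json.loads(seed_path.read_text(encoding="utf-8"))

    # Start from scratch so the DB never drifts from the seed file.
    db_path.unlink(missing_ok=True)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE scores ("
                "submission_id TEXT PRIMARY KEY, "
                "total_score REAL NOT NULL)"
            )
            conn.executemany(
                "INSERT INTO scores (submission_id, total_score) VALUES (?, ?)",
                [(r["submission_id"], r["total_score"]) for r in records],
            )
        (count,) = conn.execute("SELECT COUNT(*) FROM scores").fetchone()
    finally:
        conn.close()
    return count
```

Deleting and recreating the file on every run keeps the JSON seed as the only thing that needs to be reviewed or edited by hand.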
Pipeline
scores/scores.db ─┐
submissions/      ├─→ harness/build_dashboard.py ─→ dashboard/src/ ─→ astro build ─→ dist/ ─→ fly deploy
docs/methodology ─┘

Where each leaderboard column comes from
- Quality: scores.scores.total_score; sourced from seed_scores.json.
- Tokens: token_usage.json.total_tokens per submission. Method (exact/reported/estimated/unknown) is shown on hover.
- Time: run_metrics.json.runtime_seconds.
- Intervention: submission.yaml.intervention.category.

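The column mapping above could be assembled into one leaderboard row roughly as follows. This is a sketch under stated assumptions, not the actual logic in harness/build_dashboard.py: it assumes token_usage.json and run_metrics.json sit directly inside the submission folder, that the token method is stored under a `method` key, and that missing files should produce empty cells. The function name `leaderboard_row` is hypothetical, and submission.yaml is passed in already parsed so the example needs only the standard library.

```python
import json
from pathlib import Path
from typing import Any, Mapping, Optional


def leaderboard_row(
    submission_dir: Path,
    total_score: Optional[float],
    submission_meta: Mapping[str, Any],
) -> dict:
    """Collect the four leaderboard columns for one submission.

    total_score is the value already read from scores.scores.total_score;
    submission_meta is submission.yaml after parsing.
    """

    def read_json(name: str) -> dict:
        path = submission_dir / name
        # Assumption: a missing file yields an empty dict, so the column is None.
        if not path.exists():
            return {}
        return json.loads(path.read_text(encoding="utf-8"))

    tokens = read_json("token_usage.json")
    metrics = read_json("run_metrics.json")

    return {
        "id": submission_dir.name,
        "quality": total_score,
        "tokens": tokens.get("total_tokens"),
        # Hypothetical key for the exact/reported/estimated/unknown label.
        "token_method": tokens.get("method", "unknown"),
        "time_seconds": metrics.get("runtime_seconds"),
        "intervention": submission_meta.get("intervention", {}).get("category"),
    }
```

For example, `leaderboard_row(Path("submissions/run-001"), 87.5, {"intervention": {"category": "none"}})` would return a dict with one key per column, using `None` for any value whose source file is absent.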
The site is fully baked into a Caddy container; there is no runtime database, API, or auth surface.

To rebuild:

    make dashboard   # rebuild from sources
    make deploy      # build + push to fly.io