Leaderboard

A reproducible benchmark for AI-assisted discrete-event simulation work. Sort by Quality, Tokens, or Time; there is deliberately no combined score.

| # | Date | Benchmark | Harness | Model | Tag | Quality / 100 | Tokens | Time | Intervention | Reviewer |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2026-06-01 | 001_synthetic_mine_throughput | opencode | claude-opus-4-8 / plan-mode-max-effort | plan-mode-max-effort | 98 | | 1s | ✓ Autonomous | unknown |
| 2 | 2026-06-01 | 001_synthetic_mine_throughput | claude-code | claude-opus-4-8 / plan-mode-max-effort | plan-mode-max-effort | 97 | | 3s | ? Unrecorded | unknown |
| 3 | 2026-04-30 | 001_synthetic_mine_throughput | claude-code | claude-opus-4-7 / ouroboros-max-thinking | ouroboros-max-thinking | 97 | | | ? Unrecorded | unknown |
| 4 | 2026-06-04 | 002_container_shipping_throughput | claude-code | claude-opus-4-8 / max-effort | max-effort | 96 | | 2s | ✓ Autonomous | claude-opus-4-8 |
| 5 | 2026-04-29 | 001_synthetic_mine_throughput | claude-code | claude-opus-4-7 / plan-mode | plan-mode | 96 | 114k | 11m 33s | ✓ Autonomous | unknown |
| 6 | 2026-04-29 | 001_synthetic_mine_throughput | claude-code | claude-opus-4-7 / agent-teams-nelson-max-thinking | agent-teams-nelson-max-thinking | 95 | | | ? Unrecorded | unknown |
| 7 | 2026-05-08 | 001_synthetic_mine_throughput | claude-code | claude-opus-4-7 / nelson-v2-2-2-max-thinking | nelson-v2-2-2-max-thinking | 94 | | | ? Unrecorded | unknown |
| 8 | 2026-04-29 | 001_synthetic_mine_throughput | claude-code | claude-opus-4-7 / superpowers-max-thinking | superpowers-max-thinking | 94 | 239k | 36m 07s | ? Unrecorded | unknown |
| 9 | 2026-04-25 | 001_synthetic_mine_throughput | claude-code | claude-opus-4-7 / max-thinking | max-thinking | 92 | 117k | 11m 39s | ✓ Autonomous | claude-opus-4-7 |
| 10 | 2026-05-19 | 001_synthetic_mine_throughput | antigravity | gemini-3-5-flash | | 89 | | | ? Unrecorded | unknown |
| 11 | 2026-05-19 | 001_synthetic_mine_throughput | antigravity | gemini-3-5-flash / normal-thinking | normal-thinking | 88 | | | ? Unrecorded | unknown |
| 12 | 2026-04-29 | 001_synthetic_mine_throughput | claude-code | claude-sonnet-4-6 / vanilla-max | vanilla-max | 85 | 76k | 6s | ? Unrecorded | claude-opus-4-7 |
| 13 | 2026-04-25 | 001_synthetic_mine_throughput | codex-cli | gpt-5-5 / xhigh | xhigh | 85 | 503k | 6m 40s | ✓ Autonomous | claude-opus-4-7 |
| 14 | 2026-04-29 | 001_synthetic_mine_throughput | gemini-cli | gemini-3-1-pro-preview / vanilla | vanilla | 80 | | | ? Unrecorded | unknown |
| 15 | 2026-04-27 | 001_synthetic_mine_throughput | gsd2 | gemini-3-1-pro-preview / customtools | customtools | 75 | | | ? Unrecorded | claude-opus-4-7 |
| 16 | 2026-04-25 | 001_synthetic_mine_throughput | pi-agent | gemini-3-1-pro-preview / vanilla-customtools | vanilla-customtools | 73 | 99k | 4m 57s | ✓ Autonomous | claude-opus-4-7 |
| 17 | 2026-05-01 | 001_synthetic_mine_throughput | opencode | gemini-3-flash-preview / vanilla | vanilla | 72 | 63k | | ? Unrecorded | unknown |
| 18 | 2026-04-29 | 001_synthetic_mine_throughput | opencode | gemini-3-1-pro-preview / customtools-high-superpowers-skill | customtools-high-superpowers-skill | 67 | 130k | | ? Unrecorded | unknown |
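
Since the board is meant to be sorted on one column at a time rather than by a combined score, the sketch below shows one way a reader might re-sort a few rows offline. It is illustrative only: the helper names (`parse_tokens`, `parse_time`, `sort_rows`) are hypothetical, the "k" suffix is assumed to mean thousands of tokens, and placing rows with blank cells last is an assumption, not necessarily how the site orders them.

```python
import re
from typing import Optional


def parse_tokens(cell: str) -> Optional[int]:
    """Convert a Tokens cell such as '114k' to an integer; blank cells give None.

    Assumption: the 'k' suffix means thousands.
    """
    m = re.fullmatch(r"(\d+)k", cell.strip())
    return int(m.group(1)) * 1000 if m else None


def parse_time(cell: str) -> Optional[int]:
    """Convert a Time cell such as '11m 33s' or '3s' to seconds; blank cells give None."""
    m = re.fullmatch(r"(?:(\d+)m)?\s*(?:(\d+)s)?", cell.strip())
    if not m or not any(m.groups()):
        return None
    minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return minutes * 60 + seconds


def sort_rows(rows, key, descending=False):
    """Sort rows by a single column, keeping rows with a missing value at the end."""
    present = [r for r in rows if r[key] is not None]
    missing = [r for r in rows if r[key] is None]
    return sorted(present, key=lambda r: r[key], reverse=descending) + missing


# Three rows copied from the table above (rank, quality, tokens, time).
rows = [
    {"rank": 5, "quality": 96, "tokens": parse_tokens("114k"), "time": parse_time("11m 33s")},
    {"rank": 13, "quality": 85, "tokens": parse_tokens("503k"), "time": parse_time("6m 40s")},
    {"rank": 17, "quality": 72, "tokens": parse_tokens("63k"), "time": parse_time("")},
]

by_quality = [r["rank"] for r in sort_rows(rows, "quality", descending=True)]
by_tokens = [r["rank"] for r in sort_rows(rows, "tokens")]
by_time = [r["rank"] for r in sort_rows(rows, "time")]
```

Each ordering uses exactly one column, so a run that is cheap in tokens but weaker in quality (row 17 here) leads one view and trails another, which is the trade-off a single combined score would hide.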