Leaderboard
A reproducible benchmark for AI-assisted discrete-event simulation work. Sort by Quality, Tokens, or Time; there is deliberately no combined score.
| # | Date | Benchmark | Harness | Model | Tag | Quality / 100 | Tokens | Time | Intervention | Reviewer |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2026-06-01 | 001_synthetic_mine_throughput | opencode | claude-opus-4-8 / plan-mode-max-effort | plan-mode-max-effort | 98 | — | 1s | ✓ Autonomous | unknown |
| 2 | 2026-06-01 | 001_synthetic_mine_throughput | claude-code | claude-opus-4-8 / plan-mode-max-effort | plan-mode-max-effort | 97 | — | 3s | ? Unrecorded | unknown |
| 3 | 2026-04-30 | 001_synthetic_mine_throughput | claude-code | claude-opus-4-7 / ouroboros-max-thinking | ouroboros-max-thinking | 97 | — | — | ? Unrecorded | unknown |
| 4 | 2026-06-04 | 002_container_shipping_throughput | claude-code | claude-opus-4-8 / max-effort | max-effort | 96 | — | 2s | ✓ Autonomous | claude-opus-4-8 |
| 5 | 2026-04-29 | 001_synthetic_mine_throughput | claude-code | claude-opus-4-7 / plan-mode | plan-mode | 96 | 114k | 11m 33s | ✓ Autonomous | unknown |
| 6 | 2026-04-29 | 001_synthetic_mine_throughput | claude-code | claude-opus-4-7 / agent-teams-nelson-max-thinking | agent-teams-nelson-max-thinking | 95 | — | — | ? Unrecorded | unknown |
| 7 | 2026-05-08 | 001_synthetic_mine_throughput | claude-code | claude-opus-4-7 / nelson-v2-2-2-max-thinking | nelson-v2-2-2-max-thinking | 94 | — | — | ? Unrecorded | unknown |
| 8 | 2026-04-29 | 001_synthetic_mine_throughput | claude-code | claude-opus-4-7 / superpowers-max-thinking | superpowers-max-thinking | 94 | 239k | 36m 07s | ? Unrecorded | unknown |
| 9 | 2026-04-25 | 001_synthetic_mine_throughput | claude-code | claude-opus-4-7 / max-thinking | max-thinking | 92 | 117k | 11m 39s | ✓ Autonomous | claude-opus-4-7 |
| 10 | 2026-05-19 | 001_synthetic_mine_throughput | antigravity | gemini-3-5-flash | — | 89 | — | — | ? Unrecorded | unknown |
| 11 | 2026-05-19 | 001_synthetic_mine_throughput | antigravity | gemini-3-5-flash / normal-thinking | normal-thinking | 88 | — | — | ? Unrecorded | unknown |
| 12 | 2026-04-29 | 001_synthetic_mine_throughput | claude-code | claude-sonnet-4-6 / vanilla-max | vanilla-max | 85 | 76k | 6s | ? Unrecorded | claude-opus-4-7 |
| 13 | 2026-04-25 | 001_synthetic_mine_throughput | codex-cli | gpt-5-5 / xhigh | xhigh | 85 | 503k | 6m 40s | ✓ Autonomous | claude-opus-4-7 |
| 14 | 2026-04-29 | 001_synthetic_mine_throughput | gemini-cli | gemini-3-1-pro-preview / vanilla | vanilla | 80 | — | — | ? Unrecorded | unknown |
| 15 | 2026-04-27 | 001_synthetic_mine_throughput | gsd2 | gemini-3-1-pro-preview / customtools | customtools | 75 | — | — | ? Unrecorded | claude-opus-4-7 |
| 16 | 2026-04-25 | 001_synthetic_mine_throughput | pi-agent | gemini-3-1-pro-preview / vanilla-customtools | vanilla-customtools | 73 | 99k | 4m 57s | ✓ Autonomous | claude-opus-4-7 |
| 17 | 2026-05-01 | 001_synthetic_mine_throughput | opencode | gemini-3-flash-preview / vanilla | vanilla | 72 | 63k | — | ? Unrecorded | unknown |
| 18 | 2026-04-29 | 001_synthetic_mine_throughput | opencode | gemini-3-1-pro-preview / customtools-high-superpowers-skill | customtools-high-superpowers-skill | 67 | 130k | — | ? Unrecorded | unknown |
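
Sorting by a single column means dealing with cells like `114k`, `11m 33s`, and the `—` marker for unrecorded values. The following is a minimal sketch of one way to do that, not the benchmark's own tooling: the function names (`parse_tokens`, `parse_time`, `sort_rows`) and the dictionary layout are hypothetical, and placing unrecorded values last is an assumption. The sample rows are copied from ranks 5, 8, 13, and 17 above.

```python
import re
from typing import Optional


def parse_tokens(cell: str) -> Optional[int]:
    """Convert a Tokens cell such as '114k' to an int; unrecorded cells give None."""
    cell = cell.strip()
    if cell.endswith("k") and cell[:-1].isdigit():
        return int(cell[:-1]) * 1000
    return int(cell) if cell.isdigit() else None


def parse_time(cell: str) -> Optional[int]:
    """Convert a Time cell such as '11m 33s' to seconds; unrecorded cells give None."""
    match = re.fullmatch(r"(?:(\d+)m)?\s*(?:(\d+)s)?", cell.strip())
    if not match or not any(match.groups()):
        return None
    return int(match.group(1) or 0) * 60 + int(match.group(2) or 0)


def sort_rows(rows, column, descending=False):
    """Sort by one column only; rows with no recorded value go last (assumption)."""
    present = [r for r in rows if r[column] is not None]
    missing = [r for r in rows if r[column] is None]
    return sorted(present, key=lambda r: r[column], reverse=descending) + missing


# Sample rows taken from the table above (ranks 5, 8, 13, 17).
raw = [
    (5, 96, "114k", "11m 33s"),
    (8, 94, "239k", "36m 07s"),
    (13, 85, "503k", "6m 40s"),
    (17, 72, "63k", "—"),
]
rows = [
    {"rank": r, "quality": q, "tokens": parse_tokens(t), "time": parse_time(s)}
    for r, q, t, s in raw
]

by_quality = [r["rank"] for r in sort_rows(rows, "quality", descending=True)]
by_tokens = [r["rank"] for r in sort_rows(rows, "tokens")]
by_time = [r["rank"] for r in sort_rows(rows, "time")]
```

Each ordering is computed independently, which mirrors the choice not to blend the three measures into one score: the entry using the fewest tokens in this sample is neither the fastest nor the highest-quality one.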