submission_id: 2026-06-01__001_synthetic_mine_throughput__opencode__claude-opus-4-8__plan-mode-max-effort
date: 2026-06-01
benchmark_id: 001_synthetic_mine_throughput
harness:
name: opencode
version: tbc
notes: ""
model:
name: claude-opus-4-8
vendor: anthropic
notes: max effort, plan mode
run_tag: plan-mode-max-effort
operator: harry
status: complete
intervention:
category: autonomous
notes: >-
Built and validated in a single session with no human code fixes or
repairs. Three planning questions were asked and answered with the
agent-recommended defaults (model the ramp faithfully; include the optional
crusher_debottleneck scenario; have run_experiment.py self-populate
run_metrics.json). These were scoping confirmations, not corrective hints.
All 5 public tests and 57/57 automated harness checks pass, including the
six behavioural checks.
submission.yaml
← Back to submission · View raw on GitHub