submission.yaml — 2026-06-01__001_synthetic_mine_throughput__opencode__claude-opus-4-8_

submission_id: 2026-06-01__001_synthetic_mine_throughput__opencode__claude-opus-4-8__plan-mode-max-effort
date: 2026-06-01
benchmark_id: 001_synthetic_mine_throughput
harness:
  name: opencode
  version: tbc
  notes: ""
model:
  name: claude-opus-4-8
  vendor: anthropic
  notes: max effort, plan mode
run_tag: plan-mode-max-effort
operator: harry
status: complete
intervention:
  category: autonomous
  notes: >-
    Built and validated in a single session with no human code fixes or
    repairs. Three planning questions were asked and answered with the
    agent-recommended defaults (model the ramp faithfully; include the optional
    crusher_debottleneck scenario; have run_experiment.py self-populate
    run_metrics.json). These were scoping confirmations, not corrective hints.
    All 5 public tests and 57/57 automated harness checks pass, including the
    six behavioural checks.