submission.yaml

submission_id: 2026-06-04__002_container_shipping_throughput__claude-code__claude-opus-4-8__max-effort
date: 2026-06-04
benchmark_id: 002_container_shipping_throughput
harness:
  name: claude-code
  version: tbc
  notes: vanilla
model:
  name: claude-opus-4-8
  vendor: anthropic
  notes: 1M context
run_tag: max-effort
operator: harry
status: complete       # scaffolded | running | complete | abandoned
intervention:
  category: autonomous   # autonomous | hints | manual_repair | failed | unrecorded
  notes: >-
    Built end-to-end from prompt.md with no hints or manual repair. SimPy DES
    package (container_sim/), 7 scenarios (the 6 required plus a self-defined
    rotterdam_upgrade), 30 replications each. Self-tests pass (reproducibility,
    seed independence, event-log reconstruction). Identified the Rotterdam
    discharge berth as the binding constraint (utilization 65% vs. 3% for the
    canal); canal_upgrade is a near-no-op (1.01x), and the fleet saturates
    (marginal throughput falls from 21.7k to 11.0k TEU per added vessel).
    Recommended a second Rotterdam berth (+14% at fixed fleet).