submission_id: 2026-06-04__002_container_shipping_throughput__claude-code__claude-opus-4-8__max-effort
date: 2026-06-04
benchmark_id: 002_container_shipping_throughput
harness:
  name: claude-code
  version: tbc
  notes: vanilla
model:
  name: claude-opus-4-8
  vendor: anthropic
  notes: 1M context
run_tag: max-effort
operator: harry
status: complete # scaffolded | running | complete | abandoned
intervention:
  category: autonomous # autonomous | hints | manual_repair | failed | unrecorded
  notes: >-
    Built end-to-end from prompt.md with no hints or manual repair. SimPy DES
    package (container_sim/), 7 scenarios (6 required + own rotterdam_upgrade),
    30 reps each. Self-tests pass (reproducibility, seed independence, event-log
    reconstruction). Identified Rotterdam discharge berth as the binding
    constraint (util 65% vs canal 3%); canal_upgrade is a near-no-op (1.01x),
    fleet saturates (marginal 21.7k->11.0k TEU/vessel), recommended a second
    Rotterdam berth (+14% at fixed fleet).