2026-04-29__001_synthetic_mine_throughput__claude-code__claude-opus-4-7__superpowers-max-thinking
Date: 2026-04-29 · Benchmark: 001_synthetic_mine_throughput · Harness: claude-code · Model: claude-opus-4-7 (superpowers-max-thinking) · ? Unrecorded
Scores
| Category | Points | Max |
|---|---|---|
| Conceptual modelling | 18 | 20 |
| Data and topology | 14 | 15 |
| Simulation correctness | 19 | 20 |
| Experimental design | 14 | 15 |
| Results & interpretation | 14 | 15 |
| Code quality | 10 | 10 |
| Traceability | 5 | 5 |
| Total | 94 | 100 |
Run metrics
- Total tokens: 239000 (method: reported)
- Input / output tokens: — / —
- Runtime: 2167 s
- Reviewer model: unknown · harness: claude-code · on 2026-04-29
- Recommendation: Strong submission
- Notes: Best-in-class structure: 7-module package, real pytest suite, honest residual-service-time disclosure; loses points only on absent warmup discussion.
Evaluation report
- Automated checks: 53 / 53 (100%)
- Behavioural checks: — / —
| Scenario | Mean throughput (tonnes per shift) |
|---|---|
| baseline | 12,256.667 |
| trucks_4 | 7,930 |
| trucks_12 | 12,620 |
| ramp_upgrade | 12,183.333 |
| crusher_slowdown | 6,526.667 |
| ramp_closed | 12,120 |
Source files
- README.md
- conceptual_model.md
- data/dump_points.csv
- data/edges.csv
- data/loaders.csv
- data/nodes.csv
- data/scenarios/baseline.yaml
- data/scenarios/crusher_slowdown.yaml
- data/scenarios/ramp_closed.yaml
- data/scenarios/ramp_upgrade.yaml
- data/scenarios/trucks_12.yaml
- data/scenarios/trucks_4.yaml
- data/trucks.csv
- prompt.md
- requirements.txt
- results/README.md
- results/conceptual_model.md
- results/evaluation_report.json
- results/results.csv
- results/summary.json
- run_metrics.json
- submission.yaml
- token_usage.json
Conceptual model
Conceptual Model: Synthetic Mine Throughput
System boundary
Included: the truck haulage cycle from PARK to ore loaders (LOAD_N, LOAD_S) to the primary crusher (CRUSH) and back, traversing the road network defined in data/edges.csv. Loaders, the crusher, and capacity-constrained roads are modelled as constrained resources. The 8-hour shift is the simulated horizon.
Excluded: waste dump routing, the maintenance bay, breakdowns, refuelling, weather, blasting events, shift handover, operator skill variation, ore grade variation, and grade-resistance effects on truck speed beyond the loaded/empty speed factor.
Entities
- Trucks: active SimPy processes that cycle through the network. Each truck has a fixed payload, empty/loaded speed factors, and a starting node.
- Ore payload: implicit; each truck carries payload_tonnes between loading and dumping.
Resources
- Loaders L_N (mean 6.5 min, sd 1.2) and L_S (mean 4.5 min, sd 1.0). Capacity 1 each.
- Primary crusher (mean 3.5 min, sd 0.8). Capacity 1.
- Paired bidirectional road locks (capacity 1 — one truck on the physical road regardless of direction):
- RAMP — E03_UP / E03_DOWN
- PIT_N — E07_TO_LOAD_N / E07_FROM_LOAD_N
- PIT_S — E09_TO_LOAD_S / E09_FROM_LOAD_S
- Per-direction crusher approach locks — E05_TO and E05_FROM (each capacity 1, treated as queueing lanes rather than a single physical road).
- All other edges have capacity 999 and are unconstrained.
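A minimal sketch of how the paired and per-direction locks listed above could be represented as a lookup table. The edge and lock ids come from the list; the dictionary layout and the helper names `lock_for_edge` and `shares_lock` are hypothetical, and the E05 keys assume the lock names double as edge ids.

```python
# Hypothetical lookup from directed edge id to the lock that guards it.
# Both directions of a paired road map to the same lock name.
EDGE_LOCKS = {
    "E03_UP": "RAMP",
    "E03_DOWN": "RAMP",
    "E07_TO_LOAD_N": "PIT_N",
    "E07_FROM_LOAD_N": "PIT_N",
    "E09_TO_LOAD_S": "PIT_S",
    "E09_FROM_LOAD_S": "PIT_S",
    # Crusher approach: one lock per direction (assumed ids).
    "E05_TO": "E05_TO",
    "E05_FROM": "E05_FROM",
}


def lock_for_edge(edge_id):
    """Return the lock name for an edge, or None if it is unconstrained."""
    return EDGE_LOCKS.get(edge_id)


def shares_lock(edge_a, edge_b):
    """True when two edges compete for the same capacity-1 lock."""
    lock_a = lock_for_edge(edge_a)
    return lock_a is not None and lock_a == lock_for_edge(edge_b)
```

With this layout, a truck going up the ramp blocks a truck coming down it, while the two crusher approach lanes stay independent.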
Events
- truck_dispatched
- traversal_started, road_lock_requested, road_lock_acquired, traversal_ended (per edge)
- loader_requested, loading_started, loading_ended
- crusher_requested, dumping_started, dumping_ended
dumping_ended at CRUSH is the throughput-recording event.
State variables
- Per truck: current node, loaded flag, payload, cycle start time, cumulative travelling/loading/dumping minutes.
- Per resource: cumulative busy time, queue waits, queue lengths sampled on entry.
- Global: total tonnes delivered, simulation time.
Assumptions
Derived from data
- Loader and crusher service-time means/SDs from loaders.csv / dump_points.csv.
- Edge distances and speeds from edges.csv.
- Truck count from each scenario’s fleet.truck_count; trucks selected in id-sorted order.
- Capacity-1 edges treated as constrained; capacity-999 edges treated as unconstrained.
Introduced
- Capacity-1 ramp E03 and pit-access roads E07/E09 are modelled as paired bidirectional resources (one truck on the physical road regardless of direction). Crusher approach E05 keeps per-direction locks. The data’s metadata for E03_DOWN (“same physical constraint simplified as separate edge”) supports the paired interpretation.
- Loading and dumping times follow Normal(mean, sd) truncated to [0.1 min, mean + 5 sd].
- Travel-time noise is multiplicative Normal(1.0, cv=0.10) per truck per edge per traversal; effective speed floored at 10% of edge max_speed_kph to avoid pathological tails.
- Routing uses pre-computed travel-time-weighted shortest paths via NetworkX Dijkstra (computed once per replication after applying scenario edge overrides).
- Loader choice is dynamic per cycle: nearest_available_loader with shortest_expected_cycle_time tiebreaker, where expected cycle = travel_to + queue_count × load_mean + load_mean + travel_loaded + crusher_mean.
- All trucks start at PARK at t=0 and are dispatched simultaneously.
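The two stochastic assumptions above (truncated service times and multiplicative travel-time noise) can be sketched as follows. This is an illustration under stated assumptions, not the package's code: it uses the standard-library `random.Random` in place of a NumPy generator, implements truncation by resampling (the source does not say whether it resamples or clips), and redraws non-positive noise factors as an extra guard. The function names and the `speed_factor` parameter are hypothetical.

```python
import random


def sample_service_time(rng, mean, sd):
    """Draw Normal(mean, sd), resampling until inside [0.1, mean + 5*sd]."""
    low, high = 0.1, mean + 5.0 * sd
    while True:
        value = rng.gauss(mean, sd)
        if low <= value <= high:
            return value


def sample_travel_time_min(rng, distance_m, max_speed_kph,
                           speed_factor=1.0, cv=0.10):
    """Nominal travel time times a Normal(1.0, cv) factor, in minutes.

    The result is capped so the effective speed never falls below
    10% of the edge's max speed.
    """
    distance_km = distance_m / 1000.0
    nominal_min = distance_km / (max_speed_kph * speed_factor) * 60.0
    # Assumed guard: redraw the (very unlikely) non-positive factors.
    factor = rng.gauss(1.0, cv)
    while factor <= 0.0:
        factor = rng.gauss(1.0, cv)
    slowest_min = distance_km / (0.10 * max_speed_kph) * 60.0
    return min(nominal_min * factor, slowest_min)


rng = random.Random(12345)
load_s_minutes = sample_service_time(rng, mean=4.5, sd=1.0)
ramp_minutes = sample_travel_time_min(rng, distance_m=500.0, max_speed_kph=20.0)
```

The 500 m and 20 kph values in the usage lines are placeholders, not figures from edges.csv.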
Limitations
- No breakdowns, refuelling, shift handover, weather, or operator skill variation.
- No mid-cycle re-dispatching; loader choice is fixed at cycle start.
- Trucks finish their current state transition at shift end (no mid-traversal kill); in-progress dumps that complete after shift_minutes are not counted.
- Initial simultaneous dispatch may overstate first-cycle loader contention compared with staggered start-up in practice.
- The dispatcher’s queue-wait estimate uses queue_count × load_mean; it does not account for the residual service time of the truck currently being loaded.
Performance measures
- total_tonnes_delivered: cumulative tonnes via dumping_ended events at CRUSH.
- tonnes_per_hour: total / shift length (8 h).
- average_truck_cycle_time_min: mean across all completed cycles, all trucks.
- average_truck_utilisation: mean across trucks of (travelling + loading + dumping) / shift.
- crusher_utilisation, per-loader utilisation (e.g. loader_L_N_utilisation, loader_L_S_utilisation).
- average_loader_queue_time_min: mean wait at any loader, averaged across loaders then across replications.
- average_crusher_queue_time_min.
- 95% confidence intervals (Student-t, df = n-1) for headline metrics across replications.
- top_bottlenecks: ranked list of resources by utilisation × avg_queue_wait_min (highest score first).
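The Student-t interval listed above can be sketched with the standard library alone. The function name `ci95` is hypothetical, and the critical value is hard-coded for 30 replications (df = 29, t ≈ 2.045); for other replication counts a library call such as `scipy.stats.t.ppf(0.975, n - 1)` would be the usual route. The sample values are placeholders, not simulation output.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for df = 29 (30 replications).
T_CRIT_95_DF29 = 2.045


def ci95(samples, t_crit=T_CRIT_95_DF29):
    """Return (mean, lower, upper) for a 95% Student-t interval.

    t_crit must match df = len(samples) - 1; the default fits n = 30.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Placeholder per-replication tonnes/hour values, for illustration only.
placeholder_tph = [1500.0 + i for i in range(30)]
mean, lower, upper = ci95(placeholder_tph)
```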
README
Mine Throughput Simulation
A SimPy discrete-event simulation of an open-pit mine haulage system. Estimates ore throughput to the primary crusher over an 8-hour shift across six required scenarios with 30 replications each.
Install
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
Run
All six required scenarios with 30 replications each (default):
python -m mine_sim.run
Single scenario or smoke test:
python -m mine_sim.run --scenario baseline
python -m mine_sim.run --replications 5
Outputs (in results/):
- results.csv: one row per (scenario_id, replication) with all required columns
- summary.json: per-scenario summary with 95% CIs, loader utilisation, ranked bottlenecks
- event_log.csv: combined trace: full events for replication 0 of each scenario; only dumping_ended events for replications 1–N (keeps file size manageable while preserving end-to-end traceability for one canonical replication per scenario)
- {scenario_id}__event_log.csv: full replication-0 trace per scenario
Total runtime is well under one minute on a modern laptop (~2.5 s for all 6 scenarios × 30 reps).
Reproduce
The simulation uses one numpy.random.Generator per replication, seeded with config["simulation"]["base_random_seed"] + replication_idx (the base seed is 12345 by default). Same seed → byte-identical event log. To verify:
python -m mine_sim.run --scenario baseline
cp results/baseline__event_log.csv /tmp/run1.csv
python -m mine_sim.run --scenario baseline
diff /tmp/run1.csv results/baseline__event_log.csv # must be empty
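The per-replication seeding can be sketched like this. It is a stand-in using `random.Random` from the standard library; the real package is described as using `numpy.random.Generator`, for which the analogous call would be `numpy.random.default_rng(base_seed + replication_idx)`. The helper names are hypothetical.

```python
import random

BASE_SEED = 12345  # default base_random_seed quoted above


def replication_rng(replication_idx, base_seed=BASE_SEED):
    """One independent generator per replication: seed = base + index."""
    return random.Random(base_seed + replication_idx)


def draw_stream(replication_idx, n=5):
    """Draw a few service times to show that a seed fixes the stream."""
    rng = replication_rng(replication_idx)
    return [rng.gauss(4.5, 1.0) for _ in range(n)]


# Same replication index gives the same draws; a different index does not.
same = draw_stream(0) == draw_stream(0)
different = draw_stream(0) != draw_stream(1)
```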
Conceptual model
See conceptual_model.md for the full system boundary, entities, resources, events, state, assumptions, and performance measures.
In short: trucks loop PARK → loader → crusher → loader → …. Loaders and the crusher are capacity-1 SimPy resources. The narrow ramp E03 and pit-access roads E07/E09 are paired bidirectional resources (one physical road, capacity 1 across both directions, supported by the dataset’s metadata note “same physical constraint simplified as separate edge”). E05 crusher approach has per-direction capacity-1 lanes. All other roads are unconstrained.
Main assumptions
- Service times: Normal(mean, sd) truncated to [0.1 min, mean + 5 sd].
- Travel-time noise: multiplicative Normal(1.0, cv=0.10) per truck per edge per traversal; effective speed floored at 10% of edge max speed.
- Routing: travel-time-weighted Dijkstra paths, computed once per replication after applying scenario edge overrides (closed: true edges are dropped before graph build, so the bypass route emerges naturally for ramp_closed).
- Dispatching: nearest_available_loader with shortest_expected_cycle_time tiebreaker. Decision once per cycle when the truck becomes idle. No mid-cycle re-routing.
- Throughput attributed at dumping_ended events at CRUSH only; in-progress dumps at shift end are not counted.
- All trucks start at PARK at t=0 and dispatch simultaneously.
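The routing assumption above (closed edges dropped before the graph is built, then shortest travel time) can be sketched on a toy network. The package is described as using NetworkX; this stand-in uses a hand-written Dijkstra over `heapq` so that it runs with the standard library only. Node names, distances, speeds and the `ValueError` are all hypothetical.

```python
import heapq


def edge_minutes(distance_m, max_speed_kph):
    """Travel time in minutes at the edge's max speed."""
    return (distance_m / 1000.0) / max_speed_kph * 60.0


def build_graph(edges):
    """Adjacency list of (neighbour, minutes); closed edges are skipped."""
    graph = {}
    for edge in edges:
        graph.setdefault(edge["src"], [])
        graph.setdefault(edge["dst"], [])
        if edge.get("closed"):
            continue
        weight = edge_minutes(edge["distance_m"], edge["max_speed_kph"])
        graph[edge["src"]].append((edge["dst"], weight))
    return graph


def shortest_path(graph, source, target):
    """Dijkstra: return (total_minutes, [nodes]) or raise if unreachable."""
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, weight in graph.get(node, []):
            candidate = dist + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if target not in best:
        raise ValueError(f"{target} unreachable from {source}")
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


def toy_edges(ramp_closed):
    """Toy network: a direct 'ramp' A->D and a longer bypass A->B->D."""
    return [
        {"src": "A", "dst": "D", "distance_m": 600, "max_speed_kph": 20,
         "closed": ramp_closed},
        {"src": "A", "dst": "B", "distance_m": 500, "max_speed_kph": 30},
        {"src": "B", "dst": "D", "distance_m": 700, "max_speed_kph": 30},
    ]


open_minutes, open_path = shortest_path(build_graph(toy_edges(False)), "A", "D")
closed_minutes, closed_path = shortest_path(build_graph(toy_edges(True)), "A", "D")
```

With the direct edge open the path is A → D; once it is marked closed the bypass A → B → D is chosen without any special-case logic.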
Routing and dispatching
- Routing objective: shortest travel time. Edge weight = (distance_m / 1000) / max_speed_kph * 60. Closed edges (per scenario edge_overrides) are dropped before graph build.
- All-pairs shortest paths are pre-computed once per replication via NetworkX. Reachability is validated for every (PARK, loader_node), (loader_node, CRUSH), and (CRUSH, loader_node) pair; if any required pair is unreachable the simulation aborts with a TopologyError.
- Dispatcher (per cycle): for each loader, compute expected cycle time = travel_to_loader + queue_count × load_mean + load_mean + travel_to_crusher + crusher_mean. Pick the loader with the lowest expected cycle time, tie-broken alphabetically by loader id.
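The dispatcher rule above can be sketched in a few lines. The load means (6.5 and 4.5 min) and crusher mean (3.5 min) come from the conceptual model; the travel times are placeholders, and the function names are hypothetical.

```python
def expected_cycle_min(travel_to_loader, queue_count, load_mean,
                       travel_to_crusher, crusher_mean):
    """Expected cycle time in minutes, per the dispatcher formula."""
    return (travel_to_loader + queue_count * load_mean + load_mean
            + travel_to_crusher + crusher_mean)


def choose_loader(candidates):
    """Pick the loader id with the lowest expected cycle time.

    candidates maps loader id -> keyword arguments for expected_cycle_min.
    Ties are broken alphabetically by loader id via the tuple key.
    """
    return min(
        candidates,
        key=lambda loader_id: (expected_cycle_min(**candidates[loader_id]),
                               loader_id),
    )


def make_candidates(queue_at_l_s):
    """Placeholder travel times (5 and 6 min); service means from the model."""
    return {
        "L_N": dict(travel_to_loader=5.0, queue_count=0, load_mean=6.5,
                    travel_to_crusher=6.0, crusher_mean=3.5),
        "L_S": dict(travel_to_loader=5.0, queue_count=queue_at_l_s,
                    load_mean=4.5, travel_to_crusher=6.0, crusher_mean=3.5),
    }


idle_choice = choose_loader(make_candidates(queue_at_l_s=0))
busy_choice = choose_loader(make_candidates(queue_at_l_s=1))
```

With equal travel times, the faster L_S wins when idle (19.0 vs 21.0 min), but a single queued truck at L_S (23.5 min) is enough to send the next truck to L_N, which matches the spillover pattern reported in the results.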
Key results
Headline tonnes/hour with 95% CI (Student-t, df=29) per scenario, from results/summary.json:
| Scenario | tonnes/h mean | 95% CI | total tonnes mean | avg cycle (min) | crusher util |
|---|---|---|---|---|---|
| baseline | 1,532.1 | (1,525.5, 1,538.6) | 12,257 | 30.4 | 0.895 |
| trucks_4 | 991.2 | (985.7, 996.8) | 7,930 | 23.6 | 0.579 |
| trucks_12 | 1,577.5 | (1,568.8, 1,586.2) | 12,620 | 43.6 | 0.927 |
| ramp_upgrade | 1,522.9 | (1,514.4, 1,531.4) | 12,183 | 30.6 | 0.886 |
| crusher_slowdown | 815.8 | (807.6, 824.0) | 6,527 | 55.5 | 0.945 |
| ramp_closed | 1,515.0 | (1,506.1, 1,523.9) | 12,120 | 30.7 | 0.886 |
Loader utilisation is asymmetric in every scenario: the dispatcher strongly prefers LOAD_S (faster service: 4.5 min vs 6.5 min) so L_S runs near 87 % under baseline while L_N is used as a spillover at ~46 %.
Answers to operational decision questions
1. Expected ore throughput in baseline 8-hour shift
~1,532 tonnes/hour (95 % CI 1,526 – 1,539), or ~12,257 tonnes per shift (CI 12,204 – 12,309) with 8 trucks. Average truck cycle time ~30.4 min; trucks are productive ~75 % of the shift.
2. Likely bottlenecks
The primary crusher dominates in all configurations except trucks_4. Baseline ranking by utilisation × avg_queue_wait_min:
| Resource | Utilisation | Avg queue wait (min) | Score |
|---|---|---|---|
| crusher | 0.895 | 2.82 | 2.52 |
| loader_L_S | 0.871 | 2.24 | 1.95 |
| road_PIT_S | 0.853 | 1.24 | 1.06 |
| loader_L_N | 0.458 | 1.87 | 0.86 |
The southern pit access road (PIT_S) ranks third — it is congested because the dispatcher routes most traffic to LOAD_S. The main ramp (E03) is not a binding constraint in any scenario at baseline truck counts (utilisation < 0.10 in every run except the trucks_12 scenario).
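The ranking in the table above can be reproduced from its utilisation and queue-wait columns. The function name `rank_bottlenecks` is hypothetical; the numbers are the baseline values from the table.

```python
# Baseline (utilisation, avg queue wait in minutes) from the table above.
baseline_resources = {
    "crusher": (0.895, 2.82),
    "loader_L_S": (0.871, 2.24),
    "road_PIT_S": (0.853, 1.24),
    "loader_L_N": (0.458, 1.87),
}


def rank_bottlenecks(resources):
    """Score each resource as utilisation x queue wait, highest first."""
    scored = [(name, util * wait) for name, (util, wait) in resources.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


for name, score in rank_bottlenecks(baseline_resources):
    print(f"{name:12s} {score:.2f}")
```

The product penalises resources that are both busy and queued behind, so a heavily used road with short waits (PIT_S) ranks below a loader with similar utilisation but longer waits.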
3. Does adding more trucks materially improve throughput?
No — the system saturates between 8 and 12 trucks. Marginal gain per added truck:
| Step | Δ tonnes/h | Δ tph per truck |
|---|---|---|
| 4 → 8 trucks | +540.9 | +135 |
| 8 → 12 trucks | +45.4 | +11 |
The 95 % CI for trucks_12 (1,569 – 1,586) sits just above the baseline CI (1,526 – 1,539), so the gain is statistically detectable but small (about 3 %). Meanwhile crusher queue wait grows from 2.8 min at baseline to 13.1 min at 12 trucks, and average truck utilisation drops from 0.75 to 0.53. Operationally, ~8 trucks is near optimal under current crusher capacity.
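The marginal-gain table above is simple arithmetic on the scenario means, sketched here for checking. The function name `marginal_gains` is hypothetical; the tonnes/hour values are the means from the Key results table.

```python
# Mean tonnes/hour by fleet size (trucks_4, baseline, trucks_12).
tph_by_fleet = {4: 991.2, 8: 1532.1, 12: 1577.5}


def marginal_gains(tph):
    """Return (from_size, to_size, delta_tph, delta_per_truck) per step."""
    sizes = sorted(tph)
    steps = []
    for smaller, larger in zip(sizes, sizes[1:]):
        delta = tph[larger] - tph[smaller]
        steps.append((smaller, larger, delta, delta / (larger - smaller)))
    return steps


for smaller, larger, delta, per_truck in marginal_gains(tph_by_fleet):
    print(f"{smaller} -> {larger} trucks: {delta:+.1f} tph, "
          f"{per_truck:+.0f} tph per truck")
```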
4. Would improving the narrow ramp materially improve throughput?
No. ramp_upgrade (capacity 999, max speed 28 kph) gives 1,522.9 tph vs baseline 1,532.1 tph, a difference of about 0.6 % whose 95 % CIs overlap (1,514 – 1,531 vs 1,526 – 1,539). The ramp_upgrade mean is even slightly lower, which is plausible random variation. Ramp utilisation under baseline is < 10 %, so the ramp has almost no congestion for an upgrade to relieve. The crusher is the binding constraint.
5. How sensitive is throughput to crusher service time?
Highly sensitive. Doubling the crusher mean dump time from 3.5 → 7.0 min (crusher_slowdown) reduces throughput by 47 % (1,532 → 816 tph). Crusher utilisation rises to 0.94 and crusher queue wait jumps to 27 min — the crusher becomes a hard bottleneck and cycle time nearly doubles to 55.5 min. Any operational change that even modestly slows the crusher will cost throughput roughly proportional to the slowdown.
6. Operational impact of losing the main ramp route
Minor, ~1 %. ramp_closed (E03 unavailable; trucks reroute via the western bypass J2 → J7 → J8 → J4) gives 1,515 tph vs baseline 1,532 tph, about a 1 % drop. The 95 % CIs do not overlap (1,506 – 1,524 vs 1,526 – 1,539), so the drop is statistically detectable but operationally small. The bypass adds a few minutes per cycle but the crusher remains the binding constraint, so cycle slack absorbs most of the rerouting cost. The mine could keep running without the main ramp at close to baseline throughput.
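The scenario comparisons in the answers above can be checked against the Key results table with a short script. It reports the percentage change in mean tonnes/hour relative to baseline and whether the two 95 % intervals overlap. Interval overlap is only a rough screen, not a formal significance test, and the function name `compare_to_baseline` is hypothetical.

```python
# (mean tph, CI lower, CI upper) from the Key results table.
summary = {
    "baseline": (1532.1, 1525.5, 1538.6),
    "ramp_upgrade": (1522.9, 1514.4, 1531.4),
    "crusher_slowdown": (815.8, 807.6, 824.0),
    "ramp_closed": (1515.0, 1506.1, 1523.9),
}


def compare_to_baseline(results, scenario, baseline="baseline"):
    """Return (percent change in mean, whether the two CIs overlap)."""
    mean, lower, upper = results[scenario]
    base_mean, base_lower, base_upper = results[baseline]
    pct_change = (mean - base_mean) / base_mean * 100.0
    overlap = lower <= base_upper and base_lower <= upper
    return pct_change, overlap


for scenario in ("ramp_upgrade", "crusher_slowdown", "ramp_closed"):
    pct, overlap = compare_to_baseline(summary, scenario)
    print(f"{scenario:18s} {pct:+.1f} %  CI overlap: {overlap}")
```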
Limitations
See conceptual_model.md for the full list. Headline limitations:
- No truck breakdowns, refuelling, shift handover, weather, or operator skill variation.
- No mid-cycle re-dispatch — once a truck picks a loader it commits to that loader.
- Trucks finish their current state transition at shift end; in-progress dumps that complete after shift_minutes are not counted (so reported throughput is slightly conservative).
- Initial dispatch from PARK is simultaneous, which may overstate first-cycle loader contention compared with realistic staggered start-up.
- The dispatcher’s queue-wait estimate uses queue_count × load_mean; it ignores the residual service time of the truck currently being loaded.
- Loader overrides are supported but no required scenario uses them, so that pathway is exercised only by unit tests.
Suggested improvements / further scenarios
- loader_n_upgrade: drop LOAD_N service time from 6.5 to 4.5 min (matching LOAD_S). Tests whether the slower northern loader is binding once the dispatcher can use it without penalty. Expected: would shift load to L_N, but only marginally because the crusher is still the binding constraint.
- crusher_capacity_2: model two parallel crusher lines. Likely the highest-leverage operational change given the bottleneck profile.
- Staggered initial dispatch: model trucks departing PARK at small time offsets to reduce first-cycle contention.
- Truck reliability and refuelling: add a per-truck failure rate and a refuelling cycle via the maintenance bay node.
- Sensitivity sweep over travel_time_noise_cv: currently fixed at 0.10; test 0.05 and 0.20 to bound the effect of travel-time uncertainty on the headline numbers.