2026-04-29__001_synthetic_mine_throughput__claude-code__claude-opus-4-7__plan-mode

Date: 2026-04-29 · Benchmark: 001_synthetic_mine_throughput · Harness: claude-code · Model: claude-opus-4-7 (plan-mode) · ✓ Autonomous

Scores

Category Points Max
Conceptual modelling 19 20
Data and topology 15 15
Simulation correctness 19 20
Experimental design 14 15
Results & interpretation 14 15
Code quality 10 10
Traceability 5 5
Total 96 100

Run metrics

Evaluation report

Scenario Mean throughput
baseline 13,143.333
trucks_4 7,983.333
trucks_12 13,783.333
ramp_upgrade 13,173.333
crusher_slowdown 7,236.667
ramp_closed 13,110

Conceptual model

Conceptual model — synthetic mine throughput

This document describes the discrete-event simulation built for benchmark 001_synthetic_mine_throughput. The implementation is in src/ (model, simulation, experiment, analysis) and is driven by run.py. The model follows the data exactly where the data is unambiguous, and documents every assumption it has had to introduce.

1. System boundary

Included:

Excluded:

2. Entities

Trucks (T01 … T0N) are the only active entities. Each truck is a SimPy process carrying:

Ore payloads are carried by trucks rather than modelled as independent entities — the simulation only needs to track tonnes delivered, which is incremented at every dump_end event.

3. Resources

| Resource | SimPy capacity | Notes |
|---|---|---|
| L_N loader | 1 | North pit, mean load 6.5 min (slower face) |
| L_S loader | 1 | South pit, mean load 4.5 min (faster face) |
| D_CRUSH crusher | 1 | Mean dump 3.5 min (7.0 min in crusher_slowdown) |
| E03_UP | 1 | Narrow uphill ramp, designated bottleneck |
| E03_DOWN | 1 | Narrow downhill ramp |
| E05_TO_CRUSH | 1 | Crusher approach (inbound) |
| E05_FROM_CRUSH | 1 | Crusher approach (outbound) |
| E07_TO_LOAD_N | 1 | Single-lane north-pit face access (in) |
| E07_FROM_LOAD_N | 1 | Single-lane north-pit face access (out) |
| E09_TO_LOAD_S | 1 | Single-lane south-pit face access (in) |
| E09_FROM_LOAD_S | 1 | Single-lane south-pit face access (out) |

All other roads have capacity 999 (declared in edges.csv). They are not modelled as SimPy resources — that would add bookkeeping overhead with no queueing realism. They are simple env.timeout(travel_min) segments.
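The split between capacity-constrained edges and plain travel segments could be derived from edges.csv roughly as follows. This is a minimal sketch, not the submission's code: the column names `edge_id` and `capacity`, the helper `split_edges_by_capacity`, and the `E_WIDE_EXAMPLE` row are assumptions made for illustration.

```python
import csv
import io

# Capacity value used in edges.csv to mark roads with no effective limit.
UNCONSTRAINED_CAPACITY = 999

# Illustrative rows only: the column names and E_WIDE_EXAMPLE are hypothetical.
SAMPLE_EDGES_CSV = """edge_id,capacity
E03_UP,1
E03_DOWN,1
E_WIDE_EXAMPLE,999
"""


def split_edges_by_capacity(csv_text):
    """Return (constrained, unconstrained).

    constrained maps edge id -> capacity for edges that need a queueing resource;
    unconstrained lists edge ids that can be modelled as a plain travel delay.
    """
    constrained = {}
    unconstrained = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        capacity = int(row["capacity"])
        if capacity >= UNCONSTRAINED_CAPACITY:
            unconstrained.append(row["edge_id"])
        else:
            constrained[row["edge_id"]] = capacity
    return constrained, unconstrained


constrained, unconstrained = split_edges_by_capacity(SAMPLE_EDGES_CSV)
print(constrained)
print(unconstrained)
```

In a SimPy model, each entry in `constrained` would plausibly become a `simpy.Resource` with that capacity, while each entry in `unconstrained` needs only a timeout for the travel time.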

4. Events

Logged event types (written to event_log.csv):

5. State variables

6. Assumptions

6.1 Data-derived assumptions

6.2 Introduced assumptions

6.3 Limitations

7. Performance measures

Reported per scenario (mean across 30 replications, ± 95 % CI by Student’s t-distribution with df = 29):

README

Synthetic Mine Throughput Simulation (Benchmark 001)

A discrete-event simulation in Python + SimPy of an 8-hour ore haulage shift on a small synthetic open-pit mine. Six scenarios are run with 30 replications each to answer six operational decision questions about ore throughput, bottlenecks, and infrastructure investment.

The simulation, conceptual model, and analysis are designed to be reproducible from a clean checkout: the only inputs are the CSVs and YAMLs in data/, and the only outputs are the four required artefacts plus an optional topology figure.


1. Install

pip install -r requirements.txt

Tested with Python 3.13 and SimPy 4.x.

2. Run

python run.py                              # all six scenarios, 30 reps each
python run.py --scenario baseline          # single scenario
python run.py --replications 2             # smoke test
python plot_topology.py                    # regenerate topology.png

Total runtime for the full sweep is ~2 s on a modern laptop.

Outputs written to the submission root:

3. Reproducibility

Each replication’s RNG is seeded from SHA-256(base_random_seed :: scenario_id :: replication_index) truncated to 64 bits. The base seed is read from baseline.yaml (simulation.base_random_seed = 12345) and inherited by every scenario. Per-rep seeds appear in results.csv under the random_seed column.
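A seed derivation of this kind might look like the sketch below. The exact string layout fed to SHA-256, the big-endian byte order, and the choice of the first 8 digest bytes are assumptions; the function name `replication_seed` is hypothetical.

```python
import hashlib


def replication_seed(base_seed: int, scenario_id: str, replication_index: int) -> int:
    """Derive a 64-bit seed from (base seed, scenario id, replication index)."""
    # Assumed key layout: the three fields joined with "::".
    key = f"{base_seed}::{scenario_id}::{replication_index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Truncate the 256-bit digest to its first 8 bytes (64 bits).
    return int.from_bytes(digest[:8], "big")


# Same inputs always give the same seed; different replications give different seeds.
seed_rep0 = replication_seed(12345, "baseline", 0)
seed_rep1 = replication_seed(12345, "baseline", 1)
print(seed_rep0, seed_rep1)
```

Hashing the scenario id into the seed means two scenarios do not share random streams by default, and a single replication can be re-run in isolation without replaying the ones before it.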

Re-running python run.py produces byte-identical results.csv and summary.json (timestamp-free outputs) on the same Python / SimPy / NumPy versions.

4. Conceptual model

See conceptual_model.md for the full conceptual model. Brief summary:

5. Routing and dispatching

Routing uses networkx.dijkstra_path over the directed graph with edge weight distance_m / (max_speed_kph × 1000 / 60) (minutes). Closed edges (closed = true in edges.csv after scenario overrides) are removed from the graph. Routes are recomputed from the current node at the start of every empty leg, so closures and per-scenario speed overrides take effect immediately. If a required route does not exist, the model raises RoutingError rather than silently producing a misleading result.
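The routing rule above can be sketched without third-party dependencies. This example swaps in a small heapq-based Dijkstra for networkx.dijkstra_path so that it runs on the standard library alone; the edge field names `from_node` and `to_node`, the three-node network, and the `RoutingError` definition here are illustrative assumptions.

```python
import heapq


class RoutingError(Exception):
    """Raised when no open route exists between two nodes."""


def travel_minutes(distance_m, max_speed_kph):
    # kph * 1000 / 60 converts the speed to metres per minute.
    return distance_m / (max_speed_kph * 1000 / 60)


def shortest_route(edges, source, target):
    """Return (minutes, path) over the open directed edges, weighted by travel time."""
    graph = {}
    for edge in edges:
        if edge["closed"]:
            continue  # closed edges are dropped before routing
        weight = travel_minutes(edge["distance_m"], edge["max_speed_kph"])
        graph.setdefault(edge["from_node"], []).append((edge["to_node"], weight))

    best = {source: 0.0}
    heap = [(0.0, source, [source])]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour, path + [neighbour]))
    raise RoutingError(f"no open route from {source} to {target}")


# Hypothetical three-node network, not taken from the benchmark data.
edges = [
    {"from_node": "A", "to_node": "B", "distance_m": 600, "max_speed_kph": 30, "closed": False},
    {"from_node": "B", "to_node": "C", "distance_m": 500, "max_speed_kph": 30, "closed": False},
    {"from_node": "A", "to_node": "C", "distance_m": 2000, "max_speed_kph": 40, "closed": False},
]
minutes, path = shortest_route(edges, "A", "C")
print(path, round(minutes, 2))
```

Closing the A to B edge in this toy network reroutes traffic onto the direct A to C edge, and closing both outgoing edges from A raises `RoutingError`, mirroring the fail-loudly behaviour described above.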

Dispatching follows the baseline nearest_available_loader policy with shortest_expected_cycle_time tie-breaker:

score(loader) = travel_time(current_node → loader_node)
              + queue_size(loader) × mean_load_time(loader)

The dispatcher picks the loader with the lowest score. Ties are broken by shorter expected return travel from loader to crusher.

This rule is queue-aware — it accounts for the busy time and queue at each loader, not just travel distance — and naturally balances load across both pits.
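A minimal sketch of this scoring rule follows. The `LoaderState` structure and `choose_loader` helper are hypothetical, the travel times are made up, and whether `queue_size` counts the truck currently being loaded is an assumption the source does not settle; only the mean load times (6.5 and 4.5 min) come from the resource table.

```python
from dataclasses import dataclass


@dataclass
class LoaderState:
    name: str
    travel_min: float         # travel time from the truck's current node to the loader
    queue_size: int           # trucks ahead at the loader (assumed to include the one in service)
    mean_load_min: float      # mean load time for this loader
    return_travel_min: float  # loaded travel time from loader to crusher (tie-breaker)


def score(loader: LoaderState) -> float:
    # score = travel time + expected wait behind the trucks already there
    return loader.travel_min + loader.queue_size * loader.mean_load_min


def choose_loader(loaders):
    # Lowest score wins; ties go to the shorter expected return trip to the crusher.
    return min(loaders, key=lambda l: (score(l), l.return_travel_min))


loaders = [
    LoaderState("L_N", travel_min=4.0, queue_size=0, mean_load_min=6.5, return_travel_min=9.0),
    LoaderState("L_S", travel_min=3.0, queue_size=1, mean_load_min=4.5, return_travel_min=8.0),
]
chosen = choose_loader(loaders)
print(chosen.name, score(chosen))
```

In this made-up state the south loader is closer and faster, but the one truck already waiting there raises its score to 7.5 min against 4.0 min for the idle north loader, so the truck is sent north.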

6. Key results

All values are means over 30 replications; bracketed values are 95 % confidence intervals using Student’s t-distribution (df = 29).

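| Scenario | Trucks | Tonnes (mean) | Tonnes / h | Cycle (min) | Crusher util | L_N util | L_S util | Truck util | Loader queue (min) | Crusher queue (min) |
|---|---|---|---|---|---|---|---|---|---|---|
| baseline | 8 | 13 143 [13 089 – 13 198] | 1 643 | 30.1 | 0.90 | 0.72 | 0.70 | 0.79 | 2.81 | 3.45 |
| trucks_4 | 4 | 7 983 [7 945 – 8 021] | 998 | 24.6 | 0.56 | 0.36 | 0.48 | 0.94 | 0.92 | 0.65 |
| trucks_12 | 12 | 13 783 [13 683 – 13 883] | 1 723 | 43.7 | 0.93 | 0.76 | 0.73 | 0.57 | 4.19 | 14.92 |
| ramp_upgrade | 8 | 13 173 [13 125 – 13 221] | 1 647 | 30.1 | 0.91 | 0.73 | 0.71 | 0.80 | 2.83 | 3.30 |
| crusher_slowdown | 8 | 7 237 [7 154 – 7 320] | 905 | 56.1 | 0.94 | 0.40 | 0.39 | 0.50 | 1.80 | 26.61 |
| ramp_closed | 8 | 13 110 [13 043 – 13 177] | 1 639 | 30.2 | 0.90 | 0.72 | 0.71 | 0.79 | 2.80 | 3.41 |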

Numbers come straight out of summary.json and can be reproduced with python run.py.
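For readers who want to check how bracketed intervals of this kind are formed, the sketch below computes a mean and a 95 % t-interval for 30 replications. It is not the code in analysis.py: the function name `mean_ci95` is hypothetical, the sample values are made up, and the critical value 2.045 is the standard two-sided 95 % Student's t value for df = 29, so it only applies when there are exactly 30 samples.

```python
import math
import statistics

# Two-sided 95 % critical value of Student's t with df = 29 (i.e. 30 replications).
T_CRIT_95_DF29 = 2.045


def mean_ci95(samples, t_crit=T_CRIT_95_DF29):
    """Return (mean, lower, upper) for a t-based confidence interval."""
    n = len(samples)
    mean = statistics.fmean(samples)
    # Standard error uses the sample standard deviation (n - 1 denominator).
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Made-up per-replication tonnages, purely to exercise the function.
example_tonnes = [13000 + 10 * i for i in range(30)]
mean, low, high = mean_ci95(example_tonnes)
print(f"{mean:.1f} [{low:.1f} - {high:.1f}]")
```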

7. Answers to the operational decision questions

Q1. Expected ore throughput in the baseline 8-hour shift

~13 100 tonnes per shift, or about 1 640 t/h. The 95 % CI is narrow ([13 089 – 13 198]) because the crusher near-saturates and damps stochastic variation in upstream times.

Q2. Likely bottlenecks

summary.json → scenarios.<id>.top_bottlenecks ranks all resources by mean queue wait per scenario, drawn from the per-replication queue statistics.

Q3. Does adding more trucks materially improve throughput?

Not beyond 8 trucks; the system saturates near that fleet size. Going from 4 to 8 trucks adds 5 160 t (+65 %), but going from 8 to 12 adds only 640 t (+5 %). Truck utilisation falls from 0.94 to 0.79 to 0.57 across the 4/8/12 cases, and the mean crusher queue wait grows from 0.7 to 14.9 min: the extra trucks simply queue at the crusher.
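The diminishing return can be checked directly from the mean throughput figures reported for trucks_4, baseline, and trucks_12. The short sketch below does that arithmetic; the helper name `marginal_gains` is hypothetical.

```python
# Mean tonnes per shift by fleet size, taken from the evaluation report above.
MEAN_TONNES = {4: 7983.333, 8: 13143.333, 12: 13783.333}


def marginal_gains(tonnes_by_fleet):
    """Return (from_size, to_size, delta_tonnes, delta_percent) for each fleet step."""
    sizes = sorted(tonnes_by_fleet)
    gains = []
    for small, large in zip(sizes, sizes[1:]):
        delta = tonnes_by_fleet[large] - tonnes_by_fleet[small]
        percent = 100 * delta / tonnes_by_fleet[small]
        gains.append((small, large, delta, percent))
    return gains


for small, large, delta, percent in marginal_gains(MEAN_TONNES):
    per_truck = delta / (large - small)
    print(f"{small} -> {large} trucks: +{delta:.0f} t (+{percent:.1f} %), "
          f"{per_truck:.0f} t per added truck")
```

On these figures each of the first four added trucks is worth roughly eight times as much throughput as each of the next four.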

Q4. Would improving the narrow ramp help?

No, not under these scenarios. ramp_upgrade raises ramp speed and removes the capacity-1 constraint, but throughput is essentially unchanged (13 173 vs 13 143 t; the two 95 % CIs overlap). The crusher is binding, so freeing the ramp does not unlock more throughput.

The ramp would only matter if (i) the fleet were small enough that travel time dominates, or (ii) the ramp’s capacity-1 constraint actually queued. Neither is the case in the six required scenarios.

Q5. Sensitivity to crusher service time

Very high. Doubling mean dump time from 3.5 → 7.0 min cuts throughput roughly in half (13 143 → 7 237 t, –45 %). This is the largest single-parameter sensitivity in the study and confirms the crusher is the binding resource. Crusher mean queue wait jumps from 3.4 to 26.6 min; loader utilisation falls from ~0.71 to ~0.39 because trucks back up behind the crusher rather than cycling.

Q6. Operational impact of losing the main ramp route

Negligible. With E03_UP and E03_DOWN closed, throughput drops by 0.25 % (13 143 → 13 110 t; the two 95 % CIs overlap). The bypass via J2 → J7 → J5 / J8 is already the shortest empty route from PARK, and the loaded route does not depend on the ramp at all (LOAD_N → J5 → J3 → J4 → CRUSH uses upper haul roads, not the ramp). The ramp adds resilience in worse scenarios but, on these data, the bypass is a very close substitute.

8. Behavioural self-checks

The script prints (and the harness re-runs) six broad sanity checks. All six pass on the latest run:

[PASS] trucks_12_gt_trucks_4
[PASS] baseline_gt_trucks_4
[PASS] ramp_upgrade_ge_baseline
[PASS] crusher_slowdown_lt_baseline
[PASS] ramp_closed_le_baseline
[PASS] truck_count_saturation_plausible
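The checks listed above could be expressed over the per-scenario mean throughputs roughly as follows. The comparison direction of the first five checks is inferred from their names; the definition used here for truck_count_saturation_plausible (the 8 → 12 gain is smaller than the 4 → 8 gain) is an assumption, as is the helper name `behavioural_checks`.

```python
# Mean throughput per scenario (tonnes), taken from the evaluation report above.
MEAN_THROUGHPUT = {
    "baseline": 13143.333,
    "trucks_4": 7983.333,
    "trucks_12": 13783.333,
    "ramp_upgrade": 13173.333,
    "crusher_slowdown": 7236.667,
    "ramp_closed": 13110.0,
}


def behavioural_checks(t):
    """Return {check_name: passed} for six broad sanity checks."""
    gain_4_to_8 = t["baseline"] - t["trucks_4"]
    gain_8_to_12 = t["trucks_12"] - t["baseline"]
    return {
        "trucks_12_gt_trucks_4": t["trucks_12"] > t["trucks_4"],
        "baseline_gt_trucks_4": t["baseline"] > t["trucks_4"],
        "ramp_upgrade_ge_baseline": t["ramp_upgrade"] >= t["baseline"],
        "crusher_slowdown_lt_baseline": t["crusher_slowdown"] < t["baseline"],
        "ramp_closed_le_baseline": t["ramp_closed"] <= t["baseline"],
        # Assumed meaning: the second batch of trucks adds less than the first.
        "truck_count_saturation_plausible": gain_8_to_12 < gain_4_to_8,
    }


checks = behavioural_checks(MEAN_THROUGHPUT)
for name, passed in checks.items():
    print(f"[{'PASS' if passed else 'FAIL'}] {name}")
```

Because the ramp scenarios differ from baseline by less than their confidence intervals, strict comparisons on means such as ramp_upgrade_ge_baseline could flip under a different base seed; a tolerance-based version may be more robust.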

9. Limitations

10. Suggested further scenarios

These are listed for the operator’s consideration and were not implemented in this submission.

11. Repository layout

.
├── conceptual_model.md
├── README.md
├── requirements.txt
├── run.py
├── plot_topology.py
├── src/
│   ├── __init__.py
│   ├── model.py        # data + scenario inheritance + graph + RNG helpers
│   ├── simulation.py   # SimPy resources, truck process, dispatcher, event log
│   ├── experiment.py   # replication driver + scenario sweep
│   └── analysis.py     # CIs, bottleneck identification, output writers
├── data/               # provided inputs (read-only)
│   ├── nodes.csv
│   ├── edges.csv
│   ├── trucks.csv
│   ├── loaders.csv
│   ├── dump_points.csv
│   └── scenarios/
│       ├── baseline.yaml
│       ├── trucks_4.yaml
│       ├── trucks_12.yaml
│       ├── ramp_upgrade.yaml
│       ├── crusher_slowdown.yaml
│       └── ramp_closed.yaml
├── results.csv         # generated
├── summary.json        # generated
├── event_log.csv       # generated
└── topology.png        # generated by plot_topology.py
