2026-06-01__001_synthetic_mine_throughput__opencode__claude-opus-4-8__plan-mode-max-effort

Date: 2026-06-01 · Benchmark: 001_synthetic_mine_throughput · Harness: opencode · Model: claude-opus-4-8 (plan-mode-max-effort) · ✓ Autonomous

Scores

| Category | Points | Max |
| --- | --- | --- |
| Conceptual modelling | 20 | 20 |
| Data and topology | 15 | 15 |
| Simulation correctness | 20 | 20 |
| Experimental design | 14 | 15 |
| Results & interpretation | 15 | 15 |
| Code quality | 9 | 10 |
| Traceability | 5 | 5 |
| Total | 98 | 100 |

Run metrics

Evaluation report

| Scenario | Mean throughput (t per 8-hour shift) |
| --- | --- |
| baseline | 12,953.3 |
| trucks_4 | 7,753.3 |
| trucks_12 | 13,033.3 |
| ramp_upgrade | 12,983.3 |
| crusher_slowdown | 6,513.3 |
| ramp_closed | 12,803.3 |
| crusher_debottleneck | 14,853.3 |

Conceptual model

Conceptual Model — Synthetic Mine Throughput

A discrete-event simulation (DES) of ore haulage in a synthetic open-pit mine, built in SimPy. The model estimates ore tonnage delivered to the primary crusher over an 8-hour shift and the queueing behaviour of the haulage system.

This document is the conceptual model: it states what is modelled and why, independently of the code. The implementation lives in mine_sim.py (model) and run_experiment.py (experiment harness).


1. System boundary

Purpose / question. How much ore reaches the primary crusher (CRUSH) during one 8-hour shift, where are the bottlenecks, and how does throughput respond to fleet size, a ramp upgrade, a ramp closure, and a slower crusher?

Included in the model

Excluded (out of boundary)


2. Entities (things that move through the system)

| Entity | Count | Attributes (from data) |
| --- | --- | --- |
| Truck | 4 / 8 / 12 by scenario | payload_tonnes = 100, empty_speed_factor = 1.00, loaded_speed_factor = 0.85, start_node = PARK |

Ore is not modelled as a separate entity; each completed dump moves a fixed truck payload (100 t) to the crusher, so tonnage is a counter incremented on dump completion. Trucks are the only active SimPy processes.


3. Resources (things that constrain the system)

| Resource | Servers (capacity) | Service / hold time | Source |
| --- | --- | --- | --- |
| Loader L_N | 1 | load ~N(6.5, 1.2) min | loaders.csv |
| Loader L_S | 1 | load ~N(4.5, 1.0) min | loaders.csv |
| Crusher D_CRUSH | 1 | dump ~N(3.5, 0.8) min | dump_points.csv |
| Ramp E03_UP, E03_DOWN | 1 each | edge travel time | edges.csv (capacity = 1) |
| Crusher approach E05_TO_CRUSH, E05_FROM_CRUSH | 1 each | edge travel time | edges.csv (capacity = 1) |
| Pit road E07_* (North), E09_* (South) | 1 each | edge travel time | edges.csv (capacity = 1) |

Edges with capacity ≥ 999 are treated as unconstrained (free-flowing) and incur only a travel delay, not a resource request. A truck holds a constrained edge resource for the duration of its traversal, so only one truck occupies a single-lane segment at a time.
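The capacity rule above can be illustrated with a minimal sketch. The edge ID `E01_OUT` and the tuple layout are hypothetical placeholders, not taken from edges.csv; only the 999 threshold and the capacity-1 edge names come from this document.

```python
UNCONSTRAINED_CAPACITY = 999  # threshold stated in the text


def is_constrained(capacity: int) -> bool:
    """True when a traversal must hold a resource slot on the edge."""
    return capacity < UNCONSTRAINED_CAPACITY


# Hypothetical rows in the spirit of edges.csv: (edge_id, capacity).
# "E01_OUT" is an invented free-flowing edge used only for illustration.
edges = [("E03_UP", 1), ("E05_TO_CRUSH", 1), ("E01_OUT", 999)]

constrained = [edge_id for edge_id, cap in edges if is_constrained(cap)]
free_flowing = [edge_id for edge_id, cap in edges if not is_constrained(cap)]

print("request a resource on:", constrained)
print("travel delay only on:", free_flowing)
```

In a model built this way, only the edges in `constrained` would need a resource object; the others reduce to a plain time delay.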


4. Events

The per-truck cycle generates the following discrete events (all logged for replication 0 of each scenario in event_log.csv):

  1. dispatch — truck is assigned to a loader (start of shift, or after a dump).
  2. enter_edge — truck begins traversing a road segment (one per edge on the shortest-time route; records from_node/to_node and, for constrained edges, the queue length).
  3. queue_loader — truck joins the loader queue (records queue length).
  4. load_start / load_end — loading begins / ends (payload acquired).
  5. enter_edge (loaded) — loaded haul toward the crusher.
  6. queue_crusher — truck joins the crusher dump queue (records queue length).
  7. dump_start / dump_end — dumping begins / ends. Tonnage is recorded on dump_end (a completed dump), which is the throughput measure.

The empty return to the next loader is the first leg of the next dispatch, so the cycle repeats until the shift clock (env.run(until = 480 min)) stops the simulation. Activities still in progress at the clock boundary are not counted, so only completed dumps contribute tonnage.
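The counting rule can be sketched as a filter over an event trace. The tuple layout `(time_min, truck_id, event)` and the sample records are hypothetical; the real column layout of event_log.csv is not shown in this document.

```python
SHIFT_MIN = 480.0
PAYLOAD_T = 100  # tonnes per completed dump, from the entity table

# Hypothetical trace: truck T2 starts dumping at 478.5 min, but the shift
# clock stops before it finishes, so no dump_end is logged for it.
events = [
    (470.2, "T1", "dump_start"),
    (473.9, "T1", "dump_end"),
    (478.5, "T2", "dump_start"),
]


def tonnes_delivered(events, payload_t=PAYLOAD_T, shift_min=SHIFT_MIN):
    """Count only completed dumps (dump_end events) inside the shift."""
    completed = sum(
        1 for time_min, _truck, event in events
        if event == "dump_end" and time_min <= shift_min
    )
    return completed * payload_t


print(tonnes_delivered(events))
```

Here only T1's dump counts, so the in-progress dump at the clock boundary contributes nothing.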


5. State variables

Per truck: current node/location; loaded vs empty; cumulative travel, load and dump time; cumulative time queued at loaders and at the crusher; timestamps of successive loadings (for cycle-time calculation).

Per resource: busy time (for utilisation); queue length (instantaneous) and queue waiting time (per request).

System: simulation clock; total tonnes delivered; number of completed dumps; committed-assignment counts per loader (used by the dispatcher).
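One possible way to hold the per-truck state is a small dataclass. All field and method names below are hypothetical and are not taken from mine_sim.py; the sketch only mirrors the variables listed above.

```python
from dataclasses import dataclass, field


@dataclass
class TruckState:
    """Hypothetical per-truck state record (names are illustrative)."""
    node: str = "PARK"            # current location; trucks start at PARK
    loaded: bool = False          # loaded vs empty
    travel_min: float = 0.0       # cumulative travel time
    load_min: float = 0.0         # cumulative loading time
    dump_min: float = 0.0         # cumulative dumping time
    loader_queue_min: float = 0.0
    crusher_queue_min: float = 0.0
    load_times: list[float] = field(default_factory=list)  # loading timestamps

    def cycle_times(self) -> list[float]:
        """Time between successive loadings."""
        return [b - a for a, b in zip(self.load_times, self.load_times[1:])]

    def utilisation(self, shift_min: float = 480.0) -> float:
        """(travel + load + dump) / shift; queueing time is excluded."""
        return (self.travel_min + self.load_min + self.dump_min) / shift_min


truck = TruckState(travel_min=240.0, load_min=60.0, dump_min=84.0,
                   load_times=[10.0, 38.0, 66.5])
print(truck.cycle_times(), truck.utilisation())
```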


6. Assumptions

6a. Derived from the data (facts the data dictates)

6b. Introduced by the modeller (choices not dictated by the data)

6c. Limitations

See summary.json → model_limitations. In brief: the ramp is off the loaded haul cycle on this topology (so the ramp scenarios change throughput by only about 1% or less); two-way roads are modelled as independent one-way lanes; there is no grade/rimpull haul-physics engine; loaders and the crusher are simple single servers with no spotting time, breaks or breakdowns.


7. Performance measures

Computed per replication, then aggregated across replications with a 95% Student-t confidence interval (n = 30):

| Measure | Definition |
| --- | --- |
| total_tonnes_delivered | payload × completed dumps at CRUSH within the shift |
| tonnes_per_hour | total tonnes ÷ measured hours (8) |
| average_truck_cycle_time_min | mean time between successive loadings of a truck |
| average_truck_utilisation | mean over trucks of (travel + load + dump) ÷ shift |
| crusher_utilisation | crusher busy time ÷ (shift × crusher capacity) |
| loader_utilisation (per loader) | loader busy time ÷ (shift × loader capacity) |
| road-segment utilisation | constrained-edge busy time ÷ (shift × edge capacity) |
| average_loader_queue_time_min | mean wait in the loader queue per load |
| average_crusher_queue_time_min | mean wait in the crusher queue per dump |
| top_bottlenecks | resources ranked by mean utilisation, with mean queue time |

Throughput is therefore an emergent result of the simulated load–haul–dump–return cycle and resource contention, never a static or closed-form calculation.
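The aggregation step and the utilisation definition can be sketched with the standard library alone. The 30 replication values below are invented for illustration, and the critical value 2.045 is the two-sided 95% Student-t quantile for 29 degrees of freedom, so it only applies when n = 30; the actual harness may compute it differently (for example with scipy).

```python
import math
import statistics

T_CRIT_95_DF29 = 2.045  # two-sided 95% Student-t critical value, 29 d.o.f.


def mean_ci95(values, t_crit=T_CRIT_95_DF29):
    """Return (mean, lower, upper) of a Student-t confidence interval.

    The default t_crit assumes exactly 30 replications.
    """
    n = len(values)
    mean = statistics.fmean(values)
    half_width = t_crit * statistics.stdev(values) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


def utilisation(busy_min, shift_min=480.0, capacity=1):
    """Busy time divided by (shift x capacity), as in the table above."""
    return busy_min / (shift_min * capacity)


# Hypothetical per-replication tonnages (not results from this run).
tonnes = [12800.0] * 10 + [12900.0] * 10 + [13000.0] * 10

mean, lower, upper = mean_ci95(tonnes)
print(f"{mean:.1f} [{lower:.1f}, {upper:.1f}]")
print(f"{utilisation(451.2):.2f}")
```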

README

Synthetic Mine Throughput — SimPy Discrete-Event Simulation

A reproducible SimPy DES that estimates ore throughput to the primary crusher over an 8-hour shift, identifies bottlenecks, and answers the operator’s decision questions through scenario analysis.


1. Install dependencies

Python 3.11+ is required. From this folder:

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

The model itself needs only simpy, numpy, scipy, pyyaml, networkx. matplotlib is needed only for the optional topology plot.

2. Run the simulation

python3 run_experiment.py

This runs all six required scenarios plus one optional agent-proposed scenario, 30 replications each (≈1–2 s total), and (re)writes the deliverables into this folder:

| File | Contents |
| --- | --- |
| results.csv | one row per (scenario, replication) with all metrics |
| summary.json | per-scenario means, 95% CIs, bottleneck ranking, assumptions |
| event_log.csv | full event trace of replication 0 of each scenario |
| run_metrics.json | self-timed wall-clock runtime / return code |

Useful flags:

python3 run_experiment.py --scenarios baseline,ramp_closed   # subset
python3 run_experiment.py --replications 100                 # more reps
python3 run_experiment.py --log-replication 5                # log a different rep
python3 run_experiment.py --help

3. Reproduce the required scenario results

Results are deterministic. Replication r of every scenario uses seed = base_random_seed (12345) + r, so the same seed index is reused across scenarios (common random numbers) for paired comparison. Re-running python3 run_experiment.py reproduces results.csv / summary.json exactly on the same NumPy version. The optional topology figure is produced with:
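The seeding scheme can be illustrated as follows. This sketch uses Python's `random` module rather than NumPy, so its draws will not match the project's streams; the helper names are hypothetical, and only the base seed 12345 and the rule seed = base + r come from the text.

```python
import random

BASE_RANDOM_SEED = 12345  # base_random_seed from the text


def replication_seed(r: int, base: int = BASE_RANDOM_SEED) -> int:
    """Seed for replication r; identical across scenarios."""
    return base + r


def service_draws(seed: int, mean_min: float, sd_min: float, n: int = 3):
    """Draw n normal service times from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(mean_min, sd_min) for _ in range(n)]


# Replication 0 gets the same seed whichever scenario it belongs to,
# so a re-run reproduces the same stream of draws.
seed0 = replication_seed(0)
first_run = service_draws(seed0, 3.5, 0.8)
second_run = service_draws(seed0, 3.5, 0.8)
other_rep = service_draws(replication_seed(1), 3.5, 0.8)

print(first_run == second_run)  # same seed, same stream
print(first_run == other_rep)   # different replication, different stream
```

Reusing the seed index across scenarios means that differences between paired replications reflect the scenario change rather than sampling noise.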

python3 plot_topology.py        # writes topology.png

4. Conceptual model (summary)

Full detail is in conceptual_model.md. In brief: trucks are active SimPy processes that cycle load → haul → dump → return. Loaders, the crusher and the capacity-1 road segments (ramp, crusher approach, pit roads) are SimPy resources. Trucks start empty at PARK, are dispatched to a loader, then cycle between the pits and the crusher until the 480-minute shift clock stops the run. Tonnage is counted only on completed dump events at the crusher.

5. Main assumptions

The full list (split into data-derived and introduced) is in conceptual_model.md §6 and summary.json → key_assumptions. The decision-critical ones:

6. Routing and dispatching logic


7. Key results

30 replications per scenario, 8-hour shift. Tonnes are mean ± 95% CI.

| Scenario | Tonnes (95% CI) | t/h | Cycle (min) | Truck util | Crusher util | Crusher queue (min) |
| --- | --- | --- | --- | --- | --- | --- |
| baseline (8 trucks) | 12,953 [12,872–13,035] | 1,619 | 28.1 | 0.81 | 0.94 | 4.3 |
| trucks_4 | 7,753 [7,728–7,779] | 969 | 23.8 | 0.96 | 0.56 | 0.6 |
| trucks_12 | 13,033 [12,924–13,143] | 1,629 | 40.8 | 0.56 | 0.95 | 16.5 |
| ramp_upgrade | 12,983 [12,911–13,056] | 1,623 | 28.1 | 0.81 | 0.95 | 4.3 |
| crusher_slowdown | 6,513 [6,441–6,586] | 814 | 54.1 | 0.49 | 0.95 | 27.5 |
| ramp_closed | 12,803 [12,726–12,880] | 1,600 | 28.3 | 0.80 | 0.94 | 4.6 |
| crusher_debottleneck (proposed) | 14,853 [14,805–14,902] | 1,857 | 24.7 | 0.91 | 0.54 | 0.05 |

8. Answers to the operational decision questions

Q1 — Expected baseline throughput? About 12,950 tonnes per 8-hour shift (1,619 t/h), 95% CI [12,872, 13,035], or about 129–130 truck loads. The crusher runs at 94% utilisation, so the shift is close to the crusher’s practical ceiling (~13,700 t at zero idle).
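The ceiling quoted in Q1 follows from the mean dump time and the payload. A short arithmetic check, assuming a single dump bay that is never idle and always serves at the mean rate:

```python
SHIFT_MIN = 480        # 8-hour shift
MEAN_DUMP_MIN = 3.5    # mean crusher dump time from the resource table
PAYLOAD_T = 100        # tonnes per dump
BASELINE_MEAN_T = 12953.3  # baseline mean throughput from the results

max_dumps = SHIFT_MIN / MEAN_DUMP_MIN   # about 137 dumps per shift
ceiling_t = max_dumps * PAYLOAD_T       # about 13,714 t
share_of_ceiling = BASELINE_MEAN_T / ceiling_t

print(f"ceiling: {ceiling_t:,.0f} t")
print(f"baseline share of ceiling: {share_of_ceiling:.3f}")
```

The baseline lands at roughly 94% of this ceiling, which is consistent with the reported crusher utilisation.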

Q2 — Likely bottlenecks? The primary crusher D_CRUSH is the binding constraint (94% utilisation, 4.3-minute average dump queue). Behind it sit the loaders (L_S 77%, L_N 70%) and then the crusher approach E05 (44%). The narrow ramp is not a steady-state bottleneck on this topology (≈3% utilisation) — see Q4/Q6.

Q3 — Do more trucks help, or does the system saturate? It saturates at the crusher. Going 4→8 trucks adds +5,200 t (+67%); going 8→12 adds only +80 t (+0.6%). At 12 trucks, truck utilisation collapses from 0.81 to 0.56 and the crusher queue rises to 16.5 min — the extra trucks simply wait in line. Eight trucks is already near the efficient fleet size; adding trucks does not buy crusher throughput.

Q4 — Would improving the narrow ramp help? Negligibly: +0.2% (12,983 vs 12,953, CIs overlap). The loaded haul never uses the ramp, so upgrading it only marginally speeds the South pit’s start-of-shift positioning. The ramp upgrade is not justified by throughput (it may still be worth it for safety, cycle-time variance or redundancy).

Q5 — Sensitivity to crusher service time? Very high — this is the dominant lever. Doubling the dump time (3.5 → 7.0 min) cuts throughput by 50% to 6,513 t and pushes the crusher queue to 27.5 min and cycle time to 54 min. Throughput tracks the crusher service rate almost one-for-one.

Q6 — Operational impact of losing the main ramp? Small for throughput: −1.2% (12,803 vs 12,953). The model reroutes onto the bypass automatically (all routes stay feasible). The real costs are start-of-shift delay (South-bound trucks take the longer bypass) and loss of route redundancy, not steady-state tonnage — because the ramp is off the loaded-haul cycle.

Optional proposed scenario — crusher_debottleneck. Adding a second crusher dump bay and approach lane lifts throughput +14.7% to 14,853 t and moves the bottleneck to the South loader L_S (utilisation 0.90). This is the highest-leverage intervention and confirms the crusher is what caps the baseline.
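The percentage changes quoted in Q3 to Q6 and for the proposed scenario can be recomputed from the scenario means in the evaluation table. The helper name `pct_change` is introduced here for illustration only.

```python
# Mean tonnes per shift, copied from the evaluation table.
means_t = {
    "baseline": 12953.3,
    "trucks_4": 7753.3,
    "trucks_12": 13033.3,
    "ramp_upgrade": 12983.3,
    "crusher_slowdown": 6513.3,
    "ramp_closed": 12803.3,
    "crusher_debottleneck": 14853.3,
}


def pct_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


base = means_t["baseline"]
for name, tonnes in means_t.items():
    if name != "baseline":
        print(f"{name:22s} {pct_change(tonnes, base):+6.1f}% vs baseline")

# Q3 measures the 4 -> 8 truck step relative to the 4-truck case.
print(f"4 -> 8 trucks: {pct_change(base, means_t['trucks_4']):+.1f}%")
```

Computed this way, the crusher_slowdown change is about −49.7%, which the text rounds to 50%.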

9. Likely bottlenecks (ranked, baseline)

  1. D_CRUSH — primary crusher, util 0.94, mean dump queue 4.3 min (binding).
  2. L_S — South loader, util 0.77.
  3. L_N — North loader, util 0.70.
  4. E05_TO_CRUSH — crusher approach, util 0.44 (all trucks funnel through it).
  5. South pit road E09_*, util ~0.40.

The ramp (E03) sits at ~0.03 utilisation — confirmed not a bottleneck here.
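The ranking itself is a sort by mean utilisation. A minimal sketch using the baseline figures listed above (the E09 and E03 values are the approximate ones quoted in the text):

```python
# Baseline mean utilisation per resource, as quoted in this section.
utilisation = {
    "E03": 0.03,           # ramp, approximate
    "L_N": 0.70,
    "E09_*": 0.40,         # South pit road, approximate
    "D_CRUSH": 0.94,
    "E05_TO_CRUSH": 0.44,
    "L_S": 0.77,
}

# Highest utilisation first; keep the top five as the bottleneck list.
ranked = sorted(utilisation.items(), key=lambda item: item[1], reverse=True)
top_five = [name for name, _ in ranked[:5]]

for position, (name, util) in enumerate(ranked[:5], start=1):
    print(f"{position}. {name}: util {util:.2f}")
```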

10. Limitations

See summary.json → model_limitations and conceptual_model.md §6c. Key points: the ramp is off the loaded-haul cycle on this topology (so the ramp scenarios change throughput by only about 1% or less: +0.2% for the upgrade, −1.2% for the closure); two-way single-lane roads are modelled as independent one-way lanes (which understates head-on contention); there is no grade/rimpull haul-physics engine; loaders and the crusher are simple single servers with no spotting time, breaks or breakdowns; truck utilisation excludes queueing time. Results are conditional on these assumptions and should be read as decision support, not absolute predictions.

11. Suggested improvements and further scenarios


Run notes / interventions

Built and validated in a single autonomous session (plan → implement → smoke test → 30-rep run → benchmark public tests + automated harness). Stochastic behaviour, seed control and ≥30 replications are all in place; the six required behavioural sanity checks pass. token_usage.json is left as unknown because the harness used here does not expose exact token counts; run_metrics.json is self-timed by run_experiment.py.