2026-06-01__001_synthetic_mine_throughput__claude-code__claude-opus-4-8__plan-mode-max-effort

Date: 2026-06-01 · Benchmark: 001_synthetic_mine_throughput · Harness: claude-code · Model: claude-opus-4-8 (plan-mode-max-effort) · ? Unrecorded

Scores

Category | Points | Max
--- | --- | ---
Conceptual modelling | 19 | 20
Data and topology | 15 | 15
Simulation correctness | 19 | 20
Experimental design | 14 | 15
Results & interpretation | 15 | 15
Code quality | 10 | 10
Traceability | 5 | 5
Total | 97 | 100

Evaluation report

Scenario | Mean throughput (t/shift)
--- | ---
baseline | 12,546.667
trucks_4 | 7,650
trucks_12 | 12,906.667
ramp_upgrade | 12,606.667
crusher_slowdown | 6,513.333
ramp_closed | 12,363.333
trucks_12_ramp_upgrade | 12,953.333


Conceptual model

Conceptual Model — Synthetic Mine Ore Haulage (Benchmark 001)

A discrete-event simulation (SimPy) of an 8-hour ore haulage shift in a synthetic open-pit mine. It estimates ore throughput to the primary crusher and quantifies the effect of fleet size, ramp capacity, and crusher service time. This document is the design record; numeric results live in summary.json / results.csv and are discussed in README.md.


1. System boundary

Included

Excluded (out of scope)


2. Entities

Entity | Role | Source
--- | --- | ---
Truck | The only active (moving) entity. One SimPy process per truck runs the haul cycle: choose loader → travel empty → load → travel loaded → dump → repeat. | trucks.csv (T01–T12; first N used per scenario)
Ore payload | Not a separate process. Carried implicitly by a loaded truck and credited as payload_tonnes (100 t) at each completed dump. | trucks.csv

All trucks are homogeneous: 100 t payload, empty speed factor 1.00, loaded speed factor 0.85, and they all start at PARK.


3. Resources (what constrains the system)

Every constrained resource is a SimPy Resource with capacity 1 (one truck served at a time, others queue FIFO).

Resource | Count | Service parameters
--- | --- | ---
Loaders L_N, L_S | 2 | truncated-normal load time; L_N 6.5 / 1.2 min, L_S 4.5 / 1.0 min
Crusher D_CRUSH | 1 | truncated-normal dump time 3.5 / 0.8 min (7.0 / 1.5 in crusher_slowdown)
Capacity-1 edges | 8 | one resource per directed single-lane segment

The eight capacity-1 edges (the only capacity = 1 rows in edges.csv) are:

Each physical direction is modelled as its own capacity-1 resource (e.g. E03_UP and E03_DOWN are independent), so opposing trucks do not contend for a single shared lane. This is a documented simplification (see §6). All other edges have capacity = 999 and are modelled as plain time delays (no resource contention).
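The capacity-1 rule above (one truck in service, the rest waiting FIFO) can be illustrated without SimPy. The sketch below is plain Python, not the submission's model code. It reads the service pairs as mean / standard deviation (the README confirms the first figure is a mean; the second being a standard deviation is an assumption), assumes truncation means redrawing until the sample is above a lower bound of 0, and uses invented arrival times.

```python
import random


def truncated_normal(rng, mean, sd, lower=0.0):
    """Redraw from N(mean, sd) until the sample exceeds `lower` (assumed bound)."""
    while True:
        x = rng.gauss(mean, sd)
        if x > lower:
            return x


def fifo_single_server(arrivals, services):
    """Return (start, end) times for a capacity-1 FIFO server.

    `arrivals` must be sorted. Each job starts once it has arrived AND the
    previous job has finished, which is what a capacity-1 resource enforces.
    """
    schedule = []
    free_at = 0.0
    for arrive, service in zip(arrivals, services):
        start = max(arrive, free_at)
        free_at = start + service
        schedule.append((start, free_at))
    return schedule


rng = random.Random(0)
arrivals = [0.0, 1.0, 2.0, 20.0]  # hypothetical crusher arrival times (min)
services = [truncated_normal(rng, 3.5, 0.8) for _ in arrivals]  # dump times
schedule = fifo_single_server(arrivals, services)
waits = [start - arrive for (start, _), arrive in zip(schedule, arrivals)]

for arrive, (start, end), wait in zip(arrivals, schedule, waits):
    print(f"arrive {arrive:5.1f}  start {start:5.2f}  end {end:5.2f}  wait {wait:4.2f}")
```

Trucks that arrive while the server is busy accumulate queue time, while a truck arriving after a long gap starts immediately. A free-flow (capacity-999) edge would skip the queue logic entirely and just add its travel time.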


4. Events

The per-truck cycle generates these events (all recorded to event_log.csv):

  1. dispatch — truck released at PARK at t = 0.
  2. edge_enter / edge_leave — acquiring / releasing each capacity-1 edge along a route (brackets the time the truck holds the single lane).
  3. arrive_loader — truck reaches the chosen loading face and joins its queue.
  4. start_load / end_load — loader service begins / ends.
  5. depart_loader — truck leaves loaded.
  6. arrive_crusher — truck reaches the crusher and joins its queue.
  7. start_dump / end_dump — crusher service begins / ends. Tonnes are credited at end_dump (and only if it closes before the shift cut).
  8. depart_crusher — truck leaves empty; the loop repeats.

Free-flow (capacity-999) edges advance time with a plain timeout and emit no edge events, keeping the log focused on the constrained segments.
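The crediting rule in event 7 (tonnes count only when end_dump falls inside the shift) can be sketched as a filter over logged events. The `Event` fields below are hypothetical, since the real event_log.csv schema is not shown here, and treating a dump that ends exactly at minute 480 as credited is an assumption.

```python
from dataclasses import dataclass

SHIFT_MIN = 480.0  # 8-hour shift in minutes
PAYLOAD_T = 100.0  # tonnes per truck load


@dataclass(frozen=True)
class Event:
    time_min: float
    truck: str
    kind: str  # e.g. "start_dump", "end_dump"


def tonnes_delivered(events, shift_min=SHIFT_MIN, payload=PAYLOAD_T):
    """Credit one payload for every end_dump that closes within the shift."""
    dumps = [e for e in events if e.kind == "end_dump" and e.time_min <= shift_min]
    return payload * len(dumps)


log = [
    Event(41.2, "T01", "end_dump"),
    Event(44.9, "T02", "end_dump"),
    Event(478.0, "T01", "start_dump"),
    Event(481.6, "T01", "end_dump"),  # finishes after the cut: not credited
]
print(tonnes_delivered(log))  # 200.0
```

The last dump starts before the cut but ends after it, so it contributes nothing to throughput even though the crusher was busy with it.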


5. State variables

Tracked during the run:

Derived at end-of-shift (see §7).


6. Assumptions

Derived from the data

Introduced (modelling choices not dictated by the data)

Limitations


7. Performance measures

Each measure is computed per replication (see metrics.py), then aggregated across 30 replications with a 95% Student-t confidence interval on n − 1 = 29 degrees of freedom (see aggregate.py):

Measure | Definition
--- | ---
total_tonnes_delivered | payload × completed dumps before the cut
tonnes_per_hour | total_tonnes_delivered / 8
average_truck_cycle_time_min | mean completed-cycle duration (first cycle dispatch → end_dump, then end_dump → end_dump)
average_truck_utilisation | mean over trucks of productive time / 480
crusher_utilisation, loader_utilisation | busy time / 480
average_loader_queue_time_min, average_crusher_queue_time_min | mean wait per service
top_bottlenecks | loaders + crusher + capacity-1 edges ranked by the composite score utilisation × mean queue wait (top 5)

The composite bottleneck score deliberately combines how busy a resource is with how long trucks wait for it, so a resource that is occasionally used but causes long waits (the ramp) is separated from one that is the true throughput ceiling (the crusher). See README.md for the interpretation.
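The Student-t aggregation mentioned at the start of this section follows the usual formula, mean ± t × s / √n. The sketch below is not the code in aggregate.py. It uses a small lookup of two-sided 95% critical values instead of scipy, and the five sample values are invented so the arithmetic is easy to follow; the real run uses n = 30 (df = 29).

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom.
T_CRIT_95 = {4: 2.776, 9: 2.262, 29: 2.045}


def t_confidence_interval(samples):
    """Return (mean, lower, upper) for a 95% Student-t interval."""
    n = len(samples)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError(f"no tabulated critical value for df={df}")
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)  # sample standard deviation (n - 1 divisor)
    half_width = T_CRIT_95[df] * sd / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Hypothetical per-replication tonnes, NOT taken from results.csv.
samples = [12_400, 12_500, 12_600, 12_700, 12_550]
mean, lower, upper = t_confidence_interval(samples)
print(f"{mean:.1f} [{lower:.1f}, {upper:.1f}]")  # 12550.0 [12411.2, 12688.8]
```

With 30 replications the critical value drops to 2.045 and the √n divisor grows, which is why the reported baseline interval is much tighter than this toy one.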

README

Synthetic Mine Throughput Simulation (Benchmark 001)

A genuine discrete-event simulation in SimPy of an 8-hour ore-haulage shift in a synthetic open-pit mine. It estimates ore throughput to the primary crusher and answers six operational decision questions about fleet size, the narrow ramp, and crusher service time.

Headline result: under the baseline 8-truck configuration the mine delivers ≈ 12,547 t/shift (1,568 t/h), 95% CI [12,491, 12,602] t, and the system is crusher-bound — not ramp-bound.

Mine topology


1. Installation

Python 3.11+ (developed and tested on 3.13). From this submission folder:

pip install -r requirements.txt

Dependencies (all from the allowed list): simpy, numpy, pandas, scipy, matplotlib, networkx, PyYAML. pytest is needed only for the test suite. Pillow ships transitively with matplotlib for the GIF writer.

The code is a package under src/mine_sim/. Either install it (pip install -e .) or prefix commands with PYTHONPATH=src.


2. Running the simulation

# Produce all deliverables (7 scenarios × 30 replications) at the folder root:
PYTHONPATH=src python -m mine_sim run-all

# One scenario (quick smoke test):
PYTHONPATH=src python -m mine_sim run baseline --reps 5

# List available scenarios:
PYTHONPATH=src python -m mine_sim list

# Render topology.png + animation.gif from the event log:
PYTHONPATH=src python -m mine_sim render

run-all writes the three machine-readable artefacts — results.csv, event_log.csv, summary.json — directly to the submission root. Useful flags: --reps N (override replication count), --output-dir DIR, --event-log-scope {first,all}.

Reproducing the required scenario results

PYTHONPATH=src python -m mine_sim run-all          # 30 reps each, the canonical run
PYTHONPATH=src python -m pytest -q                 # 78 focused tests

Reproducibility is exact: the per-replication seed is base_random_seed (12345) + replication_index, drawn through independent numpy SeedSequence streams, so any (scenario, replication) reproduces bit-for-bit regardless of run order. event_log.csv defaults to --event-log-scope first (replication 0 of each scenario) to stay small and inspectable; all 30 replications feed the metrics and confidence intervals.
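The run-order independence described above comes from deriving each replication's generator from the base seed and replication index alone. A minimal sketch of that idea follows, using the standard library's `random.Random` as a stand-in for the numpy SeedSequence streams the submission uses. The stream name "crusher" and both helper functions are hypothetical.

```python
import random

BASE_SEED = 12345  # base_random_seed quoted in the README


def replication_rng(replication_index, stream):
    """Build an RNG that depends only on (base seed, replication, stream name)."""
    seed = BASE_SEED + replication_index
    # random.Random hashes string seeds deterministically, so this key gives
    # each named stream its own reproducible sequence.
    return random.Random(f"{seed}:{stream}")


def draw_dump_times(replication_index, n=3):
    """Draw n normal variates (mean 3.5, sd 0.8) from the replication's stream."""
    rng = replication_rng(replication_index, "crusher")
    return [round(rng.gauss(3.5, 0.8), 3) for _ in range(n)]


forward = [draw_dump_times(r) for r in (0, 1, 2)]
backward = [draw_dump_times(r) for r in (2, 1, 0)][::-1]
print(forward == backward)  # True
```

Because no generator is shared between replications, running them in reverse order (or one at a time) produces the same draws for each index.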


3. Conceptual model (summary)

Full detail is in conceptual_model.md. In brief:


4. Main assumptions

The full split of data-derived vs introduced assumptions and limitations is in conceptual_model.md §6.


5. Routing and dispatching logic

The asymmetric-ramp finding (important)

Running Dijkstra on the real graph shows that in the baseline (ramp open):

So the ramp carries very little ore-cycle traffic. This is why upgrading or closing it has only a small, asymmetric effect — and why the model’s behaviour is the opposite of the naïve “narrow ramp = main bottleneck” intuition. When the ramp is closed, the bypass keeps every face reachable (PARK → LOAD_S reroutes J2 → J7 → J8 → J6; CRUSH → PARK via J4 → J8 → J7 → J2).
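The rerouting behaviour can be illustrated with a small Dijkstra search that skips closed edges. This is a sketch under loud assumptions, not routing.py: the graph is a toy, every travel time is invented, and the single "RAMP" link from J2 to J6 stands in for whatever segments the real ramp consists of. Only the node names and the bypass order J2 → J7 → J8 → J6 come from the text above.

```python
import heapq


def shortest_path(graph, source, target, closed=frozenset()):
    """Dijkstra over {node: [(neighbour, minutes, edge_id), ...]}, skipping closed edges."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, minutes, edge_id in graph.get(node, []):
            if edge_id in closed:
                continue
            new_cost = cost + minutes
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]


# Toy graph: all weights and edge ids are hypothetical.
TOY = {
    "PARK": [("J2", 2.0, "E_PARK_J2")],
    "J2": [("J6", 3.0, "RAMP"), ("J7", 2.0, "E_J2_J7")],
    "J7": [("J8", 3.0, "E_J7_J8")],
    "J8": [("J6", 2.0, "E_J8_J6")],
    "J6": [("LOAD_S", 1.0, "E_J6_LOAD_S")],
}

print(shortest_path(TOY, "PARK", "LOAD_S"))
# (['PARK', 'J2', 'J6', 'LOAD_S'], 6.0)
print(shortest_path(TOY, "PARK", "LOAD_S", closed={"RAMP"}))
# (['PARK', 'J2', 'J7', 'J8', 'J6', 'LOAD_S'], 10.0)
```

Closing the ramp lengthens the empty leg but leaves the face reachable, which is the qualitative pattern the ramp_closed scenario reports.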


6. Key results

30 replications per scenario, 8-hour shift. Full numbers and CIs in summary.json; per-replication rows in results.csv.

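Scenario | Trucks | Tonnes / shift | t / h | Crusher util | Crusher queue (min)
--- | --- | --- | --- | --- | ---
trucks_4 | 4 | 7,650 | 956 | 0.56 | 0.7
baseline | 8 | 12,547 | 1,568 | 0.91 | 3.3
trucks_12 | 12 | 12,907 | 1,613 | 0.94 | 14.2
ramp_upgrade | 8 | 12,607 | 1,576 | 0.92 | 3.3
crusher_slowdown | 8 | 6,513 | 814 | 0.95 | 26.6
ramp_closed | 8 | 12,363 | 1,545 | 0.90 | 3.2
trucks_12_ramp_upgrade | 12 | 12,953 | 1,619 | 0.94 | 14.3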

Baseline 95% CIs: total tonnes [12,491, 12,602], t/h [1,561, 1,575].
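A back-of-envelope check helps explain why these results point at the crusher. With one dump at a time, 100 t per dump and a 480-minute shift, the crusher cannot deliver more than 480 / (mean dump time) × 100 tonnes. The sketch below is not part of the submission; it uses the nominal means (3.5 and 7.0 min) and ignores any shift in the mean caused by truncation.

```python
SHIFT_MIN = 480  # 8-hour shift in minutes
PAYLOAD_T = 100  # tonnes credited per completed dump


def crusher_ceiling_tonnes(mean_dump_min):
    """Upper bound on tonnes per shift if the crusher never sat idle."""
    return SHIFT_MIN / mean_dump_min * PAYLOAD_T


# (nominal mean dump time in minutes, mean tonnes per shift from the results)
observed = {
    "baseline": (3.5, 12_546.667),
    "crusher_slowdown": (7.0, 6_513.333),
}

for name, (dump_min, tonnes) in observed.items():
    ceiling = crusher_ceiling_tonnes(dump_min)
    print(f"{name}: ceiling {ceiling:,.0f} t, observed/ceiling {tonnes / ceiling:.2f}")
# baseline: ceiling 13,714 t, observed/ceiling 0.91
# crusher_slowdown: ceiling 6,857 t, observed/ceiling 0.95
```

The two ratios line up with the reported crusher utilisations (0.91 and 0.95), as expected when delivered tonnes are simply payload times completed dumps.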


7. Answers to the operational decision questions

  1. Expected baseline throughput? 12,547 t/shift (1,568 t/h), 95% CI [12,491, 12,602] t. Truck utilisation is ~99% and crusher utilisation ~91%.

  2. Likely bottlenecks? The primary crusher (D_CRUSH) is the dominant constraint (utilisation 0.91, composite score ≈ 2.99), then loader L_S (0.80) and L_N (0.60). The narrow ramp is not a system bottleneck — although E03_UP shows a long per-traversal wait (~11 min) on the rare occasions it is used, its utilisation is only ~5%, so it does not gate throughput.

  3. Do more trucks materially help, or does the system saturate? It saturates. Going 4 → 8 trucks adds +4,897 t (+64%), but 8 → 12 adds only +360 t (+2.9%) while the crusher queue more than quadruples (3.3 → 14.2 min). Beyond ~8 trucks the crusher is the ceiling and extra trucks mostly wait.

  4. Would improving the narrow ramp materially help? No. ramp_upgrade lifts throughput by only about 60 t (+0.5%), within noise of the baseline. The loaded legs to the crusher never use the ramp, so speeding it up barely matters. Ramp investment is not justified by throughput.

  5. How sensitive is throughput to crusher service time? Very. Raising mean dump time from 3.5 → 7.0 min cuts throughput from 12,547 → 6,513 t (−48%) and drives the crusher queue to ~27 min. The crusher is the binding resource, so its service rate maps almost one-for-one onto throughput.

  6. Operational impact of losing the main ramp? Modest and absorbable. ramp_closed still delivers 12,363 t (−1.5%) because the bypass keeps every face reachable and the loaded legs never used the ramp anyway. Losing the ramp is an inconvenience for the empty PARK → LOAD_S leg and the end-of-shift return, not a production emergency.

Bottom line for the operator: spend on crusher capacity/throughput, not on the ramp or on a bigger truck fleet. The fleet is already near the crusher-bound knee at 8 trucks, and the ramp is a red herring.
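The deltas quoted in the answers above can be recomputed from the scenario means in the evaluation table. A short sketch; the `change` helper is illustrative and not part of the package.

```python
# Mean tonnes per shift, copied from the scenario table.
MEAN_TONNES = {
    "baseline": 12_546.667,
    "trucks_4": 7_650.0,
    "trucks_12": 12_906.667,
    "ramp_upgrade": 12_606.667,
    "crusher_slowdown": 6_513.333,
    "ramp_closed": 12_363.333,
    "trucks_12_ramp_upgrade": 12_953.333,
}


def change(from_name, to_name):
    """Return (absolute change in tonnes, percent change relative to `from_name`)."""
    a, b = MEAN_TONNES[from_name], MEAN_TONNES[to_name]
    return b - a, 100.0 * (b - a) / a


comparisons = [
    ("trucks_4", "baseline"),
    ("baseline", "trucks_12"),
    ("baseline", "ramp_upgrade"),
    ("baseline", "crusher_slowdown"),
    ("baseline", "ramp_closed"),
]
for a, b in comparisons:
    delta, pct = change(a, b)
    print(f"{a} -> {b}: {delta:+,.0f} t ({pct:+.1f}%)")
# trucks_4 -> baseline: +4,897 t (+64.0%)
# baseline -> trucks_12: +360 t (+2.9%)
# baseline -> ramp_upgrade: +60 t (+0.5%)
# baseline -> crusher_slowdown: -6,033 t (-48.1%)
# baseline -> ramp_closed: -183 t (-1.5%)
```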


8. Bottlenecks (how they are ranked)

Resources are ranked by a composite score utilisation × mean queue wait, which separates the throughput ceiling (high utilisation) from occasional choke points (high per-event wait). Baseline ranking:

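Rank | Resource | Kind | Utilisation | Mean queue wait (min) | Score
--- | --- | --- | --- | --- | ---
1 | D_CRUSH | crusher | 0.91 | 3.28 | 2.99
2 | L_S | loader | 0.80 | 2.45 | 1.97
3 | L_N | loader | 0.60 | 2.62 | 1.58
4 | E03_UP | edge (ramp) | 0.05 | 10.89 | 0.57
5 | E05_TO_CRUSH | edge | 0.42 | 0.15 | 0.06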

The ramp’s high per-traversal wait but tiny utilisation (rank 4) is exactly the asymmetric-ramp finding in numbers. Under crusher_slowdown the crusher’s score jumps to ~25, dwarfing everything else.
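The ranking rule is simple enough to sketch directly. The snippet below is an illustration, not the code in aggregate.py: `ResourceStat` and `rank_bottlenecks` are hypothetical names, and the inputs are the rounded utilisation and wait figures from the baseline table, so the recomputed scores can differ from the published ones in the last digit.

```python
from typing import NamedTuple


class ResourceStat(NamedTuple):
    name: str
    utilisation: float    # fraction of the shift the resource was busy
    mean_wait_min: float  # mean queue wait per service, in minutes


def rank_bottlenecks(stats, top=5):
    """Sort resources by utilisation x mean queue wait, highest first."""
    scored = [(s.utilisation * s.mean_wait_min, s) for s in stats]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top]


# Rounded baseline figures from the table, listed in arbitrary order.
baseline = [
    ResourceStat("E03_UP", 0.05, 10.89),
    ResourceStat("L_N", 0.60, 2.62),
    ResourceStat("D_CRUSH", 0.91, 3.28),
    ResourceStat("E05_TO_CRUSH", 0.42, 0.15),
    ResourceStat("L_S", 0.80, 2.45),
]

ranking = rank_bottlenecks(baseline)
for score, stat in ranking:
    print(f"{stat.name:<13} {score:.2f}")
```

The resulting order (D_CRUSH, L_S, L_N, E03_UP, E05_TO_CRUSH) matches the table: the ramp's long wait is multiplied by a very small utilisation, which keeps it below both loaders.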


9. Limitations

Summarised here, detailed in conceptual_model.md §6:

These mean the figures are best read as a fully-available, well-dispatched upper bound; real throughput would be somewhat lower.


10. Suggested improvements / further scenarios


11. Project layout

src/mine_sim/
  events.py         # event-log row schema (header source of truth)
  rng.py            # reproducible seeds + distributions
  scenarios.py      # YAML load + inheritance -> immutable ScenarioConfig
  topology.py       # CSV load + per-scenario immutable Topology
  routing.py        # Dijkstra routing, reachability, dispatch cost
  metrics.py        # per-replication accumulator -> ReplicationMetrics
  model.py          # the SimPy simulation (one process per truck)
  runner.py         # one-replication entry point
  scenario_runner.py # multi-rep / multi-scenario orchestration
  aggregate.py      # Student-t CIs + bottleneck ranking
  narrative.py      # assumptions / limitations / scenario text
  io_writers.py     # results.csv / event_log.csv / summary.json (flat schema)
  viz.py            # topology.png + animation.gif from model data
  cli.py            # argparse CLI: run / run-all / list / render
tests/              # 78 focused unit + integration tests
data/               # input CSVs + scenario YAMLs (incl. the 7th combo)

Outputs at the root: results.csv, summary.json, event_log.csv, topology.png, animation.gif, plus this README.md and conceptual_model.md.
