2026-04-29__001_synthetic_mine_throughput · claude-code · claude-opus-4-7__plan-mode

Date: 2026-04-29 · Benchmark: 001_synthetic_mine_throughput · Harness: claude-code · Model: claude-opus-4-7 (plan-mode) · ✓ Autonomous

Scores

Category	Points	Max
Conceptual modelling	19	20
Data and topology	15	15
Simulation correctness	19	20
Experimental design	14	15
Results & interpretation	14	15
Code quality	10	10
Traceability	5	5
Total	96	100

Run metrics

Total tokens: 114300 (method: reported)
Input / output tokens: — / —
Runtime: 693 s
Reviewer model: unknown · harness: claude-code · on 2026-04-29
Recommendation: Strong submission
Notes: Clean SimPy DES with rigorous data-derived/introduced assumption split and full event-log auditability; only minor dead code and warmup-discard nit.

Evaluation report

Automated checks: 53 / 53 (100%)
Behavioural checks: — / —

Scenario	Mean throughput (t per shift)
baseline	13,143.333
trucks_4	7,983.333
trucks_12	13,783.333
ramp_upgrade	13,173.333
crusher_slowdown	7,236.667
ramp_closed	13,110.000

Source files

README.md · markdown · 11.2 KB
conceptual_model.md · markdown · 9.6 KB
data/dump_points.csv · csv · 134 B
data/edges.csv · csv · 2.5 KB
data/loaders.csv · csv · 160 B
data/nodes.csv · csv · 1.2 KB
data/scenarios/baseline.yaml · yaml · 632 B
data/scenarios/crusher_slowdown.yaml · yaml · 268 B
data/scenarios/ramp_closed.yaml · yaml · 200 B
data/scenarios/ramp_upgrade.yaml · yaml · 207 B
data/scenarios/trucks_12.yaml · yaml · 112 B
data/scenarios/trucks_4.yaml · yaml · 109 B
data/trucks.csv · csv · 424 B
plot_topology.py · python · 3.5 KB
prompt.md · markdown · 10.5 KB
requirements.txt · text · 89 B
results/evaluation_report.json · json · 11.7 KB
results.csv · csv · 33.8 KB
run.py · python · 5.7 KB
run_metrics.json · json · 239 B
src/__init__.py · python · 48 B
src/analysis.py · python · 9.1 KB
src/experiment.py · python · 5.7 KB
src/model.py · python · 7.2 KB
src/simulation.py · python · 17.8 KB
submission.yaml · yaml · 532 B
summary.json · json · 11.0 KB
token_usage.json · json · 144 B

Downloads

event_log.csv · 29.8 MB
topology.png · 156.9 KB

Conceptual model

Conceptual model — synthetic mine throughput

This document describes the discrete-event simulation built for benchmark 001_synthetic_mine_throughput. The implementation is in src/ (model, simulation, experiment, analysis) and is driven by run.py. The model follows the data exactly where the data is unambiguous, and documents every assumption it has had to introduce.

1. System boundary

Included:

One 8-hour ore haulage shift (480 minutes simulated time per replication).
Movement of ore from LOAD_N and LOAD_S to the primary crusher CRUSH.
Truck cycles: empty travel from current node to a chosen loader, loading, loaded travel to the crusher, dumping, then dispatch to the next loader.
Capacity-constrained roads (E03 ramp segments, E05 crusher approach, E07 north-pit access, E09 south-pit access) modelled as resources.
Loaders (L_N, L_S) and the crusher dump point (D_CRUSH) as resources.
Stochastic loading time, dumping time, and per-edge travel time.

Excluded:

Waste and maintenance haulage. Baseline production.dump_destination is CRUSH — only ore movement is counted. WASTE and MAINT are present in the topology but no truck routes through them in any required scenario.
Truck breakdowns, fuelling, operator breaks, shift handovers.
Time-of-day effects (weather, lighting), grade-aware truck dynamics, and driver behaviour.
Multi-shift continuity. Each replication starts with an empty system and trucks at PARK.

2. Entities

Trucks (T01 … T0N) are the only active entities. Each truck is a SimPy process carrying:

current node,
loaded / empty status,
payload (100 t when loaded, 0 t empty),
per-truck statistics (cycles completed, busy time, queueing time, cycle-start timestamps).

Ore payloads are carried by trucks rather than modelled as independent entities — the simulation only needs to track tonnes delivered, which is incremented at every dump_end event.
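The per-truck state and the "tonnes credited at dump_end" rule can be pictured with a small data structure. This is only a sketch: the class and field names (`TruckState`, `SystemTotals`, `record_dump_end`) are hypothetical and are not taken from src/simulation.py.

```python
from dataclasses import dataclass, field

PAYLOAD_T = 100.0  # payload when loaded, as stated in the entity list


@dataclass
class TruckState:
    """Hypothetical container for the per-truck state listed above."""
    truck_id: str
    node: str = "PARK"          # every truck starts the shift at PARK
    loaded: bool = False
    cycles_completed: int = 0
    busy_min: float = 0.0
    queue_min: float = 0.0
    cycle_starts: list[float] = field(default_factory=list)

    @property
    def payload_t(self) -> float:
        # 100 t when loaded, 0 t when empty
        return PAYLOAD_T if self.loaded else 0.0


@dataclass
class SystemTotals:
    """Hypothetical system-wide counters."""
    tonnes_delivered: float = 0.0
    dump_events: int = 0

    def record_dump_end(self, truck: TruckState) -> None:
        # Tonnes are credited only when a dump completes.
        self.tonnes_delivered += truck.payload_t
        self.dump_events += 1
        truck.loaded = False
        truck.cycles_completed += 1
```

Keeping the payload as a derived property of the loaded flag means there is no separate ore entity to track, which matches the modelling choice described above.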

3. Resources

Resource	SimPy capacity	Notes
`L_N` loader	1	North pit, mean load 6.5 min (slower face)
`L_S` loader	1	South pit, mean load 4.5 min (faster face)
`D_CRUSH` crusher	1	Mean dump 3.5 min (7.0 min in `crusher_slowdown`)
`E03_UP`	1	Narrow uphill ramp, designated bottleneck
`E03_DOWN`	1	Narrow downhill ramp
`E05_TO_CRUSH`	1	Crusher approach (inbound)
`E05_FROM_CRUSH`	1	Crusher approach (outbound)
`E07_TO_LOAD_N`	1	Single-lane north-pit face access (in)
`E07_FROM_LOAD_N`	1	Single-lane north-pit face access (out)
`E09_TO_LOAD_S`	1	Single-lane south-pit face access (in)
`E09_FROM_LOAD_S`	1	Single-lane south-pit face access (out)

All other roads have capacity 999 (declared in edges.csv). They are not modelled as SimPy resources — that would add bookkeeping overhead with no queueing realism. They are simple env.timeout(travel_min) segments.
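As a rough illustration of this split, the sketch below partitions edge rows by the capacity threshold of 100 described in section 6.2. The row layout and the ID `E01_OUT` are hypothetical; only the threshold and the capacity values 1 and 999 come from the text.

```python
CAPACITY_THRESHOLD = 100  # at or above this, an edge is treated as unlimited

# Hypothetical rows in the shape of edges.csv (only the fields used here).
edges = [
    {"edge_id": "E03_UP", "capacity": 1},
    {"edge_id": "E05_TO_CRUSH", "capacity": 1},
    {"edge_id": "E01_OUT", "capacity": 999},  # illustrative ID, not from the data
]


def split_edges(rows, threshold=CAPACITY_THRESHOLD):
    """Return (constrained edge -> capacity, list of unconstrained edge IDs)."""
    constrained = {r["edge_id"]: r["capacity"]
                   for r in rows if r["capacity"] < threshold}
    unconstrained = [r["edge_id"] for r in rows if r["capacity"] >= threshold]
    return constrained, unconstrained


constrained, unconstrained = split_edges(edges)
```

In a SimPy model, each key of `constrained` would presumably get its own resource, while entries in `unconstrained` would reduce to a plain timeout.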

4. Events

Logged event types (written to event_log.csv):

truck_dispatched — process started, truck released from PARK.
edge_queue_join — truck arrives at a constrained edge resource.
edge_entered — truck obtains the edge resource and begins travelling.
edge_exited — truck has finished traversing the edge.
edge_traversed_unconstrained — emitted once per traversal of an unconstrained edge; carries from, to, resource_id (the edge ID).
arrived_at_loader — truck reached its dispatched loader node.
load_start / load_end — bracket the loading service.
arrived_at_crusher — truck reached the dump node loaded.
dump_start / dump_end — bracket the dump service. Tonnes are credited at dump_end only.
shift_end_truncated — truck process exits because the shift ended.
routing_error — emitted only if the topology cannot route the truck (intentional fail-loud).

5. State variables

Per-truck: location, loaded flag, payload, cycles completed, busy time, queue time, cycle-start timestamps.
Per-resource: cumulative busy time, cumulative queue-wait time, queue-event count, queue-length samples (timestamped).
System-wide: total tonnes delivered, total dump events, simulation clock.

6. Assumptions

6.1 Data-derived assumptions

The directed graph is authoritative. Each edges.csv row is a single directed segment with its own capacity. E03_UP and E03_DOWN are modelled as independent capacity-1 resources, exactly as the data designer encoded them. The dataset metadata describes the up/down split as a “simplification” of the real shared physical channel.
dump_points.csv is authoritative for crusher service time. The crusher_slowdown scenario also overrides nodes.CRUSH.service_time_*, which we apply for consistency, but the simulation reads from dump_points.
production.dump_destination = CRUSH in baseline.yaml is honoured — the simulation routes all ore to CRUSH and ignores WASTE.
Loader and crusher capacities are 1 (one truck served at a time), consistent with loaders.csv and dump_points.csv.
Truck count and start node come from fleet.truck_count and trucks.csv (start_node = PARK). The first N trucks of trucks.csv are used.

6.2 Introduced assumptions

Capacity threshold for SimPy resources = 100. Any edge with declared capacity ≥ 100 is treated as effectively unlimited and is not modelled as a SimPy resource. This avoids spurious overhead on the capacity-999 edges while still honouring road_capacity_enabled = true for the genuinely narrow segments.
Truncated normal load and dump times. Sampled from N(mean, sd) and rejected (resampled) below 0.1 * mean; the floor prevents non-physical near-zero samples. With sd << mean rejection is rare (≤ 1 % of samples).
Lognormal travel-noise multiplier with unit mean. Each edge traversal multiplies travel_min by lognormal(μ, σ) with σ² = ln(1 + CV²) and μ = -σ²/2, so E[multiplier] = 1. CV is read from stochasticity.travel_time_noise_cv (0.10 in baseline). This avoids the bias of a naive lognormal(0, σ) whose mean is exp(σ²/2) > 1.
Random initial dispatch stagger uniformly over [0, 60] s per truck. Without this, all 4–12 trucks at PARK would request the same edge at t = 0 and SimPy’s deterministic insertion-order tie-break would create artefacts. The stagger is small relative to a multi-minute cycle and does not violate warmup_minutes = 0, which refers to statistical warm-up rather than initial conditions. Documented in key_assumptions.
Routing = Dijkstra shortest travel time. Edge weights are distance_m / (max_speed_kph * 1000 / 60). Routes are recomputed from scratch at every dispatch (so closures and per-scenario speed overrides apply). Trucks already en route do not re-plan.
Dispatching = nearest_available_loader. Score = travel_time_to_loader + queue_size * mean_load_time_loader. Tie-breaker: shorter expected return cycle (loader → crusher).
End-of-shift policy. No new loader requests start after shift_end_min = 480. Trucks already loaded complete their travel and dump (otherwise tonnes physically held in trucks would be discarded). Trucks empty when the shift ends finish their current edge then exit. Tonnes are counted only at dump_end.
Utilisation accounting clipped at the shift boundary. Resource and truck busy-time accumulators record only the portion of each operation that falls within [0, 480] min, so utilisation values lie in [0, 1] even when tail dumping continues past 480 min.
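The two sampling assumptions above (truncated-normal service times and a unit-mean lognormal travel multiplier) can be sketched with the standard library alone. This is an illustration, not the code in src/: it uses `random.Random` rather than whatever generator the submission uses, and the `sd=1.0` value is a placeholder because the text does not state the standard deviations.

```python
import math
import random


def truncated_normal(rng, mean, sd, floor_frac=0.1):
    """Draw from N(mean, sd), resampling anything below floor_frac * mean."""
    floor = floor_frac * mean
    while True:
        x = rng.gauss(mean, sd)
        if x >= floor:
            return x


def travel_multiplier(rng, cv):
    """Lognormal multiplier with E[multiplier] = 1 for a given CV."""
    sigma2 = math.log(1.0 + cv * cv)   # sigma^2 = ln(1 + CV^2)
    mu = -0.5 * sigma2                 # mu = -sigma^2 / 2 gives unit mean
    return rng.lognormvariate(mu, math.sqrt(sigma2))


rng = random.Random(12345)
# mean=4.5 is the L_S mean load time; sd=1.0 is a hypothetical value.
loads = [truncated_normal(rng, mean=4.5, sd=1.0) for _ in range(10_000)]
mults = [travel_multiplier(rng, cv=0.10) for _ in range(10_000)]
```

Averaging `mults` should give a value very close to 1, which is the property that distinguishes this parameterisation from a naive lognormal(0, σ).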

6.3 Limitations

E03_UP / E03_DOWN as two independent capacity-1 resources slightly understates contention versus a single shared bidirectional channel. A more conservative variant could be modelled as a single resource with direction switching, but the data does not provide switching parameters.
Uniform speed factors per edge — gradient and curvature are not modelled separately beyond what is encoded in max_speed_kph.
Travel-time noise is i.i.d. per edge traversal, with no temporal autocorrelation (no weather front, no shift fatigue effects).
Loaders and the crusher are 100 % available within the shift (no breakdowns, no operator switches).
Trucks already en route do not reroute when a downstream queue grows; routing decisions are made only at dispatch time.

7. Performance measures

Reported per scenario (mean across 30 replications, ± 95 % CI by Student’s t-distribution with df = 29):

total_tonnes_delivered — sum of payloads at completed dump_end.
tonnes_per_hour — total_tonnes_delivered / 8.
average_truck_cycle_time_min — mean inter-cycle-start gap, averaged across all trucks and all cycles.
average_truck_utilisation — fraction of shift each truck spent travelling, loading or dumping (excludes queueing).
crusher_utilisation — fraction of shift the crusher was serving a truck, clipped at the shift boundary.
loader_utilisation — same, per loader (L_N, L_S).
average_loader_queue_time_min — mean queue-wait at loaders.
average_crusher_queue_time_min — mean queue-wait at the crusher.
top_bottlenecks — resources ranked by mean queue-wait time, used to identify the binding constraint in each scenario.
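The clipping applied to the utilisation measures above amounts to an interval overlap with [0, 480] min. A minimal sketch, with a hypothetical helper name and made-up service intervals:

```python
SHIFT_END_MIN = 480.0


def clipped_busy(start_min, end_min, shift_end=SHIFT_END_MIN):
    """Portion of [start_min, end_min] that lies inside [0, shift_end]."""
    lo = max(start_min, 0.0)
    hi = min(end_min, shift_end)
    return max(hi - lo, 0.0)


# Hypothetical crusher service intervals (start, end) in minutes.
# The third straddles the shift end; the fourth lies entirely after it.
intervals = [(10.0, 13.5), (470.0, 473.5), (478.0, 481.5), (481.5, 485.0)]
busy = sum(clipped_busy(s, e) for s, e in intervals)
utilisation = busy / SHIFT_END_MIN
```

Only 2.0 of the 3.5 minutes in the straddling interval are counted, so utilisation cannot exceed 1 even though tail dumping continues after the shift ends.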

README

Synthetic Mine Throughput Simulation (Benchmark 001)

A discrete-event simulation in Python + SimPy of an 8-hour ore haulage shift on a small synthetic open-pit mine. Six scenarios are run with 30 replications each to answer six operational decision questions about ore throughput, bottlenecks, and infrastructure investment.

The simulation, conceptual model, and analysis are designed to be reproducible from a clean checkout: the only inputs are the CSVs and YAMLs in data/, and the only outputs are the four required artefacts plus an optional topology figure.

1. Install

pip install -r requirements.txt

Tested with Python 3.13 and SimPy 4.x.

2. Run

python run.py                              # all six scenarios, 30 reps each
python run.py --scenario baseline          # single scenario
python run.py --replications 2             # smoke test
python plot_topology.py                    # regenerate topology.png

Total runtime for the full sweep is ~2 s on a modern laptop.

Outputs written to the submission root:

results.csv — per-replication metrics (180 rows for the full sweep).
summary.json — per-scenario means, 95 % CIs, top bottlenecks, assumptions and limitations.
event_log.csv — every event for every truck in every replication (~430 k rows).
topology.png — static diagram of nodes / edges with constrained edges highlighted in red.

3. Reproducibility

Each replication’s RNG is seeded from SHA-256(base_random_seed :: scenario_id :: replication_index) truncated to 64 bits. The base seed is read from baseline.yaml (simulation.base_random_seed = 12345) and inherited by every scenario. Per-rep seeds appear in results.csv under the random_seed column.

Re-running python run.py produces byte-identical results.csv and summary.json (timestamp-free outputs) on the same Python / SimPy / NumPy versions.
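One way such a seed derivation could look is sketched below. The exact string encoding and which 8 bytes of the digest are kept are not stated in this README, so both choices here are assumptions, and the seeds this produces will not necessarily match the random_seed column in results.csv.

```python
import hashlib


def replication_seed(base_seed: int, scenario_id: str, rep_index: int) -> int:
    # Assumed encoding: the three fields joined with "::" and UTF-8 encoded.
    key = f"{base_seed}::{scenario_id}::{rep_index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Assumed truncation: first 8 bytes, interpreted big-endian (64 bits).
    return int.from_bytes(digest[:8], "big")


seeds = [replication_seed(12345, "baseline", i) for i in range(30)]
```

Hashing the scenario ID into the seed gives each scenario its own independent stream while keeping every replication reproducible from the base seed.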

4. Conceptual model

See conceptual_model.md for the full conceptual model. Brief summary:

Entities: trucks (one SimPy process each).
Resources: loaders (L_N, L_S), the crusher dump (D_CRUSH), and eight capacity-constrained edges (E03 ramp ×2, E05 crusher approach ×2, E07 north-pit ×2, E09 south-pit ×2). Edges with capacity ≥ 100 are treated as effectively unlimited and modelled as plain env.timeout.
Stochasticity: truncated-normal load and dump times, plus a per-edge lognormal travel-time multiplier with unit mean (CV = 0.10).
Random initial stagger: each truck is dispatched at a uniform [0, 60] s offset so that simultaneous resource requests at t = 0 are not resolved purely by SimPy’s insertion order.
End-of-shift policy: no new loader requests after 480 min; in-flight loaded trucks complete their dump (tonnes counted only at dump_end).

5. Routing and dispatching

Routing uses networkx.dijkstra_path over the directed graph with edge weight distance_m / (max_speed_kph × 1000 / 60) (minutes). Closed edges (closed = true in edges.csv after scenario overrides) are removed from the graph. Routes are recomputed from the current node at the start of every empty leg, so closures and per-scenario speed overrides take effect immediately. If a required route does not exist, the model raises RoutingError rather than silently producing a misleading result.
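The routing rule can be illustrated without networkx. The sketch below is a standard-library Dijkstra stand-in, not the submission's code: the three-node network is hypothetical, and `LookupError` stands in for the model's `RoutingError`.

```python
import heapq


def travel_min(distance_m, max_speed_kph):
    """Edge weight in minutes: distance / (speed in metres per minute)."""
    return distance_m / (max_speed_kph * 1000 / 60)


def shortest_route(edges, source, target):
    """Dijkstra over open directed edges; returns (minutes, node path)."""
    graph = {}
    for e in edges:
        if e.get("closed"):
            continue  # closed edges are dropped before routing
        w = travel_min(e["distance_m"], e["max_speed_kph"])
        graph.setdefault(e["from"], []).append((e["to"], w))
    best = {source: 0.0}
    heap = [(0.0, source, [source])]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph.get(node, []):
            new_cost = cost + w
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, path + [nxt]))
    raise LookupError(f"no route from {source} to {target}")


# Hypothetical three-node network; distances and speeds are illustrative.
edges = [
    {"from": "A", "to": "B", "distance_m": 1000, "max_speed_kph": 30},
    {"from": "B", "to": "C", "distance_m": 500, "max_speed_kph": 30},
    {"from": "A", "to": "C", "distance_m": 2000, "max_speed_kph": 30},
]
cost, path = shortest_route(edges, "A", "C")
```

Marking the B to C edge as closed and calling `shortest_route` again falls back to the direct A to C edge, which mirrors how a closure scenario changes routes at the next dispatch.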

Dispatching follows the baseline nearest_available_loader policy with shortest_expected_cycle_time tie-breaker:

score(loader) = travel_time(current_node → loader_node)
              + queue_size(loader) × mean_load_time(loader)

The dispatcher picks the loader with the lowest score. Ties are broken by shorter expected return travel from loader to crusher.

This rule is queue-aware — it accounts for the busy time and queue at each loader, not just travel distance — and naturally balances load across both pits.
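A minimal sketch of this scoring rule follows. The function name and dictionary layout are hypothetical; the mean load times are the ones quoted for L_N and L_S, while the travel times, return times, and queue sizes are made up for illustration.

```python
def choose_loader(candidates):
    """Pick the loader with the lowest score; break ties on return travel."""
    def key(c):
        score = c["travel_min"] + c["queue_size"] * c["mean_load_min"]
        return (score, c["return_min"])
    return min(candidates, key=key)["loader_id"]


# Hypothetical snapshot at dispatch time.
candidates = [
    {"loader_id": "L_N", "travel_min": 6.0, "queue_size": 1,
     "mean_load_min": 6.5, "return_min": 9.0},
    {"loader_id": "L_S", "travel_min": 8.0, "queue_size": 1,
     "mean_load_min": 4.5, "return_min": 10.0},
]
chosen = choose_loader(candidates)
```

In this snapshot both loaders score 12.5 min, so the tie-breaker on return travel decides in favour of L_N. One more truck in the L_N queue would raise its score to 19.0 and send the truck to L_S instead.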

6. Key results

All values are means over 30 replications; bracketed values are 95 % confidence intervals using Student’s t-distribution (df = 29).

Scenario	Trucks	Tonnes (mean)	Tonnes / h	Cycle (min)	Crusher util	L_N util	L_S util	Truck util	Loader queue (min)	Crusher queue (min)
`baseline`	8	13 143 [13 089 – 13 198]	1 643	30.1	0.90	0.72	0.70	0.79	2.81	3.45
`trucks_4`	4	7 983 [7 945 – 8 021]	998	24.6	0.56	0.36	0.48	0.94	0.92	0.65
`trucks_12`	12	13 783 [13 683 – 13 883]	1 723	43.7	0.93	0.76	0.73	0.57	4.19	14.92
`ramp_upgrade`	8	13 173 [13 125 – 13 221]	1 647	30.1	0.91	0.73	0.71	0.80	2.83	3.30
`crusher_slowdown`	8	7 237 [7 154 – 7 320]	905	56.1	0.94	0.40	0.39	0.50	1.80	26.61
`ramp_closed`	8	13 110 [13 043 – 13 177]	1 639	30.2	0.90	0.72	0.71	0.79	2.80	3.41

Numbers come straight out of summary.json and can be reproduced with python run.py.
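For readers who want to recompute an interval from the per-replication rows, a sketch under stated assumptions: the critical value 2.045 is the two-sided 95 % Student's t quantile for df = 29, hard-coded to avoid a SciPy dependency, and the function name and sample values are hypothetical.

```python
import math
import statistics

T_CRIT_DF29 = 2.045  # two-sided 95 % critical value of Student's t, df = 29


def mean_ci95(values, t_crit=T_CRIT_DF29):
    """Mean and 95 % CI half-width for exactly 30 replications."""
    n = len(values)
    if n != 30:
        raise ValueError("t_crit is hard-coded for df = 29")
    mean = statistics.fmean(values)
    half = t_crit * statistics.stdev(values) / math.sqrt(n)
    return mean, half


# Hypothetical per-replication tonnes, not taken from results.csv.
values = [13000 + 10 * i for i in range(30)]
mean, half = mean_ci95(values)
interval = (mean - half, mean + half)
```

Feeding one scenario's 30 rows from results.csv into `mean_ci95` should give an interval close to the bracketed one in the table, up to rounding of the critical value.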

7. Answers to the operational decision questions

Q1. Expected ore throughput in the baseline 8-hour shift

~13 100 tonnes per shift, or about 1 640 t/h. The 95 % CI is narrow ([13 089 – 13 198]) because the crusher near-saturates and damps stochastic variation in upstream times.

Q2. Likely bottlenecks

The crusher is the single dominant constraint. Its utilisation sits at 0.90 in the baseline and rises to 0.93–0.94 as soon as the fleet is enlarged or the crusher itself is slowed. Mean crusher queueing time jumps from 3.4 min (baseline) to 14.9 min (12 trucks) to 26.6 min (slow crusher).
The slow north-pit loader (L_N, 6.5 min mean load) is a secondary bottleneck. It accumulates a longer mean queue (4.5 min) than the crusher in the baseline because each truck holds it for longer.
The narrow ramp E03_UP / E03_DOWN is not binding in any of the six scenarios. Empty trucks already prefer the western bypass, and the loaded route via the upper haul road does not require the ramp.

summary.json → scenarios.<id>.top_bottlenecks ranks all resources by mean queue wait per scenario, drawn from the per-replication queue statistics.

Q3. Does adding more trucks materially improve throughput?

No — the system saturates near 8 trucks. Adding 4 trucks (4 → 8) adds 5 160 t (+65 %). Adding the next 4 trucks (8 → 12) adds only 640 t (+5 %). Truck utilisation collapses from 0.94 → 0.79 → 0.57 across the 4/8/12 cases, and the crusher’s queue grows from 0.7 to 14.9 min — the extra trucks simply queue at the crusher.
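The diminishing return can be checked directly from the mean tonnes in the key-results table. The helper name below is hypothetical; the three tonnage values are the rounded table means.

```python
# Mean tonnes per shift by fleet size, from the key-results table.
tonnes = {4: 7983, 8: 13143, 12: 13783}


def marginal_gain(lo_trucks, hi_trucks):
    """Absolute gain (t), relative gain (%), and gain per added truck (t)."""
    delta = tonnes[hi_trucks] - tonnes[lo_trucks]
    pct = 100 * delta / tonnes[lo_trucks]
    per_truck = delta / (hi_trucks - lo_trucks)
    return delta, pct, per_truck


for lo, hi in [(4, 8), (8, 12)]:
    delta, pct, per_truck = marginal_gain(lo, hi)
    print(f"{lo} -> {hi} trucks: +{delta} t ({pct:.1f} %), "
          f"{per_truck:.0f} t per added truck")
```

Each truck added between 4 and 8 contributes 1 290 t per shift on average, against 160 t for each truck added between 8 and 12.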

Q4. Would improving the narrow ramp help?

No, not under these scenarios. ramp_upgrade raises ramp speed and removes the capacity-1 constraint, but throughput is essentially unchanged (13 173 vs 13 143 t; the two 95 % CIs overlap). The crusher is binding, so freeing the ramp does not unlock more throughput.

The ramp would only matter if (i) the fleet were small enough that travel time dominates, or (ii) the ramp’s capacity-1 constraint actually queued. Neither is the case in the six required scenarios.

Q5. Sensitivity to crusher service time

Very high. Doubling mean dump time from 3.5 → 7.0 min cuts throughput roughly in half (13 143 → 7 237 t, –45 %). This is the largest single-parameter sensitivity in the study and confirms the crusher is the binding resource. Crusher mean queue wait jumps from 3.4 to 26.6 min; loader utilisation falls from ~0.71 to ~0.39 because trucks back up behind the crusher rather than cycling.

Q6. Operational impact of losing the main ramp route

Negligible. With E03_UP and E03_DOWN closed, throughput drops by 0.25 % (13 143 → 13 110 t; the two 95 % CIs overlap). The bypass via J2 → J7 → J5 / J8 is already the shortest empty route from PARK and the loaded route does not depend on the ramp at all (LOAD_N → J5 → J3 → J4 → CRUSH uses upper haul roads, not the ramp). The ramp adds resilience in worse scenarios but, on these data, the bypass is a very close substitute.

8. Behavioural self-checks

The script prints (and the harness re-runs) six broad sanity checks. All six pass on the latest run:

[PASS] trucks_12_gt_trucks_4
[PASS] baseline_gt_trucks_4
[PASS] ramp_upgrade_ge_baseline
[PASS] crusher_slowdown_lt_baseline
[PASS] ramp_closed_le_baseline
[PASS] truck_count_saturation_plausible

9. Limitations

E03_UP / E03_DOWN as two independent capacity-1 resources slightly understates contention versus a single shared bidirectional channel.
No truck breakdowns, refuelling, operator breaks, or shift handovers.
Speed factors are uniform per edge — gradient and curvature are abstracted into max_speed_kph.
Travel-time noise is i.i.d. per traversal, so correlated weather or time-of-day effects are not modelled.
Routes are committed at dispatch time; trucks in flight do not reroute in response to growing queues.
Loaders and the crusher are 100 % available within the shift.

10. Suggested further scenarios

Crusher ×2. Add a second dump capacity to the crusher (or a second crusher) to confirm the throughput ceiling moves and to dimension the loaders for the next bottleneck.
Faster loader at LOAD_N. Shorten the north-pit mean load time to match LOAD_S to test whether equalising loader speeds reduces queueing.
Ramp closure + crusher slowdown. Confirm whether the bypass remains adequate when both ramp and crusher are stressed.
Stochastic loader breakdowns. Introduce reliability with mean time-to-failure and repair, to size the second loader as a hedge.

These are listed for the operator’s consideration and were not implemented in this submission.

11. Repository layout

.
├── conceptual_model.md
├── README.md
├── requirements.txt
├── run.py
├── plot_topology.py
├── src/
│   ├── __init__.py
│   ├── model.py        # data + scenario inheritance + graph + RNG helpers
│   ├── simulation.py   # SimPy resources, truck process, dispatcher, event log
│   ├── experiment.py   # replication driver + scenario sweep
│   └── analysis.py     # CIs, bottleneck identification, output writers
├── data/               # provided inputs (read-only)
│   ├── nodes.csv
│   ├── edges.csv
│   ├── trucks.csv
│   ├── loaders.csv
│   ├── dump_points.csv
│   └── scenarios/
│       ├── baseline.yaml
│       ├── trucks_4.yaml
│       ├── trucks_12.yaml
│       ├── ramp_upgrade.yaml
│       ├── crusher_slowdown.yaml
│       └── ramp_closed.yaml
├── results.csv         # generated
├── summary.json        # generated
├── event_log.csv       # generated
└── topology.png        # generated by plot_topology.py
