2026-06-04__002_container_shipping_throughput__claude-code__claude-opus-4-8__max-effort

Date: 2026-06-04 · Benchmark: 002_container_shipping_throughput · Harness: claude-code · Model: claude-opus-4-8 (max-effort) · ✓ Autonomous

Scores

Category Points Max
Conceptual modelling 20 20
Data and topology 15 15
Simulation correctness 19 20
Experimental design 14 15
Results & interpretation 15 15
Code quality 8 10
Traceability 5 5
Total 96 100

Evaluation report

Scenario Mean throughput (TEU delivered to Rotterdam over 180 days)
baseline 315,333.333
canal_closed 270,333.333
canal_upgrade 317,000
fleet_large 403,333.333
fleet_small 228,666.667
port_slowdown 229,000
rotterdam_upgrade 360,000

Conceptual model

Conceptual Model — Asia–Europe Container Shipping Throughput

This document defines the system being modelled, the entities and resources, the events and state, and — kept deliberately separate — which assumptions are derived from the supplied data and which are introduced modelling choices. It closes with the model’s limitations.

The implementation is a genuine SimPy discrete-event simulation (container_sim/simulation.py); nothing here is a spreadsheet average.


1. System boundary

In scope. The liner service that lifts laden containers from two Asian origin ports (Shanghai CNSHA, Singapore SGSIN), sails them along a directed maritime network to the primary European import port (Rotterdam NLRTM), discharges them, and returns the vessels empty (ballast) to reload. The horizon is a 180-day planning window.

Objective (the thing we measure). Total TEU discharged at Rotterdam over the horizon, and the rate at which it is delivered.

Out of scope / boundary conditions.


2. Entities

| Entity | Count | Description |
| --- | --- | --- |
| Vessel | scenario fleet.vessel_count (8 / 12 / 20) | A neo-panamax of 10 000 TEU, service speed 19 kn, with a home port. Moves through a repeating round-trip process. |
| Voyage (round trip) | emergent | load → sail laden → wait + discharge → sail ballast → (maintenance). One delivery of 10 000 TEU per voyage. |

The fleet for a scenario is the first vessel_count rows of vessels.csv. File order alternates SGSIN/CNSHA home ports, so any even-length prefix is a balanced split: 4/4, 6/6, 10/10.
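Here is a minimal sketch of the prefix rule. It uses a hypothetical in-memory stand-in for vessels.csv, so the vessel IDs and column names below are illustrative and were not read from the file. It shows why an alternating file order gives an even split for every even vessel_count. An odd count would leave the two home ports one vessel apart.

```python
from collections import Counter

# Hypothetical stand-in for vessels.csv: 20 rows whose home ports alternate
# SGSIN, CNSHA, SGSIN, ... (the real file's IDs and columns may differ).
VESSELS = [
    {"vessel_id": f"V{i:02d}", "home_port": "SGSIN" if i % 2 == 0 else "CNSHA"}
    for i in range(20)
]


def select_fleet(vessels, vessel_count):
    """Scenario fleet = the first vessel_count rows, in file order."""
    if vessel_count > len(vessels):
        raise ValueError("scenario asks for more vessels than the file lists")
    return vessels[:vessel_count]


def home_port_split(fleet):
    """Number of vessels per home port."""
    return Counter(v["home_port"] for v in fleet)


for n in (8, 12, 20):
    print(n, dict(home_port_split(select_fleet(VESSELS, n))))
```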


3. Resources (the contended capacity)

| Resource | SimPy object | Capacity | Source |
| --- | --- | --- | --- |
| Rotterdam discharge berth | Resource | berth_count = 1 | berths.csv B_RTM |
| Shanghai / Singapore load berths | Resource | berth_count = 3 each | berths.csv B_SHA, B_SIN |
| Suez Canal (NB and SB) | Resource per directed leg | capacity = 3 (12 in canal_upgrade) | sea_legs.csv L06_* |
| Open-water / strait / coastal legs | none (pure delay) | capacity = 999 ⇒ unconstrained | sea_legs.csv |

A leg is treated as a contended chokepoint only if its capacity is below a threshold (100); in this network only the canal qualifies. Everything coded 999 is open water and modelled as a transit delay with no queue.
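The threshold rule can be sketched as a simple partition of the leg table. The rows below are hypothetical stand-ins for sea_legs.csv. Only the L06 canal IDs and the capacity codes 3 and 999 come from the text. In a SimPy model, each entry in `resources` would back a `simpy.Resource(env, capacity=slots)`, and every other leg would become a plain timeout. `classify_legs` is an illustrative helper, not the package's own function.

```python
# Threshold from the text: a leg is a contended chokepoint only if capacity < 100.
CHOKEPOINT_THRESHOLD = 100

# Hypothetical rows shaped like sea_legs.csv; the non-canal leg IDs are made up.
LEGS = [
    {"leg_id": "L01_OPEN", "capacity": 999},
    {"leg_id": "L06_CANAL_NB", "capacity": 3},
    {"leg_id": "L06_CANAL_SB", "capacity": 3},
    {"leg_id": "L09_COASTAL", "capacity": 999},
]


def classify_legs(legs, threshold=CHOKEPOINT_THRESHOLD):
    """Split legs into queueing resources (leg_id -> slots) and pure-delay legs."""
    resources, delays = {}, []
    for leg in legs:
        if leg["capacity"] < threshold:
            resources[leg["leg_id"]] = leg["capacity"]
        else:
            delays.append(leg["leg_id"])
    return resources, delays


resources, delays = classify_legs(LEGS)
print("queueing:", resources, "delay-only:", delays)
```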

Handling rate. A port advertises berth_count and crane_count. Cranes are modelled as evenly pre-assigned to berths, so one berthed vessel is worked at (crane_count / berth_count) × moves_per_hour_per_crane TEU/h and up to berth_count vessels are served in parallel. The terminal’s aggregate discharge capacity is therefore exactly crane_count × moves_per_hour_per_crane, and cranes are never double-counted. At Rotterdam this is 4 × 28 = 112 TEU/h on the single berth (≈ 89 h to discharge 10 000 TEU).
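The handling-rate arithmetic can be written as a short Python sketch. The function names are hypothetical, and one move is taken as one TEU (assumption 2 in Section 8b). The sketch reproduces the Rotterdam figures. It also shows that the even split never double-counts cranes, because berth_count × per-berth rate equals crane_count × moves_per_hour_per_crane.

```python
def berth_rate_teu_per_h(crane_count, berth_count, moves_per_hour_per_crane):
    """TEU/h for one berthed vessel when cranes are split evenly across berths."""
    return crane_count / berth_count * moves_per_hour_per_crane


def terminal_capacity_teu_per_h(crane_count, moves_per_hour_per_crane):
    """Aggregate TEU/h with every berth occupied (berth_count x per-berth rate)."""
    return crane_count * moves_per_hour_per_crane


def discharge_hours(teu, crane_count, berth_count, moves_per_hour_per_crane):
    """Hours to work `teu` containers on one berth at the per-berth rate."""
    return teu / berth_rate_teu_per_h(crane_count, berth_count, moves_per_hour_per_crane)


# Rotterdam (B_RTM): 1 berth, 4 cranes, 28 moves/h per crane.
rtm_rate = berth_rate_teu_per_h(4, 1, 28)
rtm_hours = discharge_hours(10_000, 4, 1, 28)
print(f"{rtm_rate:.0f} TEU/h, {rtm_hours:.1f} h per 10 000 TEU")
```

At the origins, the same function gives 5/3 cranes per berth. The origin crane rate is not stated in this section, so the example sticks to Rotterdam.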


4. Events

Per voyage, the model emits (and logs) these events:

LOAD_START → DEPART_ORIGIN → [CANAL_ENTER → CANAL_EXIT]* → ARRIVE_DEST → DISCHARGE_START → DELIVER → [canal on return]* → ARRIVE_HOME.

DELIVER carries the TEU discharged; summing DELIVER.teu over the event log reconstructs delivered throughput exactly (verified for every scenario).
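Because the per-voyage order is fixed, each vessel's slice of the event log can be checked against it mechanically. This minimal sketch applies a regular expression to the event_type sequence of one voyage. `is_valid_voyage` is a hypothetical helper. The sketch does not split the log into voyages, and it does not allow for a final voyage cut off by the horizon.

```python
import re

# Per-voyage grammar from the event sequence above; canal passages are optional
# and repeatable in each direction (there are none on a Cape routing).
VOYAGE_PATTERN = re.compile(
    r"LOAD_START DEPART_ORIGIN"
    r"( CANAL_ENTER CANAL_EXIT)*"
    r" ARRIVE_DEST DISCHARGE_START DELIVER"
    r"( CANAL_ENTER CANAL_EXIT)*"
    r" ARRIVE_HOME"
)


def is_valid_voyage(event_types):
    """True if one vessel's ordered event types form exactly one complete voyage."""
    return VOYAGE_PATTERN.fullmatch(" ".join(event_types)) is not None


suez_voyage = ["LOAD_START", "DEPART_ORIGIN", "CANAL_ENTER", "CANAL_EXIT",
               "ARRIVE_DEST", "DISCHARGE_START", "DELIVER",
               "CANAL_ENTER", "CANAL_EXIT", "ARRIVE_HOME"]
print(is_valid_voyage(suez_voyage))
```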


5. State


6. Routing and dispatching logic


7. Stochasticity, seeds, and the transient


8. Assumptions — data-derived vs introduced

8a. Derived directly from the supplied data

8b. Introduced modelling choices (with justification)

  1. Vessels always sail full (10 000 TEU loaded, 10 000 discharged). The brief’s objective is to maximise TEU to Rotterdam with unlimited export demand, so the constraint is capacity, not cargo.
  2. One move ≈ one TEU (stated in the B_RTM metadata), so discharge time = TEU / (cranes per berth × moves/h).
  3. Even crane-to-berth split (Section 3). This reproduces Rotterdam exactly (112 TEU/h); at the origins it splits 5 cranes over 3 berths. Aggregate port capacity is exact.
  4. Capacity-constrained leg ⇔ capacity < 100 ⇒ only the canal is a queueing resource; open water is a pure delay.
  5. Discharge-only destination; the return leg is ballast (factors are 1.0, so ballast speed equals laden speed here).
  6. Availability → inter-voyage maintenance downtime calibrated so long-run availability equals the stated value: downtime = cycle_time × (1−A)/A (zero for A = 1).
  7. Cold start at home ports at t = 0 (Section 7).
  8. Draft is ignored — no leg or port in the data carries a depth limit, so the max_draft_m column is non-binding (a deliberate distractor; documented, not used).
  9. The two canal directions are modelled as two independent directed resources (mirroring the two L06 rows) rather than one shared channel; at ~3 % utilisation the distinction is immaterial.
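Assumption 6 can be expressed as a minimal sketch in code. The downtime formula inverts A = cycle / (cycle + downtime), so plugging the calibrated downtime back in recovers the stated availability. The cycle time and availability in the usage lines are illustrative values, not figures from vessels.csv.

```python
def maintenance_downtime(cycle_time, availability):
    """Downtime per cycle so that cycle / (cycle + downtime) equals availability."""
    if not 0 < availability <= 1:
        raise ValueError("availability must be in (0, 1]")
    return cycle_time * (1 - availability) / availability


def long_run_availability(cycle_time, downtime):
    """Fraction of time in service when each cycle is followed by `downtime`."""
    return cycle_time / (cycle_time + downtime)


# Illustrative inputs only: a 66-day cycle and A = 0.95 (same time unit throughout).
down = maintenance_downtime(66.0, 0.95)
print(round(down, 2), round(long_run_availability(66.0, down), 4))
```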

9. Limitations

README

Asia–Europe Container Shipping Throughput — SimPy Model

A discrete-event simulation (SimPy) of a liner service moving containers from Shanghai and Singapore to Rotterdam, used to answer the operator’s seven decision questions. The model derives routes and travel times from the directed network, treats the Suez Canal and the discharge berths as contended resources, runs 30 seeded replications per scenario, and reports throughput with 95 % confidence intervals.

Headline: the binding constraint is Rotterdam’s single discharge berth, not the canal and not the fleet. Expanding the canal is a near-no-op (+0.5 %); adding ships saturates the berth (diminishing returns, 20-day anchorage waits); adding one more crane-equipped berth at Rotterdam raises throughput by 14 % with the same fleet and is the precondition for any fleet growth.


1. Install and run

Requires Python 3.10+ and the standard scientific stack.

pip install -r requirements.txt          # simpy, numpy, pandas, scipy, pyyaml, networkx, matplotlib

# from this submission directory:
python -m container_sim run               # run all scenarios -> results.csv, summary.json, event_log.csv
python -m container_sim verify            # reproducibility + event-log reconstruction self-tests
python -m container_sim figures           # render the 5 figures in figures/

# optional subset:
python -m container_sim run --scenarios baseline,canal_closed

The full run is deterministic and takes ≈ 1 second on a laptop (210 replications).


2. Routing and dispatching logic


3. Key results

30 replications/scenario, 180-day horizon. total_teu = TEU discharged at Rotterdam over the horizon; vs base = ratio to baseline.

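| Scenario | Fleet | Route | total_teu (95 % CI) | TEU/day (2nd half) | Cycle (d) | Anchorage wait (h) | RTM berth util | TEU / vessel | vs base |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| fleet_small | 8 | suez | 228 667 (226 319–231 015) | 1 452 | 61.6 | 84 | 0.48 | 28 583 | 0.73 |
| baseline | 12 | suez | 315 333 (312 435–318 231) | 1 970 | 66.0 | 157 | 0.65 | 26 278 | 1.00 |
| rotterdam_upgrade | 12 | suez | 360 000 (360 000–360 000) | 2 348 | 59.7 | 30 | 0.37 | 30 000 | 1.14 |
| canal_upgrade | 12 | suez | 317 000 (314 378–319 622) | 1 978 | 65.5 | 157 | 0.66 | 26 417 | 1.01 |
| canal_closed | 12 | cape | 270 333 (269 139–271 528) | 1 670 | 77.9 | 163 | 0.56 | 22 528 | 0.86 |
| port_slowdown | 12 | suez | 229 000 (226 955–231 045) | 1 541 | 87.2 | 568 | 0.83 | 19 083 | 0.73 |
| fleet_large | 20 | suez | 403 333 (400 503–406 164) | 2 678 | 83.2 | 482 | 0.84 | 20 167 | 1.28 |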

rotterdam_upgrade is my own added scenario (Section 4, Q7).

Resource utilisation at baseline (the bottleneck fingerprint):

| Rotterdam discharge berth | Shanghai berths | Singapore berths | Suez Canal (NB) | Suez Canal (SB) |
| --- | --- | --- | --- | --- |
| 65 % | 26 % | 29 % | 3.3 % | 2.9 % |

Figures in figures/: topology.png, throughput_transient.png, fleet_saturation.png, bottleneck_utilisation.png, scenario_comparison.png.


4. Answers to the operator’s decision questions

Q1 — Baseline throughput & uncertainty. ≈ 315 000 TEU over 180 days (95 % CI 312 435–318 231, n = 30), i.e. about 31.5 full vessel-voyages. The sustained steady-state rate is ≈ 1 970 TEU/day, measured over the second half of the horizon. Measured from the end of the 14-day warmup, the rate is ≈ 1 900 TEU/day, because that window still contains the empty-pipeline ramp: the first deliveries land around day 28 (see throughput_transient.png). The CI is tight because delivered TEU is quantised in whole 10 000-TEU loads and the completed-voyage count is robust to the specified transit and handling noise. The variability shows up in cycle time and anchorage wait, not in the count.
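The two rates differ only in their measurement window. This minimal sketch assumes deliveries are available as (day, TEU) pairs and that both windows exclude their start day (the `> warmup_days` convention in results.csv). `teu_per_day` and the sample deliveries are hypothetical.

```python
def teu_per_day(deliveries, horizon_days, start_day):
    """TEU/day over the window (start_day, horizon_days].

    deliveries: iterable of (delivery_day, teu) pairs for one replication.
    """
    delivered = sum(teu for day, teu in deliveries if start_day < day <= horizon_days)
    return delivered / (horizon_days - start_day)


HORIZON_DAYS, WARMUP_DAYS = 180, 14

# Made-up deliveries for one replication, for illustration only.
sample = [(28.5, 10_000), (95.0, 10_000), (150.2, 10_000)]
post_warmup = teu_per_day(sample, HORIZON_DAYS, WARMUP_DAYS)
second_half = teu_per_day(sample, HORIZON_DAYS, HORIZON_DAYS / 2)
print(round(post_warmup, 1), round(second_half, 1))
```

Every delivery lands after day 14, so the post-warmup rate is just total_teu / 166 days. For the baseline mean, that gives 315 333 / 166 ≈ 1 900 TEU/day.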

Q2 — Where is the binding constraint? Rotterdam’s discharge operation — one berth worked by 4 cranes (112 TEU/h, ≈ 89 h per ship). The evidence is from the model, not the labels:

Q3 — Does adding ships help, or does it saturate? It saturates. As the fleet grows 8 → 12 → 20:

Q4 — Would expanding the canal help? No — essentially no effect (+0.5 %, within noise). The canal is only ~3 % utilised at baseline, so it is not the binding resource. Adding slots (3 → 12) and speeding transit drops canal utilisation further (3.3 % → 0.5 %) and shaves a few hours off a ~25-day transit, but the Rotterdam berth — untouched — still gates the system. Canal spend buys almost nothing here.

Q5 — Sensitivity to destination discharge productivity. Very high. Cutting Rotterdam’s crane rate by about 43 % (28 → 16 moves/h, port_slowdown) reduces throughput by 27 % (315 k → 229 k) and drives berth utilisation to 0.83 and the anchorage wait to 568 h. Throughput falls steeply as discharge capacity falls, which is the clearest confirmation that the discharge operation is the lever.
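The reported berth utilisations can be reproduced with a back-of-envelope check. It assumes the berth is busy only while discharging at the per-berth rate, and it ignores any discharge still in progress at the horizon. Busy hours = total_teu ÷ per-berth rate, and utilisation = busy hours ÷ (berth_count × 4 320 h). The check also assumes that port_slowdown keeps 4 cranes on the single berth and that rotterdam_upgrade runs 8 cranes over 2 berths at 28 moves/h. With those inputs, it lands on the reported 0.65, 0.83 and 0.37. `berth_utilisation` is a hypothetical helper, not the package's code.

```python
HORIZON_H = 180 * 24  # 4 320 h


def berth_utilisation(total_teu, crane_count, berth_count, moves_per_hour_per_crane,
                      horizon_h=HORIZON_H):
    """Busy berth-hours / (berth_count x horizon), counting discharge time only."""
    per_berth_rate = crane_count / berth_count * moves_per_hour_per_crane
    busy_hours = total_teu / per_berth_rate
    return busy_hours / (berth_count * horizon_h)


# (scenario, mean total_teu, cranes, berths, moves/h) under the assumptions above.
CASES = [
    ("baseline", 315_333, 4, 1, 28),
    ("port_slowdown", 229_000, 4, 1, 16),
    ("rotterdam_upgrade", 360_000, 8, 2, 28),
]
for name, teu, cranes, berths, mph in CASES:
    print(name, round(berth_utilisation(teu, cranes, berths, mph), 2))
```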

Q6 — Operational impact of losing the canal. ≈ −14 % (315 k → 270 k, 0.86×). The service reroutes automatically via the Cape of Good Hope (+3 200 nm each way), lengthening the cycle 66 → 78 days, cutting voyages and per-vessel delivery (26 278 → 22 528 TEU). The model does not fail, because a valid alternative exists; it only fails clearly if the Cape route is also unavailable.
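Automatic rerouting amounts to a shortest-path search that skips closed legs. This minimal sketch runs a stdlib Dijkstra on a toy network. The nodes, non-canal leg IDs and distances are hypothetical and are not sea_legs.csv values. They were chosen only so that the Cape route is 3 200 nm longer than the Suez route, as stated above. With networkx, `nx.shortest_path(G, source, target, weight=...)` on a graph with the canal edges removed would play the same role.

```python
import heapq


def shortest_route(graph, source, target, closed=frozenset()):
    """Dijkstra over {node: [(next_node, leg_id, nm), ...]}, skipping closed legs.

    Returns (distance_nm, [leg_id, ...]) or None if target is unreachable.
    """
    queue = [(0.0, source, [])]
    settled = set()
    while queue:
        dist, node, legs = heapq.heappop(queue)
        if node == target:
            return dist, legs
        if node in settled:
            continue
        settled.add(node)
        for nxt, leg_id, nm in graph.get(node, []):
            if leg_id not in closed and nxt not in settled:
                heapq.heappush(queue, (dist + nm, nxt, legs + [leg_id]))
    return None


# Toy network: hypothetical nodes, leg IDs and distances (nm), chosen so the
# Cape route is exactly 3 200 nm longer than the Suez route.
TOY_NET = {
    "CNSHA": [("RED_SEA", "TOY_OPEN_1", 7_000), ("CAPE", "TOY_CAPE_1", 8_100)],
    "RED_SEA": [("MED", "L06_CANAL_NB", 100)],
    "MED": [("NLRTM", "TOY_OPEN_2", 3_300)],
    "CAPE": [("NLRTM", "TOY_CAPE_2", 5_500)],
}

SERVICE_SPEED_KN = 19
suez = shortest_route(TOY_NET, "CNSHA", "NLRTM")
cape = shortest_route(TOY_NET, "CNSHA", "NLRTM", closed={"L06_CANAL_NB"})
extra_nm = cape[0] - suez[0]
print(suez[1], cape[1], extra_nm, round(extra_nm / SERVICE_SPEED_KN, 1), "h extra each way")
```

At 19 kn, 3 200 nm adds about 168 h (≈ 7 days) to each laden or ballast passage. If the Cape legs are closed as well, `shortest_route` returns None. That is the case in which a scenario should be marked "failed" in summary.json.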

Q7 — Single recommended intervention. Add a second crane-equipped deep-sea discharge berth at Rotterdam (my rotterdam_upgrade scenario: berths 1 → 2, cranes 4 → 8, fleet unchanged at 12). Evidence:


5. Output file schema

All machine-readable files use clear, conventional names — no bespoke key needed.

results.csv — one row per (scenario, replication)

| column | meaning |
| --- | --- |
| scenario_id | scenario name |
| replication | replication index (0…29) |
| base_seed | base random seed for the scenario |
| fleet_size | number of vessels |
| route_type | dominant route used: suez or cape |
| total_teu | TEU discharged at Rotterdam over the horizon (primary metric) |
| deliveries | number of completed discharge voyages |
| teu_per_day | TEU/day delivered after the warmup (> warmup_days) |
| teu_per_day_second_half | TEU/day over the fully-ramped second half (steady state) |
| teu_per_vessel | total_teu / fleet_size |
| mean_cycle_time_days | mean round-trip duration |
| mean_anchorage_wait_h | mean wait at Rotterdam for the discharge berth |
| mean_origin_wait_h | mean wait at the origin for a load berth |
| rtm_berth_util | Rotterdam discharge-berth utilisation (busy time ÷ (capacity × horizon)) |
| util_port_CNSHA, util_port_SGSIN | origin berth utilisations |
| util_port_DEHAM | Hamburg berth utilisation; always 0 (the distractor sink is never served, which is a useful confirmation) |
| util_L06_CANAL_NB, util_L06_CANAL_SB | Suez Canal slot utilisations (0 when closed) |

summary.json — scenario-level summary + cross-scenario analysis

{
  "generated_with": "container_sim 1.0.0",
  "scenarios": {
    "<id>": {
      "scenario_id", "description", "horizon_days", "warmup_days",
      "replications", "base_random_seed", "fleet_size", "route_type",
      "deliver_to", "status",                       // "ok" or "failed" (+ "reason")
      "metrics": { "<metric>": {mean, std, sem, ci95_low, ci95_high, n}, ... }
    }, ...
  },
  "analysis": {
    "throughput_ratio_vs_baseline": { "<id>": ratio, ... },
    "fleet_marginal_teu_per_vessel": [ {from, to, delta_vessels, delta_teu, marginal_teu_per_added_vessel}, ... ],
    "teu_per_vessel_by_fleet": {...}, "anchorage_wait_h_by_fleet": {...},
    "baseline_utilisation": {...}
  }
}
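The fleet_marginal_teu_per_vessel entries can be re-derived from the scenario means in the Evaluation report. Below is a minimal sketch. The helper is hypothetical, and only its field names mirror the schema above.

```python
# Scenario means (total TEU over 180 days) copied from the Evaluation report.
MEAN_TEU_BY_FLEET = {8: 228_666.667, 12: 315_333.333, 20: 403_333.333}


def marginal_teu_per_vessel(mean_teu_by_fleet):
    """One row per step between successive fleet sizes."""
    sizes = sorted(mean_teu_by_fleet)
    rows = []
    for a, b in zip(sizes, sizes[1:]):
        delta_teu = mean_teu_by_fleet[b] - mean_teu_by_fleet[a]
        rows.append({
            "from": a, "to": b, "delta_vessels": b - a, "delta_teu": delta_teu,
            "marginal_teu_per_added_vessel": delta_teu / (b - a),
        })
    return rows


for row in marginal_teu_per_vessel(MEAN_TEU_BY_FLEET):
    print(row["from"], "->", row["to"], round(row["marginal_teu_per_added_vessel"]))
```

The gain per added vessel roughly halves, from about 21 700 TEU (8 → 12) to 11 000 TEU (12 → 20). This is the diminishing return described in Q3.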

Metrics carrying a 95 % CI: total_teu, teu_per_day, teu_per_day_second_half, teu_per_vessel, deliveries, mean_cycle_time_days, mean_anchorage_wait_h, mean_origin_wait_h, rtm_berth_util, util_L06_CANAL_NB, util_L06_CANAL_SB. CIs are Student-t with n−1 degrees of freedom.
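Below is a minimal sketch of the stated CI rule (Student-t, n − 1 degrees of freedom) using scipy.stats. The function name is hypothetical, and the output keys simply mirror the metric block in summary.json.

```python
import math
import statistics

from scipy import stats


def ci95(values):
    """Mean and two-sided 95 % Student-t interval with n - 1 degrees of freedom."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replications")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)   # sample std (n - 1) / sqrt(n)
    half = stats.t.ppf(0.975, df=n - 1) * sem
    return {"mean": mean, "sem": sem,
            "ci95_low": mean - half, "ci95_high": mean + half, "n": n}


print(ci95([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A metric that is identical in every replication has zero standard deviation, so its interval collapses to a point. That is why rotterdam_upgrade reports total_teu 360 000 (360 000–360 000): all 30 replications delivered exactly 36 loads.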

event_log.csv — auditable vessel-movement trace

One row per important event, sufficient to audit movements and reconstruct delivered throughput: sum of teu over rows where event_type == "DELIVER", grouped by (scenario_id, replication), equals that replication’s total_teu (verified exactly for all scenarios).

| column | meaning |
| --- | --- |
| scenario_id, replication | keys |
| sim_time_h, sim_day | event time (hours; days) |
| vessel_id, vessel_class | the vessel |
| event_type | LOAD_START, DEPART_ORIGIN, CANAL_ENTER, CANAL_EXIT, ARRIVE_DEST, DISCHARGE_START, DELIVER, ARRIVE_HOME |
| location | node id or leg id |
| route_type | suez or cape |
| teu | TEU on the move (10 000 on DELIVER) |
| wait_h | queue wait recorded on the event (anchorage / origin) |
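The reconstruction check needs only the standard library. Below is a minimal sketch. The helper is hypothetical, and the SAMPLE rows are made up and carry only the columns the check needs. With pandas, the same sum is `df[df.event_type == "DELIVER"].groupby(["scenario_id", "replication"])["teu"].sum()`.

```python
import csv
import io
from collections import defaultdict


def delivered_teu(event_log_file):
    """Sum teu over DELIVER rows, keyed by (scenario_id, replication)."""
    totals = defaultdict(float)
    for row in csv.DictReader(event_log_file):
        if row["event_type"] == "DELIVER":
            totals[(row["scenario_id"], int(row["replication"]))] += float(row["teu"])
    return dict(totals)


# Made-up rows for illustration; a real run would use
# open("event_log.csv", newline="") instead of io.StringIO.
SAMPLE = """scenario_id,replication,sim_time_h,event_type,teu
baseline,0,700.0,DELIVER,10000
baseline,0,710.0,ARRIVE_HOME,0
baseline,0,905.5,DELIVER,10000
baseline,1,690.0,DELIVER,10000
"""
print(delivered_teu(io.StringIO(SAMPLE)))
```

Each value should then equal total_teu for the same (scenario_id, replication) row of results.csv.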

6. Reproducibility, validation, limitations
