67,667 runners across 3 April editions. Groups identified by Gaussian Mixture Models — the data decides the number and boundaries, not the analyst.
Two groups where the data supports it. One group where it doesn't.
Ideal
The engineering benchmark
Not a group of runners. The grade-adjusted optimal pace per segment from the official Boston elevation profile. Seg.1 ideal is already faster than goal pace (downhill). Seg.2 ideal is exactly goal pace (flat).
Controlled (GMM lower component)
The well-paced group
Where GMM finds k=2, this is the cluster with the lower positive split (typically 62–79% of runners in the bucket). Defining trait: they run Seg.1 slower than the grade-adjusted ideal, deliberately leaving speed on the table.
Blowup (GMM upper component)
The over-paced group
The upper GMM component. Present at 2:35–3:30 and 3:45 where the distribution is genuinely bimodal. Above 3:30 most buckets show k=1: everyone fades, just by degree.
The course profile is fixed. Whether you respect it is not.
The grade-adjusted ideal accounts for the descent and prescribes a pace already faster than goal. The best-paced group (GMM controlled) still runs it slower than even that adjusted ideal. The blowup group runs 6–10 sec/km faster than the ideal — taking more than the hill gives.
Flat terrain. No elevation excuse. Yet the median runner at 2:40 runs this segment 3.4 sec/km faster than the ideal — a larger gap than Seg.1. The blowup group is 7–9 sec/km faster than ideal through Wellesley. The deviation from benchmark is bigger here than on the downhill.
The most physically constrained segment. 91% of runners run it slower than their own average. The controlled group holds near ideal. The blowup group exceeds ideal by 6–12 sec/km. At slower paces (3:30+), Seg.3 becomes the dominant source of the positive split.
Net downhill. The grade-adjusted ideal prescribes a pace faster than average. The controlled group runs near ideal — Boylston is the reward. The blowup group runs 13–20 sec/km slower than ideal. The most variable segment (sd=9.7 sec/km): the biggest spread of any part of the course.
Groups are GMM-identified. k=2 where bimodal, k=1 where unimodal.
Cumulative splits
Pace per segment — ideal vs groups
Effort distribution
Negative = faster than ideal. Positive = slower.
Lower = faster.
Median positive split per GMM component.
Negative split = 2nd half faster. Rare at Boston.
Negative = faster than ideal. Green = controlled/overall, red = blowup (where k=2).
Where the Seg.1 decision is paid back.
Findings that hold regardless of how groups are defined.
The grade-adjusted ideal already prescribes a faster-than-goal pace for Seg.1 (downhill) and exactly goal pace for Seg.2 (flat). The median runner still runs Seg.2 at −3.4 sec/km vs ideal — a larger gap than Seg.1 (−1.3 sec/km). On flat terrain with no elevation excuse, the deviation from benchmark is larger. The blowup group runs both segments 7–9 sec/km faster than ideal. The controlled group is the only one that runs Seg.1 slower than the grade-adjusted ideal.
GMM (BIC-optimal) identifies two natural components in the positive split distribution from 2:35 through 3:30, and again at 3:45. Above 3:30, most buckets show k=1: a single unimodal distribution with no meaningful subgroup to separate.
Where GMM finds k=2, the blowup component is 21–38% of runners. The controlled group is 62–79%. These are data-driven proportions that vary by bucket — not a tautological output of a percentile split. Both groups end up at nearly the same total time vs ideal (−8 to −11s net). The difference is in how that time is distributed across segments.
I'm Fabio, an Italian engineer in Zurich, father and marathon runner. Engineering approach to running. Let's go!
1 — Data
Official BAA chip timing results for April 2022, 2023, and 2025 (79,866 total finishers). 2021 excluded: October edition, 62°F, high humidity — median positive split 8.5 min vs 4.4–6.5 min in April editions. 2024 excluded: 69°F + headwind — positive splits nearly double (e.g. men 2:50: +2:49 in 2023 vs +6:23 in 2024).
2 — Outlier removal
Runners whose pace in any segment deviates more than 15% from their own average are removed (chip errors, medical stops): 14.3% overall. Median values are used throughout because they are robust to right-skewed positive split distributions.
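A minimal sketch of this filter, assuming "deviation" means the relative difference between a segment pace and the runner's own average pace. The runner records and the helper name is_outlier are hypothetical, chosen only for illustration.

```python
from statistics import median

def is_outlier(segment_paces, average_pace, threshold=0.15):
    """True if any segment pace deviates more than `threshold` from the runner's own average pace."""
    return any(abs(p - average_pace) / average_pace > threshold for p in segment_paces)

# Hypothetical runners: (segment paces in sec/km, average pace in sec/km, positive split in seconds)
runners = [
    ([240.0, 240.0, 245.0, 250.0], 242.9, 130.0),
    ([236.0, 238.0, 250.0, 262.0], 244.5, 410.0),
    ([240.0, 240.0, 320.0, 250.0], 260.7, 1500.0),  # long stop in Seg.3
]

# Drop outliers first, then summarise with the median rather than the mean.
kept = [r for r in runners if not is_outlier(r[0], r[1])]
median_split = median(r[2] for r in kept)
print(len(kept), median_split)
```

The third runner is dropped because the Seg.3 pace sits more than 15% above that runner's average. The median is then taken over the remaining positive splits only.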
3 — Ideal pacing (grade-adjusted)
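Runners Connect formula: +13 sec/mi per 1% uphill (above +0.4%), −8 sec/mi downhill (below −1.5%). Base pace B = (T_goal − Σ(A_m × d_m)) / 26.2, where T_goal is the goal time in seconds, A_m is the grade adjustment for mile m in sec/mi, and d_m is the length of that mile (summing to 26.2). The ideal pace for mile m is then B + A_m, so the adjusted paces still add up to T_goal. Elevation: official Boston mile markers (Daniels/Burfoot). Verified against the Runners Connect calculator.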
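The formula can be sketched in a few lines. This is an illustration under assumptions, not the calculator itself: it assumes the adjustment scales with the full grade once the threshold is crossed (both uphill and downhill), and it uses a hypothetical three-mile profile instead of the real Boston mile markers. The function names are illustrative.

```python
UPHILL_SEC_PER_MI = 13.0     # slowdown per 1% of uphill grade
DOWNHILL_SEC_PER_MI = 8.0    # speed-up per 1% of downhill grade (assumed per 1%)
UPHILL_THRESHOLD = 0.4       # grades at or below this are treated as flat
DOWNHILL_THRESHOLD = -1.5    # grades at or above this are treated as flat

def adjustment(grade_pct):
    """Grade adjustment A_m in sec/mi for one mile with the given average grade (%)."""
    if grade_pct > UPHILL_THRESHOLD:
        return UPHILL_SEC_PER_MI * grade_pct
    if grade_pct < DOWNHILL_THRESHOLD:
        return -DOWNHILL_SEC_PER_MI * abs(grade_pct)
    return 0.0

def ideal_paces(goal_seconds, grades_pct, lengths_mi):
    """Return base pace B and the per-mile ideal paces B + A_m (sec/mi)."""
    adj = [adjustment(g) for g in grades_pct]
    # B = (T_goal - sum(A_m * d_m)) / total distance (26.2 for the full course)
    base = (goal_seconds - sum(a * d for a, d in zip(adj, lengths_mi))) / sum(lengths_mi)
    return base, [base + a for a in adj]

# Hypothetical 3-mile course: one downhill mile, one flat mile, one uphill mile.
grades = [-2.0, 0.0, 1.0]
lengths = [1.0, 1.0, 1.0]
base, paces = ideal_paces(1200.0, grades, lengths)
total = sum(p * d for p, d in zip(paces, lengths))
print(base, paces, total)
```

By construction the per-mile paces sum back to the goal time. The downhill mile is prescribed faster than the base pace and the uphill mile slower.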
4 — Group classification (GMM)
Gaussian Mixture Models fit to the positive split distribution per bucket. Number of components k selected by BIC (Bayesian Information Criterion) — the model complexity that best explains the data. k=2 found at 2:35–3:30 and 3:45; k=1 elsewhere.
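As an illustration of the selection step, here is a small pure-Python sketch: a one-dimensional Gaussian mixture fitted by EM, with k chosen by the lowest BIC. The analysis itself may well have used a library implementation; the function names, the initialisation scheme and the synthetic sample below are assumptions made for the example.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(xs, k, iters=200):
    """Fit a k-component 1-D Gaussian mixture by EM; return (weights, means, variances, loglik)."""
    xs = sorted(xs)
    n = len(xs)
    # Initialise means from k equal chunks of the sorted sample, variances from the whole sample.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(c) / len(c) for c in chunks]
    grand_mean = sum(xs) / n
    variances = [sum((x - grand_mean) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances (with a small variance floor).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in xs
    )
    return weights, means, variances, loglik

def bic(loglik, k, n):
    """BIC with 3k - 1 free parameters: k means, k variances, k - 1 weights."""
    return (3 * k - 1) * math.log(n) - 2 * loglik

def select_k(xs, candidates=(1, 2)):
    scores = {k: bic(fit_gmm_1d(xs, k)[3], k, len(xs)) for k in candidates}
    return min(scores, key=scores.get), scores

# Synthetic positive splits in minutes: a "controlled" cluster and a smaller "blowup" cluster.
random.seed(0)
splits = [random.gauss(3, 1) for _ in range(300)] + [random.gauss(12, 2) for _ in range(150)]
best_k, scores = select_k(splits)
print(best_k, scores)
```

On a clearly bimodal sample like this one, the likelihood gain from the second component far outweighs the BIC penalty for its three extra parameters. On a unimodal sample the penalty tends to dominate, which is how a bucket ends up with k=1.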
5 — Regression finding
Linear regression of Seg.4 delta vs Seg.1 delta (sec/km vs grade-adjusted ideal). Slope ≈ −1.3 at fast paces: 1 sec/km too fast in Seg.1 costs 1.3 sec/km in Seg.4. R²≈0.58 in buckets 2:40–3:15. This is the primary finding and requires no group classification at all.
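The regression itself is ordinary least squares on two columns. A minimal sketch follows, using made-up deltas for a handful of hypothetical runners (negative = faster than ideal); the numbers are for illustration only and are not taken from the dataset.

```python
def linear_fit(x, y):
    """Ordinary least squares for y = slope * x + intercept; also returns R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical deltas vs grade-adjusted ideal, in sec/km.
seg1_delta = [-8.0, -6.0, -5.0, -3.0, -1.0, 0.0, 2.0]
seg4_delta = [13.0, 8.0, 9.0, 5.0, 2.0, 3.0, -1.0]

slope, intercept, r_squared = linear_fit(seg1_delta, seg4_delta)
print(slope, intercept, r_squared)
```

A negative slope means that running Seg.1 faster than ideal (more negative delta) goes with running Seg.4 slower than ideal (more positive delta). R² is the share of Seg.4 variance that the Seg.1 delta explains.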
6 — Segment boundaries
Seg.1=0–10K, Seg.2=10–25K, Seg.3=25–35K, Seg.4=35K–Finish. Boundaries at BAA chip checkpoints. Buckets: ±2:30 window for 5-min intervals up to 4:00; ±5 min above 4:00.
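A short sketch of how checkpoint times map to segment paces and buckets. It assumes cumulative chip times in seconds at 10K, 25K, 35K and the finish, and a half-open ±2:30 window around each 5-minute centre. Buckets above 4:00 use wider windows and are not handled here. The function names and the example runner are hypothetical.

```python
SEGMENT_KM = [10.0, 15.0, 10.0, 7.195]  # 0-10K, 10-25K, 25-35K, 35K-Finish (42.195 km)

def segment_paces(cumulative_seconds):
    """Convert cumulative times at 10K, 25K, 35K and Finish into sec/km per segment."""
    previous = 0.0
    paces = []
    for t, km in zip(cumulative_seconds, SEGMENT_KM):
        paces.append((t - previous) / km)
        previous = t
    return paces

def bucket_center(finish_seconds):
    """Nearest 5-minute bucket centre in seconds, up to the 4:00 bucket; None above it."""
    center = int((finish_seconds + 150) // 300) * 300
    return center if center <= 4 * 3600 else None

# Hypothetical runner finishing in 2:50:50.
checkpoints = [2400.0, 6000.0, 8450.0, 10250.0]
paces = segment_paces(checkpoints)
bucket = bucket_center(checkpoints[-1])
print(paces, bucket)
```

Each segment pace can then be compared with the grade-adjusted ideal for the runner's bucket to get the per-segment delta in sec/km.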