
My first in person conference presentation as a PhD Student - ICCBR 2023

Ciara Feely · 5 min read

Based on Ciara Feely's video on YouTube.

TL;DR

Marathon time prediction remains difficult because many prior models use small, mostly recreational datasets and rely on limited training variables or runner recall. A case-based reasoning system built on roughly 160,000 Strava training programs predicts finish time week by week, and it performs best when training features are combined with a runner's previous race time.

Briefing

Marathon training advice is often treated like a one-size-fits-all recipe, but predicting how fast a runner will finish has resisted simple solutions—especially for recreational athletes. A new approach uses case-based reasoning on large-scale training histories to forecast a runner’s future marathon time week by week, and early results suggest it can outperform earlier prediction methods by combining what a runner has done in training with what similar runners achieved in past races.

The work starts from a problem in sports science: more than 100 models have been proposed to predict marathon performance, yet none reliably generalize across different runners. A key reason is data limitations—most studies rely on small samples (about 8,500 runners across 85 papers), focus heavily on recreational athletes, and often examine one training variable at a time in controlled settings. Many also depend on runner recall, which is unreliable for detailed, week-to-week training metrics.

To address that gap, the method builds an “extended case-based reasoning” system using a much larger dataset: roughly 160,000 marathon training programs from about 85,000 runners. Training data comes from Strava, including distance–time–elevation time series (and, for some runners, heart rate and cadence, though the current model omits those because they’re not widely available). The system converts raw activity into weekly features—such as total weekly distance, longest run, number of active days, and pace measures including main weekly pace and fast 10K pace—plus cumulative progression features like cumulative average and cumulative best.
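
To make the feature step concrete, here is a minimal sketch of how weekly aggregates such as total distance, longest run, active days, pace summaries, and cumulative progression might be derived from per-run records. The column names, the pandas-based approach, and the "fastest pace" stand-in for fast 10K pace are illustrative assumptions, not the authors' implementation.

```python
import pandas as pd

def weekly_features(activities: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-run records into weekly training features like those described above.

    Assumed input columns: 'week' (training week index), 'distance_km', 'duration_min'.
    Column names and computations are illustrative only.
    """
    runs = activities.copy()
    runs["pace_min_per_km"] = runs["duration_min"] / runs["distance_km"]

    weekly = runs.groupby("week").agg(
        total_distance_km=("distance_km", "sum"),
        longest_run_km=("distance_km", "max"),
        active_days=("distance_km", "count"),      # number of recorded runs as a proxy
        mean_weekly_pace=("pace_min_per_km", "mean"),
        fastest_pace=("pace_min_per_km", "min"),   # crude stand-in for "fast 10K pace"
    ).sort_index()

    # Cumulative progression features over the training block so far.
    weekly["cumulative_avg_distance_km"] = weekly["total_distance_km"].expanding().mean()
    weekly["cumulative_best_pace"] = weekly["fastest_pace"].cummin()
    return weekly
```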

Predictions are made at specific points in training. If a runner is, say, eight weeks from race day, the system retrieves only comparable training plans from runners at the same stage, and it filters by sex to reduce noise. It then finds the K most similar past training cases and predicts future marathon time using a weighted average of what those peers ran.
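
A minimal sketch of that retrieve-and-reuse step, as described: restrict the case base to the same weeks-to-race stage and the same sex, find the k most similar training profiles, and predict with a distance-weighted average of their finish times. The function name, the Euclidean distance, and the inverse-distance weighting are assumptions for illustration, not the published design.

```python
import numpy as np

def predict_marathon_time(query_features, query_sex, weeks_to_race, case_base, k=20):
    """Case-based prediction: weighted average of the k most similar runners' finish times.

    `case_base` is assumed to be a list of dicts with keys 'features' (np.ndarray),
    'sex', 'weeks_to_race', and 'finish_time_min'. Metric and weighting are illustrative.
    """
    # Retrieve only cases at the same stage of training and of the same sex.
    candidates = [c for c in case_base
                  if c["weeks_to_race"] == weeks_to_race and c["sex"] == query_sex]
    if not candidates:
        raise ValueError("No comparable cases at this stage of training")

    dists = np.array([np.linalg.norm(query_features - c["features"]) for c in candidates])
    order = np.argsort(dists)[:k]

    # Inverse-distance weighting: closer training histories count for more.
    weights = 1.0 / (dists[order] + 1e-6)
    times = np.array([candidates[i]["finish_time_min"] for i in order])
    return float(np.sum(weights * times) / np.sum(weights))
```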

Several model variants are tested. A training-features-only model is compared against a previous-race-time model (predicting future time from prior marathon performance) and against KNN-style race-to-race comparisons. The results show a clear pattern: earlier in training, prior race time can be more informative, but as race day approaches, training features take over and improve accuracy because they reflect the runner’s current form. The best-performing option combines both sources—adding a runner’s previous race time to the training-feature retrieval process—producing the lowest error for both males and females.

The analysis also highlights what matters most. Feature importance testing finds that fast 10K pace is the dominant predictor for both sexes. The model also consistently selects pace-based features week to week, while some distance and activity-day features drop out depending on interactions.
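
The talk reports feature-importance testing without detailing the procedure. One common way to probe importance in a neighbor-based predictor is an ablation-style test, sketched below under that assumption: neutralize one feature at a time and measure how much prediction error grows.

```python
import numpy as np

def ablation_importance(predict_fn, X_test, y_test, feature_names):
    """Rank features by the error increase when each one is neutralized.

    `predict_fn(X)` is assumed to return predicted finish times for a feature matrix.
    Each feature is replaced by its column mean in turn; this is an assumed probe,
    not the authors' reported method.
    """
    base_err = np.mean(np.abs(predict_fn(X_test) - y_test))
    scores = {}
    for j, name in enumerate(feature_names):
        X_abl = X_test.copy()
        X_abl[:, j] = X_abl[:, j].mean()  # remove this feature's signal
        scores[name] = np.mean(np.abs(predict_fn(X_abl) - y_test)) - base_err
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```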

Evaluation so far is offline using 10-fold cross-validation, restricted to runners who completed marathons between three and five hours. The next step is a live user study and a way to make predictions actionable—by showing runners which peers shaped their estimate and enabling “what-if” adjustments (e.g., sliding toward a target time) that would update the neighbor set and associated training implications. The broader goal is not just accuracy, but guidance grounded in comparable training histories—an approach that could extend beyond marathons to other endurance sports like cycling and swimming.
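
As a rough illustration of the "what-if" idea, the sketch below re-runs neighbor retrieval around a user-chosen target time so that those peers' training histories can be inspected. Everything here (the function, the filtering rule, the tolerance) is an assumption about how such a feature might work, not part of the published system.

```python
def what_if_neighbors(case_base, query_sex, weeks_to_race, target_time_min,
                      tolerance_min=10.0, k=10):
    """Retrieve comparable runners who finished near a chosen target time.

    Their weekly training features could then be summarized to show what training
    typically preceded that outcome. Purely illustrative.
    """
    candidates = [c for c in case_base
                  if c["sex"] == query_sex
                  and c["weeks_to_race"] == weeks_to_race
                  and abs(c["finish_time_min"] - target_time_min) <= tolerance_min]
    # Closest finish times first; return up to k peers for inspection.
    candidates.sort(key=lambda c: abs(c["finish_time_min"] - target_time_min))
    return candidates[:k]
```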

Cornell Notes

A large-scale case-based reasoning model predicts marathon finish time at different points in training by matching a runner’s weekly training profile to similar past runners. Using about 160,000 training programs from roughly 85,000 Strava users, the system converts raw runs into weekly distance and pace features (including fast 10K pace) plus cumulative progression measures, then retrieves the K most similar cases at the same “weeks-to-race” stage. Accuracy improves as race day nears because training features reflect current fitness, while prior marathon time helps more earlier in the cycle. The best results come from combining prior race time with training features. Fast 10K pace emerges as the most important feature, suggesting a practical performance signal that runners may not reliably recall week to week.

Why do traditional marathon prediction models struggle to generalize across runners?

Many published approaches rely on small datasets and narrow experimental designs. Across 85 papers, the total sample is about 8,500 runners, mostly recreational, with limited information on elite athletes. Studies often vary one training variable at a time in controlled environments and frequently depend on runner recall, which makes week-to-week training measurement inconsistent.

How does the case-based reasoning system decide which past runners are “similar” to the current one?

It extracts weekly training features from the current runner’s training program and retrieves only cases at the same stage relative to race day (e.g., eight weeks out). Features include distance-based metrics (total weekly distance, longest run), pace-based metrics (main weekly pace and fast 10K pace), and cumulative progression (cumulative average and cumulative best). It also filters by sex, then predicts future marathon time using a weighted average of the K nearest neighbors’ marathon outcomes.

What does the model learn about the value of prior race time versus training data?

Prior marathon time can outperform training features earlier in the training cycle, but training features become more accurate as race day approaches. The system’s error decreases closer to race day because it has more up-to-date training information. A small error bump around one week from race day is attributed to marathon taper variability—some runners stop completely while others adjust less dramatically.

Which model variant performs best, and how is it constructed?

The combined-features approach performs best for both males and females. It takes the runner’s previous race time and adds it to the training-feature retrieval process, then uses KNN-style neighbor weighting to generate the final prediction. A training-features-only model and a previous-race-time-only model both do worse than this combined setup.
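
One way to read "adding previous race time to the training-feature retrieval" is to append it as an extra dimension of the case description before the nearest-neighbor search. The sketch below assumes that reading and reuses the predictor sketched earlier; the helper and its `scale` parameter are hypothetical.

```python
import numpy as np

def combined_case_features(training_features: np.ndarray,
                           previous_race_time_min: float,
                           scale: float = 1.0) -> np.ndarray:
    """Append prior marathon time to the weekly training-feature vector.

    The neighbor search then compares runners on both recent training and past race
    performance. `scale` weights the race-time dimension relative to the training
    features; the default is an illustrative assumption.
    """
    return np.append(training_features, scale * previous_race_time_min)

# Hypothetical usage with the earlier predictor sketch:
# q = combined_case_features(query_features, previous_race_time_min=215.0)
# predict_marathon_time(q, "F", weeks_to_race=8, case_base=combined_case_base)
```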

What single training feature turns out to be most important?

Fast 10K pace is by far the most important feature for both males and females. It is described as relatively new in the sports science literature because it is difficult to obtain without detailed training data such as Strava's; runners are unlikely to recall it accurately week to week, which helps explain why it is missing from simpler, recall-based approaches.

How is the system evaluated, and what constraints shape the results so far?

Evaluation is offline using 10-fold cross-validation: 90% of training programs serve as the case base and 10% as test cases, rotating folds so error is averaged across splits. The study restricts runners to those who completed marathons between three and five hours, and some model comparisons require runners to have prior marathon races for fairness.
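
The evaluation loop, as described, is standard 10-fold cross-validation over training programs. Here is a sketch using scikit-learn's KFold; the mean-absolute-error metric and the `build_predictor` interface are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validated_error(cases, build_predictor, n_splits=10, seed=0):
    """Average prediction error over 10 folds of the case base.

    Each fold holds out 10% of training programs as test queries and uses the
    remaining 90% as the case base, mirroring the setup described above.
    `build_predictor(case_base)` is assumed to return a function mapping a
    held-out case (a dict) to a predicted finish time in minutes.
    """
    cases = np.asarray(cases, dtype=object)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_errors = []
    for train_idx, test_idx in kf.split(cases):
        predict = build_predictor(cases[train_idx])
        errors = [abs(predict(c) - c["finish_time_min"]) for c in cases[test_idx]]
        fold_errors.append(np.mean(errors))
    return float(np.mean(fold_errors))
```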

Review Questions

  1. How does restricting retrieval to the same “weeks from race day” change the validity of the neighbor-based prediction?
  2. Why might fast 10K pace be more predictive than weekly distance totals in this framework?
  3. What role does marathon taper likely play in the model’s error pattern near race day?

Key Points

  1. Marathon time prediction remains difficult because many prior models use small, mostly recreational datasets and rely on limited training variables or runner recall.
  2. An extended case-based reasoning approach uses Strava-derived training histories to predict marathon finish time at specific weeks before race day.
  3. Training data is transformed into weekly features (distance, pace, and cumulative progression), including fast 10K pace, then matched to similar past training programs.
  4. Prior marathon time helps earlier in training, but training features become more accurate as race day nears; combining both yields the best results.
  5. Fast 10K pace is the top-ranked predictor for both males and females in feature-importance tests.
  6. Offline evaluation uses 10-fold cross-validation and focuses on runners with marathon times between three and five hours, with online user testing planned next.
  7. Future work aims to make predictions interpretable and actionable by showing nearest-neighbor peers and enabling “what-if” target adjustments.

Highlights

Fast 10K pace emerges as the most important feature for marathon prediction for both males and females, and it’s difficult to capture with runner recall alone.
Prediction accuracy improves as race day approaches because training features reflect current fitness, while prior race time is more helpful earlier in the cycle.
The best-performing model combines a runner’s previous marathon time with training-feature retrieval, outperforming either source alone.
A noticeable error increase around one week before race day aligns with marathon taper variability—some runners stop training abruptly while others don’t.
