Runs Under Model — Methodology

How the GameDay Analytics runs-under model predicts the probability that a batter scores zero runs in a game — the Under 0.5 runs prop. Live values (active version, metrics, settings) are read directly from the model database; the surrounding narrative describes the fixed pipeline and decision logic.

1. What the model predicts

For each projected starting hitter the model estimates a single number: P(zero runs) — the probability the player scores zero runs in the game. That is exactly the Under 0.5 runs side of the batter-runs prop.

It is a binary classification problem. The training target is scored_zero ∈ {0, 1}: 1 means the player did not score (the Under wins), 0 means the player scored one or more runs (the Under loses). The model's output is a single calibrated probability in [0, 1], which every downstream edge calculation compares against the book.
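As a minimal sketch of the target definition above, the label can be derived from a box-score run count. The function name, the player names, and the sample rows are hypothetical; only scored_zero and its 0/1 meaning come from this page.

```python
def scored_zero(runs_scored: int) -> int:
    """Return 1 if the batter scored no runs (Under 0.5 wins), else 0."""
    if runs_scored < 0:
        raise ValueError("runs_scored cannot be negative")
    return 1 if runs_scored == 0 else 0


# Hypothetical box-score rows: (player, runs scored in the game).
box_scores = [("Player A", 0), ("Player B", 2), ("Player C", 1), ("Player D", 0)]

labels = [scored_zero(runs) for _, runs in box_scores]

# Share of player-games where the Under would have won in this toy sample.
base_zero_rate = sum(labels) / len(labels)
```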

2. Active model

Version tag: v1.0-lgbm (active)
Trained at: 2026-06-27 21:22:18
Algorithm: LightGBM + isotonic
Target: scored_zero (0/1)
Brier: 0.2245
Log loss: 0.6436
AUC: 0.5781
Base zero rate: 64.8%
Test rows: 3,733

Recent model versions

Version | Trained at | Active | Brier | Log loss | AUC
v1.0-lgbm | 2026-06-27 21:22:18 | active | 0.2245 | 0.6436 | 0.5781

3. Feature groups

The probability blends five weighted groups of inputs, all computed from the current-season (2026) context at prediction time. Each group can be scaled independently via model settings.

on_base 6 features

The batter's own propensity to reach base and score — season and recent on-base rate, runs-scored rate, plate-appearance volume. The dominant driver of whether they score at all.

Member features
player_season_obp
player_l7_obp
player_l15_obp
player_l15_reach_rate
player_season_k_pct
player_season_bb_pct

lineup_context 3 features

Where the batter hits and the run-scoring context around them — batting slot, projected plate appearances, and the ability of the hitters behind them to drive them in.

Member features
batting_slot
behind_obp
behind_slg

pitcher_suppression 6 features

The opposing starter's ability to suppress baserunners and runs — strikeout rate, WHIP, ERA, and Statcast suppression signals.

Member features
sp_obp_against
sp_k_pct
sp_runners_on_rate
sp_xwoba_against
sp_hardhit_pct_against
sp_barrel_pct_against

environment 4 features

Park and game-environment factors: venue run environment, home/away, day/night, and park factors that shift offensive output.

Member features
park_run_factor
is_home
is_night
temperature

team_offense 2 features

The batter's team offensive strength and the opponent's run prevention — team runs per game, recent form, and opposing-staff context.

Member features
team_runs_per_game
implied_team_total
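The group structure above can be written down as a small registry, which makes it easy to expand per-group weights into per-feature weights. This is only a sketch: the group and feature names are the ones listed on this page, but the function name is hypothetical and the page does not say how the weights are applied inside the model, so the code stops at producing the lookup.

```python
import json

# Feature groups and members as listed in this section.
FEATURE_GROUPS = {
    "on_base": [
        "player_season_obp", "player_l7_obp", "player_l15_obp",
        "player_l15_reach_rate", "player_season_k_pct", "player_season_bb_pct",
    ],
    "lineup_context": ["batting_slot", "behind_obp", "behind_slg"],
    "pitcher_suppression": [
        "sp_obp_against", "sp_k_pct", "sp_runners_on_rate",
        "sp_xwoba_against", "sp_hardhit_pct_against", "sp_barrel_pct_against",
    ],
    "environment": ["park_run_factor", "is_home", "is_night", "temperature"],
    "team_offense": ["team_runs_per_game", "implied_team_total"],
}


def per_feature_weights(overrides_json: str) -> dict[str, float]:
    """Expand a group-level weight mapping into one weight per feature.

    Groups missing from the overrides default to 1.0 (an assumption).
    """
    overrides = json.loads(overrides_json)
    unknown = set(overrides) - set(FEATURE_GROUPS)
    if unknown:
        raise ValueError(f"unknown feature groups: {sorted(unknown)}")
    weights = {}
    for group, features in FEATURE_GROUPS.items():
        group_weight = float(overrides.get(group, 1.0))
        for feature in features:
            weights[feature] = group_weight
    return weights


weights = per_feature_weights(
    '{"on_base":1,"lineup_context":1,"pitcher_suppression":1,'
    '"environment":1,"team_offense":1}'
)
```

With every group weight at 1, all 21 features receive a weight of 1.0.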

4. Calibration

The core learner is a LightGBM gradient-boosted classifier. Its raw scores order player-games from least to most likely to score zero, but they are not guaranteed to be calibrated probabilities, so they are passed through an isotonic regression fit on a held-out slice of the data. Isotonic calibration is a monotonic, non-parametric mapping: it never reverses the order of two predictions, while bending the curve so that a published 70% cashes about 70% of the time.

The result is a trustworthy probability — the only kind that can be fairly compared against a sportsbook's implied probability. Reliability is monitored on the Performance page via a 10-bucket decile table (model probability vs observed zero-run rate).
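To make the calibration step concrete, here is a from-scratch sketch of isotonic regression using the pool-adjacent-violators algorithm. It is not the production code: the function names and the toy data are hypothetical, and it returns a step function, whereas library implementations (for example scikit-learn's IsotonicRegression) typically interpolate between fitted points.

```python
from bisect import bisect_left
from itertools import groupby


def fit_isotonic(scores, outcomes):
    """Fit a non-decreasing step function from raw scores to outcome rates."""
    blocks = []  # each block: [sum of outcomes, count, highest score covered]
    pairs = sorted(zip(scores, outcomes))
    for score, group in groupby(pairs, key=lambda pair: pair[0]):
        ys = [y for _, y in group]  # tied scores are pooled up front
        blocks.append([float(sum(ys)), len(ys), score])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [block[2] for block in blocks]
    values = [block[0] / block[1] for block in blocks]
    return uppers, values


def calibrate(score, uppers, values):
    """Map a raw score to its block's observed rate (clamped at the top end)."""
    index = bisect_left(uppers, score)
    return values[min(index, len(values) - 1)]


# Toy held-out slice: raw scores and whether the batter scored zero.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
zero_flags = [0, 1, 0, 1, 1, 1]
uppers, values = fit_isotonic(raw_scores, zero_flags)
```

In the toy data the scores 0.2 and 0.3 are out of order (a 1 followed by a 0), so they are pooled into one block with a calibrated value of 0.5. The fitted values never decrease as the raw score increases, which is the monotonic property described above.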

Held-out metrics for the active version: Brier 0.2245 · log-loss 0.6436 · AUC 0.5781.
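For context on these numbers, the sketch below computes Brier score and log loss in plain Python, plus the values a constant forecast at the base rate would earn. It assumes the 64.8% base zero rate from section 2 applies to the held-out rows, which this page does not confirm; the function names are hypothetical.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def constant_baseline(base_rate):
    """Expected Brier and log loss from always predicting the base rate."""
    brier = base_rate * (1 - base_rate)
    ll = -(base_rate * math.log(base_rate)
           + (1 - base_rate) * math.log(1 - base_rate))
    return brier, ll


# Assumption: the 64.8% base zero rate holds on the held-out rows.
baseline_brier, baseline_log_loss = constant_baseline(0.648)
```

Under that assumption the constant forecast scores about 0.228 on Brier and about 0.649 on log loss, so the reported 0.2245 and 0.6436 are modest improvements over always predicting the base rate.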

5. Edge & filtering

The sportsbook's Under odds are converted to an implied probability with the vig removed (de-vigged). The model's edge is the difference, in percentage points:

edge = model P(zero) − de-vigged implied P(under)
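The edge formula can be sketched as follows. The page does not state which de-vig method is used, so this example assumes the common proportional (multiplicative) method, which needs the Over price as well as the Under price. The function names and the sample odds are hypothetical.

```python
def implied_prob(american_odds: int) -> float:
    """Convert American odds to the raw (vig-included) implied probability."""
    if american_odds == 0:
        raise ValueError("American odds cannot be zero")
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def devig_under(under_odds: int, over_odds: int) -> float:
    """Proportional de-vig (an assumption): rescale both sides to sum to 1."""
    raw_under = implied_prob(under_odds)
    raw_over = implied_prob(over_odds)
    return raw_under / (raw_under + raw_over)


def edge_points(model_p_zero: float, under_odds: int, over_odds: int) -> float:
    """Edge in percentage points: model P(zero) minus de-vigged P(under)."""
    return 100 * (model_p_zero - devig_under(under_odds, over_odds))


# Hypothetical market: Under -160, Over +130, model P(zero) = 0.63.
fair_under = devig_under(-160, 130)
edge = edge_points(0.63, -160, 130)
```

In this example the raw implied probabilities sum to about 1.05, the de-vigged Under probability is about 0.586, and the edge is about 4.4 points.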

A player-game is surfaced as an UNDER pick only when both gates clear:

  • The Under odds are short enough — at or below -160 (the configured odds_threshold) — so the market itself favors the Under.
  • The edge clears the configured minimum of 3.0 pts (min_edge_threshold).

The model is UNDER-only: it is trained on the zero-runs event, and the asymmetric 0.5 line makes the Under the only directionally-meaningful side. Anything that fails either gate is recorded as no_pick. Thresholds live in ru_model_settings (append-only — every change inserts a new row).
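The two gates above reduce to a short predicate. This is a sketch under stated assumptions: the function name is hypothetical, the defaults mirror the configured odds_threshold and min_edge_threshold, and the edge gate is treated as inclusive (an edge of exactly 3.0 passes), which the page does not spell out.

```python
def classify_pick(
    under_odds: int,
    edge_pts: float,
    odds_threshold: int = -160,
    min_edge_threshold: float = 3.0,
) -> str:
    """Return "UNDER" only when both gates clear, otherwise "no_pick"."""
    # American odds at or below -160 (for example -175) favor the Under;
    # any positive price fails this comparison automatically.
    odds_gate = under_odds <= odds_threshold
    # Assumption: the minimum edge is inclusive.
    edge_gate = edge_pts >= min_edge_threshold
    return "UNDER" if odds_gate and edge_gate else "no_pick"
```

For example, an Under priced at -175 with a 4.4-point edge is surfaced, while the same edge at -140 is recorded as no_pick because the odds gate fails.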

6. Lock & grading

Predictions update through the day as lineups and lines move, then lock shortly before first pitch on a confirmed lineup. A pick is voided if the player's lineup spot is never confirmed, the player is scratched, the player ends up a DNP, or the game is postponed or cancelled. Voids are worth zero units and are excluded from the bankroll record. Once the game is final, each pick is graded against whether the player actually scored:

Result | Condition | Units
win | Player scored zero runs (scored_zero=1). The Under cleared. | +unit × (odds payout factor)
loss | Player scored one or more runs (scored_zero=0). The Under missed. | −unit
void | Unconfirmed lineup, scratch, DNP, or game postponed/cancelled. | 0.00
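The grading table can be sketched in code. The page does not define the odds payout factor, so this example assumes the standard American-odds profit per unit staked: 100/|odds| for negative prices and odds/100 for positive prices. The function names are hypothetical, and results are expressed per one unit.

```python
def payout_factor(american_odds: int) -> float:
    """Profit per unit staked at the given American odds (assumed definition)."""
    if american_odds == 0:
        raise ValueError("American odds cannot be zero")
    if american_odds < 0:
        return 100 / -american_odds
    return american_odds / 100


def grade_units(result: str, american_odds: int, unit: float = 1.0) -> float:
    """Units won or lost for a graded pick, following the table above."""
    if result == "win":
        return unit * payout_factor(american_odds)
    if result == "loss":
        return -unit
    if result == "void":
        return 0.0
    raise ValueError(f"unknown result: {result!r}")
```

At -160, a winning Under returns 0.625 units, a loss costs 1 unit, and a void returns 0.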

7. Sufficiency gate

Recommendations stay hidden on the public slate until the model has proven itself on a live sample. Both conditions must hold before picks publish:

  • At least 150 graded picks (sufficiency_min_graded).
  • A live Brier score at or below 0.18 (sufficiency_brier_ceiling).

published = (graded_count ≥ min_graded) AND (brier_live ≤ brier_ceiling)

Until the gate clears, the Today and Performance pages show probabilities and lines but no UNDER call-to-action, alongside a yellow “Building sample” banner that reports progress.

Note: brier_live is mean((model_prob − scored_zero)²) over graded UNDER picks, and is null until at least one pick is graded.
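The gate and the brier_live definition can be sketched together. The function names and the input shape (a list of (model_prob, scored_zero) pairs for graded UNDER picks) are hypothetical, and treating a null brier_live as a failed gate is an assumption consistent with the graded-count condition.

```python
from typing import Optional


def brier_live(graded: list[tuple[float, int]]) -> Optional[float]:
    """Mean squared error over graded UNDER picks; None when nothing is graded."""
    if not graded:
        return None
    return sum((prob - zero) ** 2 for prob, zero in graded) / len(graded)


def is_published(
    graded: list[tuple[float, int]],
    min_graded: int = 150,
    brier_ceiling: float = 0.18,
) -> bool:
    """published = (graded_count >= min_graded) AND (brier_live <= brier_ceiling)."""
    score = brier_live(graded)
    if score is None:  # assumption: a null Brier never clears the gate
        return False
    return len(graded) >= min_graded and score <= brier_ceiling


# Hypothetical sample: 150 picks at model_prob 0.70, of which 120 won.
sample = [(0.70, 1)] * 120 + [(0.70, 0)] * 30
```

In this sample brier_live is about 0.17, so the gate clears. With only 90 wins out of 150 the Brier rises to about 0.25 and the picks stay hidden.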

8. Limitations

  • Single book — lines and odds are read from one sportsbook; the model does not shop the line across books for the best available Under price.
  • No platoon-specific behind-hitters split — the lineup-context group captures who bats behind the hitter, but not the platoon-specific (vs-LHP / vs-RHP) production of those run producers against the listed starter.
  • Weather is best-effort / often unavailable — temperature, wind, and humidity are not reliably in the feature set; the environment group leans on venue and park factors.

9. Live settings

All values below are read live from ru_model_settings (append-only — the most recent row per key wins).

Key | Value | Last updated
default_book | Hard Rock Bet | 2026-06-27 02:40:49
feature_weight_overrides | {"on_base":1,"lineup_context":1,"pitcher_suppression":1,"environment":1,"team_offense":1} | 2026-06-27 02:40:49
min_edge_threshold | 3 | 2026-06-27 02:40:49
odds_threshold | -160 | 2026-06-27 02:40:49
poll_lead_minutes | 120 | 2026-06-27 02:40:49
sufficiency_brier_ceiling | 0.18 | 2026-06-27 02:40:49
sufficiency_min_graded | 150 | 2026-06-27 02:40:49
unit_size | 25 | 2026-06-27 02:40:49

All numeric values above are live reads from the model database.
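The "most recent row per key wins" rule for an append-only settings table can be sketched as a simple reduction. The row shape (key, value, updated_at) and the function name are hypothetical, since the schema of ru_model_settings is not shown here, and the second min_edge_threshold row is invented purely to illustrate an override.

```python
def current_settings(rows):
    """Resolve append-only rows to the latest value per key.

    rows: iterable of (key, value, updated_at) in any order, where
    updated_at is a "YYYY-MM-DD HH:MM:SS" string (sorts chronologically).
    """
    latest = {}
    for key, value, updated_at in rows:
        if key not in latest or updated_at > latest[key][1]:
            latest[key] = (value, updated_at)
    return {key: value for key, (value, _) in latest.items()}


rows = [
    ("min_edge_threshold", "3", "2026-06-27 02:40:49"),
    ("odds_threshold", "-160", "2026-06-27 02:40:49"),
    # Invented later row, for illustration only.
    ("min_edge_threshold", "4", "2026-07-01 09:00:00"),
]
settings = current_settings(rows)
```

Because earlier rows are never modified, the full history of each threshold remains available for audit. If two rows for the same key could share a timestamp, a monotonically increasing row id would be a safer tiebreaker than the timestamp alone.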