Stats
Return-series inference, implied vol, prediction-market calibration — one-line statistical helpers
horizon.stats ships the inferential statistics a fiduciary actually
needs to evaluate strategies and quotes. Every metric that has a sampling
distribution comes with a bootstrap confidence interval; every test that
produces a p-value does.
Pure functions on plain Sequence[float] / numpy arrays. No pandas
required. scipy and py_vollib are detected at runtime and used when
present; pure-numpy fallbacks ship for everything that matters, so the
module works without optional installs.
import horizon as hz
# Re-export pattern; equivalent forms:
hz.stats.summary(returns)
from horizon.stats import summary
from horizon.stats.returns import summary
At a glance
Returns inference
summary(returns, *, risk_free_rate=0.0, periods_per_year=252)
One-shot tearsheet. Returns a ReturnsSummary with n, mean_per_period, vol_per_period, annualized_return, annualized_vol, sharpe, sortino, max_drawdown, skew, excess_kurtosis.
rets = hz.stats.returns_from_prices(closes)
print(hz.stats.summary(rets))
# ReturnsSummary(n=251, ann_ret=11.47%, ann_vol=19.38%,
# sharpe=0.56, sortino=0.58, max_dd=-15.09%)
sharpe(returns, *, risk_free_rate=0.0, periods_per_year=252, annualize=True) → float
sortino(returns, *, target=0.0, periods_per_year=252, annualize=True) → float
Point estimates. Returns nan for zero-variance input. Use the *_ci variants whenever you’d quote the number to anyone.
sharpe_ci(returns, *, confidence=0.95, n_resamples=5000, seed=None, ...) → ConfidenceInterval
sortino_ci(returns, *, confidence=0.95, n_resamples=5000, seed=None, ...) → ConfidenceInterval
Bootstrap percentile CIs. iid resampling with replacement; pair with ljung_box to confirm iid-ness first (autocorrelated series want a
stationary bootstrap, not the default).
ci = hz.stats.sharpe_ci(rets, n_resamples=5000, seed=0)
# CI(0.5605, [-1.4679, 2.5510], 95%, n=5000)
A wide CI on a strong-looking Sharpe is the most common reason a backtest’s edge fails to replicate in production — quote the band, not the midpoint.
jarque_bera(returns) → JarqueBeraResult
Null: returns are normally distributed. Reject when p_value < alpha.
Uses scipy.stats.chi2.sf if scipy is installed; otherwise a
Numerical-Recipes continued-fraction implementation of the chi-squared
survival function.
jb = hz.stats.jarque_bera(rets)
print(f"skew={jb.skew:.3f} kurtosis={jb.excess_kurtosis:.3f} p={jb.p_value:.4f}")
ljung_box(returns, *, lags=10) → LjungBoxResult
Null: returns are iid (no autocorrelation at lags 1..h). Reject when p_value < alpha. Run this before sharpe_ci — the iid bootstrap is
only valid when serial correlation is weak.
max_drawdown(returns) → DrawdownStats
Peak-to-trough drawdown on the cumulative-return curve, plus the
peak/trough/recovery indices and durations. recovery_index is None if the equity curve hasn’t surpassed the prior peak by the end
of the series.
returns_from_prices(prices) → numpy.ndarray
Simple arithmetic returns from a price series. Length N → N-1.
Implied volatility
BSInputs(spot, strike, time_to_expiry, *, risk_free_rate=0.0, dividend_yield=0.0, option_type="call")
Inputs for a Black-Scholes-Merton extraction. time_to_expiry is in
years (use dte / 365.25 or dte / 252 depending on convention).
Validation: rejects non-positive spot/strike/T, rejects bad option_type.
implied_vol(premium, inputs, *, tol=1e-8, max_iter=100, use_vollib=True) → IVResult
Solves for the IV under BSM. Delegates to py_vollib when installed
(faster); falls back to bisection on [1e-4, 5.0]. Refuses to solve
if premium is below intrinsic value (the market is showing an
arbitrage and forcing a number from a bad quote is a fiduciary hazard).
inputs = hz.stats.BSInputs(
spot=180.0, strike=185.0, time_to_expiry=30/365,
risk_free_rate=0.045, option_type="call",
)
r = hz.stats.implied_vol(premium=2.85, inputs=inputs)
print(r.iv, r.converged, r.iterations)
iv_rank(current_iv, iv_history) → float
Linear position in [0, 1] of current_iv between the 52-week (or
whatever lookback the history is) min and max. nan if min == max.
iv_percentile(current_iv, iv_history) → float
Fraction of iv_history strictly less than current_iv. Note: this
is not the same as iv_rank. The desk uses both.
iv_hv_spread(implied, realized) → float
implied − realized. Positive → options are “rich” vs recent
realized; negative → “cheap”. Pair with realized_vol_annualized.
realized_vol_annualized(prices, *, periods_per_year=252) → float
Annualized standard deviation of log returns on a price series. For a
rolling window, prefer horizon.features.equity.RealizedVol.
iv_term_structure_slope(dtes, ivs) → float
OLS slope of IV vs DTE. Positive = contango, negative =
backwardation. nan if x collapses to a single value.
Prediction-market calibration
brier_score(predictions, outcomes) → float
Mean squared error between forecast and outcome. Lower is better. Perfectly calibrated and sharp → 0; uninformative prior of 0.5 on a balanced binary → 0.25.
log_loss(predictions, outcomes, *, eps=1e-12) → float
Binary cross-entropy. Proper scoring rule — preferred over Brier for
distinguishing well-calibrated forecasters at the tails. Clipped at eps so a single dead-wrong prediction doesn’t blow up to inf.
reliability_diagram(predictions, outcomes, *, n_bins=10) → ReliabilityDiagram
Bucketed calibration diagnostic. miscalibration_area is the
support-weighted L1 deviation from the identity line — one scalar
you can trend over time.
calibration_report(predictions, outcomes, *, n_bins=10) → CalibrationReport
Brier + log loss + reliability diagram in one immutable struct. The shape you put in a tearsheet header.
rep = hz.stats.calibration_report(predictions, outcomes, n_bins=10)
print(rep.brier, rep.log_loss, rep.reliability.miscalibration_area)
implied_probability(price, *, min_price=0.01, max_price=0.99) → float
Clamped read of a binary-market YES price as P(YES). The clamp
protects against on-the-edge quotes that would otherwise make Kelly
blow up.
kelly_fraction(p_yes, yes_price, *, cap=0.5) → float
Signed Kelly fraction. Positive → YES side, negative → NO side, zero
→ no edge. cap defaults to half-Kelly (0.5), widely considered the
prudent ceiling for a real book; pass cap=None for raw Kelly.
# Forecast 60% YES at a 55% market price
f = hz.stats.kelly_fraction(p_yes=0.60, yes_price=0.55)
# 0.111... = (0.60 - 0.55) / (1 - 0.55)
convergence_score(yes_prices, outcome, *, weight="uniform") → float
“Did this market actually price the outcome?” 1.0 means the market
was always right, 0.5 is a coin flip, 0.0 means the market was always
wrong. Use weight="linear" to weight ticks closer to resolution
more heavily.
End-to-end example
import horizon as hz
# 1) Pull data with whichever provider
bars = list(hz.data.providers.alpaca(["AAPL"], start="2024-01-01").iter_bars())
prices = [b.close for b in bars]
# 2) One-line research workflow
rets = hz.stats.returns_from_prices(prices)
print(hz.stats.summary(rets))
print("Sharpe CI:", hz.stats.sharpe_ci(rets, seed=0))
print("Normality :", hz.stats.jarque_bera(rets))
print("IID check :", hz.stats.ljung_box(rets, lags=10))
For IV-on-real-quotes, see docs/venues/alpaca.mdx — Alpaca.options_chain pulls the snapshot, hz.stats.implied_vol extracts IV from a premium without an embedded Greek field, and hz.stats.iv_rank ranks it against history.
Result types
| Type | Fields |
|---|---|
ReturnsSummary | n, mean_per_period, vol_per_period, annualized_return, annualized_vol, sharpe, sortino, max_drawdown, skew, excess_kurtosis, periods_per_year |
ConfidenceInterval | point_estimate, lower, upper, confidence, n_resamples |
JarqueBeraResult | statistic, p_value, skew, excess_kurtosis |
LjungBoxResult | statistic, p_value, lags, autocorrelations |
DrawdownStats | max_drawdown, peak_index, trough_index, recovery_index, duration_periods, recovery_periods |
IVResult | iv, inputs, premium, iterations, converged |
BSInputs | spot, strike, time_to_expiry, risk_free_rate, dividend_yield, option_type |
ReliabilityDiagram | bin_edges, bin_centers, mean_predicted, observed_frequency, counts, miscalibration_area |
CalibrationReport | n, brier, log_loss, reliability |
All result types are frozen dataclasses — immutable, hashable where possible, safe to pickle into a tearsheet or audit log.
Dependencies
| Function | Optional dep | Behaviour without it |
|---|---|---|
jarque_bera, ljung_box | scipy | Uses Numerical-Recipes chi-squared survival; results within numerical tolerance of scipy’s |
implied_vol | py_vollib | Pure-numpy bisection on [1e-4, 5.0]; ~50 iterations to 1e-8 |
| everything else | none | Pure numpy + stdlib |
Install pip install -e ".[validate]" for scipy + sklearn, or pip install -e ".[options]" for py_vollib — both are recommended
for production but not required.