Stats

Return-series inference, implied vol, prediction-market calibration — one-line statistical helpers

horizon.stats ships the inferential statistics a fiduciary actually needs to evaluate strategies and quotes. Every metric that has a sampling distribution comes with a bootstrap confidence interval; every test that produces a p-value does.

Pure functions on plain Sequence[float] / numpy arrays. No pandas required. scipy and py_vollib are detected at runtime and used when present; pure-numpy fallbacks ship for everything that matters, so the module works without optional installs.

python
import horizon as hz

# Re-export pattern; equivalent forms:
hz.stats.summary(returns)
from horizon.stats import summary
from horizon.stats.returns import summary

At a glance

Returns inference Sharpe / Sortino with bootstrap CIs, Jarque-Bera, Ljung-Box, max-drawdown stats.
Implied vol Black-Scholes IV solver, IV rank, IV percentile, IV-HV spread, term-structure slope.
Prediction markets Brier, log loss, reliability diagram, Kelly fraction, convergence score.

Returns inference

summary(returns, *, risk_free_rate=0.0, periods_per_year=252)

One-shot tearsheet. Returns a ReturnsSummary with n, mean_per_period, vol_per_period, annualized_return, annualized_vol, sharpe, sortino, max_drawdown, skew, excess_kurtosis.

python
rets = hz.stats.returns_from_prices(closes)
print(hz.stats.summary(rets))
# ReturnsSummary(n=251, ann_ret=11.47%, ann_vol=19.38%,
#                sharpe=0.56, sortino=0.58, max_dd=-15.09%)

sharpe(returns, *, risk_free_rate=0.0, periods_per_year=252, annualize=True)float

sortino(returns, *, target=0.0, periods_per_year=252, annualize=True)float

Point estimates. Returns nan for zero-variance input. Use the *_ci variants whenever you’d quote the number to anyone.

sharpe_ci(returns, *, confidence=0.95, n_resamples=5000, seed=None, ...)ConfidenceInterval

sortino_ci(returns, *, confidence=0.95, n_resamples=5000, seed=None, ...)ConfidenceInterval

Bootstrap percentile CIs. iid resampling with replacement; pair with ljung_box to confirm iid-ness first (autocorrelated series want a stationary bootstrap, not the default).

python
ci = hz.stats.sharpe_ci(rets, n_resamples=5000, seed=0)
# CI(0.5605, [-1.4679, 2.5510], 95%, n=5000)

A wide CI on a strong-looking Sharpe is the most common reason a backtest’s edge fails to replicate in production — quote the band, not the midpoint.

jarque_bera(returns)JarqueBeraResult

Null: returns are normally distributed. Reject when p_value < alpha. Uses scipy.stats.chi2.sf if scipy is installed; otherwise a Numerical-Recipes continued-fraction implementation of the chi-squared survival function.

python
jb = hz.stats.jarque_bera(rets)
print(f"skew={jb.skew:.3f} kurtosis={jb.excess_kurtosis:.3f} p={jb.p_value:.4f}")

ljung_box(returns, *, lags=10)LjungBoxResult

Null: returns are iid (no autocorrelation at lags 1..h). Reject when p_value < alpha. Run this before sharpe_ci — the iid bootstrap is only valid when serial correlation is weak.

max_drawdown(returns)DrawdownStats

Peak-to-trough drawdown on the cumulative-return curve, plus the peak/trough/recovery indices and durations. recovery_index is None if the equity curve hasn’t surpassed the prior peak by the end of the series.

returns_from_prices(prices)numpy.ndarray

Simple arithmetic returns from a price series. Length N → N-1.

Implied volatility

BSInputs(spot, strike, time_to_expiry, *, risk_free_rate=0.0, dividend_yield=0.0, option_type="call")

Inputs for a Black-Scholes-Merton extraction. time_to_expiry is in years (use dte / 365.25 or dte / 252 depending on convention). Validation: rejects non-positive spot/strike/T, rejects bad option_type.

implied_vol(premium, inputs, *, tol=1e-8, max_iter=100, use_vollib=True)IVResult

Solves for the IV under BSM. Delegates to py_vollib when installed (faster); falls back to bisection on [1e-4, 5.0]. Refuses to solve if premium is below intrinsic value (the market is showing an arbitrage and forcing a number from a bad quote is a fiduciary hazard).

python
inputs = hz.stats.BSInputs(
    spot=180.0, strike=185.0, time_to_expiry=30/365,
    risk_free_rate=0.045, option_type="call",
)
r = hz.stats.implied_vol(premium=2.85, inputs=inputs)
print(r.iv, r.converged, r.iterations)

iv_rank(current_iv, iv_history)float

Linear position in [0, 1] of current_iv between the 52-week (or whatever lookback the history is) min and max. nan if min == max.

iv_percentile(current_iv, iv_history)float

Fraction of iv_history strictly less than current_iv. Note: this is not the same as iv_rank. The desk uses both.

iv_hv_spread(implied, realized)float

implied − realized. Positive → options are “rich” vs recent realized; negative → “cheap”. Pair with realized_vol_annualized.

realized_vol_annualized(prices, *, periods_per_year=252)float

Annualized standard deviation of log returns on a price series. For a rolling window, prefer horizon.features.equity.RealizedVol.

iv_term_structure_slope(dtes, ivs)float

OLS slope of IV vs DTE. Positive = contango, negative = backwardation. nan if x collapses to a single value.

Prediction-market calibration

brier_score(predictions, outcomes)float

Mean squared error between forecast and outcome. Lower is better. Perfectly calibrated and sharp → 0; uninformative prior of 0.5 on a balanced binary → 0.25.

log_loss(predictions, outcomes, *, eps=1e-12)float

Binary cross-entropy. Proper scoring rule — preferred over Brier for distinguishing well-calibrated forecasters at the tails. Clipped at eps so a single dead-wrong prediction doesn’t blow up to inf.

reliability_diagram(predictions, outcomes, *, n_bins=10)ReliabilityDiagram

Bucketed calibration diagnostic. miscalibration_area is the support-weighted L1 deviation from the identity line — one scalar you can trend over time.

calibration_report(predictions, outcomes, *, n_bins=10)CalibrationReport

Brier + log loss + reliability diagram in one immutable struct. The shape you put in a tearsheet header.

python
rep = hz.stats.calibration_report(predictions, outcomes, n_bins=10)
print(rep.brier, rep.log_loss, rep.reliability.miscalibration_area)

implied_probability(price, *, min_price=0.01, max_price=0.99)float

Clamped read of a binary-market YES price as P(YES). The clamp protects against on-the-edge quotes that would otherwise make Kelly blow up.

kelly_fraction(p_yes, yes_price, *, cap=0.5)float

Signed Kelly fraction. Positive → YES side, negative → NO side, zero → no edge. cap defaults to half-Kelly (0.5), widely considered the prudent ceiling for a real book; pass cap=None for raw Kelly.

python
# Forecast 60% YES at a 55% market price
f = hz.stats.kelly_fraction(p_yes=0.60, yes_price=0.55)
# 0.111... = (0.60 - 0.55) / (1 - 0.55)

convergence_score(yes_prices, outcome, *, weight="uniform")float

“Did this market actually price the outcome?” 1.0 means the market was always right, 0.5 is a coin flip, 0.0 means the market was always wrong. Use weight="linear" to weight ticks closer to resolution more heavily.

End-to-end example

python
import horizon as hz

# 1) Pull data with whichever provider
bars = list(hz.data.providers.alpaca(["AAPL"], start="2024-01-01").iter_bars())
prices = [b.close for b in bars]

# 2) One-line research workflow
rets = hz.stats.returns_from_prices(prices)
print(hz.stats.summary(rets))
print("Sharpe CI:", hz.stats.sharpe_ci(rets, seed=0))
print("Normality :", hz.stats.jarque_bera(rets))
print("IID check :", hz.stats.ljung_box(rets, lags=10))

For IV-on-real-quotes, see docs/venues/alpaca.mdxAlpaca.options_chain pulls the snapshot, hz.stats.implied_vol extracts IV from a premium without an embedded Greek field, and hz.stats.iv_rank ranks it against history.

Result types

TypeFields
ReturnsSummaryn, mean_per_period, vol_per_period, annualized_return, annualized_vol, sharpe, sortino, max_drawdown, skew, excess_kurtosis, periods_per_year
ConfidenceIntervalpoint_estimate, lower, upper, confidence, n_resamples
JarqueBeraResultstatistic, p_value, skew, excess_kurtosis
LjungBoxResultstatistic, p_value, lags, autocorrelations
DrawdownStatsmax_drawdown, peak_index, trough_index, recovery_index, duration_periods, recovery_periods
IVResultiv, inputs, premium, iterations, converged
BSInputsspot, strike, time_to_expiry, risk_free_rate, dividend_yield, option_type
ReliabilityDiagrambin_edges, bin_centers, mean_predicted, observed_frequency, counts, miscalibration_area
CalibrationReportn, brier, log_loss, reliability

All result types are frozen dataclasses — immutable, hashable where possible, safe to pickle into a tearsheet or audit log.

Dependencies

FunctionOptional depBehaviour without it
jarque_bera, ljung_boxscipyUses Numerical-Recipes chi-squared survival; results within numerical tolerance of scipy’s
implied_volpy_vollibPure-numpy bisection on [1e-4, 5.0]; ~50 iterations to 1e-8
everything elsenonePure numpy + stdlib

Install pip install -e ".[validate]" for scipy + sklearn, or pip install -e ".[options]" for py_vollib — both are recommended for production but not required.