Stats

Return-series inference, implied vol, prediction-market calibration — one-line statistical helpers

horizon.stats ships the inferential statistics a fiduciary actually needs to evaluate strategies and quotes. Every metric that has a sampling distribution comes with a bootstrap confidence interval; every test that produces a p-value does.

Pure functions on plain Sequence[float] / numpy arrays. No pandas required. scipy and py_vollib are detected at runtime and used when present; pure-numpy fallbacks ship for everything that matters, so the module works without optional installs.

python

import horizon as hz

# Re-export pattern; equivalent forms:
hz.stats.summary(returns)
from horizon.stats import summary
from horizon.stats.returns import summary

At a glance

Returns inference Sharpe / Sortino with bootstrap CIs, Jarque-Bera, Ljung-Box, max-drawdown stats.

Implied vol Black-Scholes IV solver, IV rank, IV percentile, IV-HV spread, term-structure slope.

Prediction markets Brier, log loss, reliability diagram, Kelly fraction, convergence score.

Returns inference

`summary(returns, *, risk_free_rate=0.0, periods_per_year=252)`

One-shot tearsheet. Returns a ReturnsSummary with n, mean_per_period, vol_per_period, annualized_return, annualized_vol, sharpe, sortino, max_drawdown, skew, excess_kurtosis.

python

rets = hz.stats.returns_from_prices(closes)
print(hz.stats.summary(rets))
# ReturnsSummary(n=251, ann_ret=11.47%, ann_vol=19.38%,
#                sharpe=0.56, sortino=0.58, max_dd=-15.09%)

`sharpe(returns, *, risk_free_rate=0.0, periods_per_year=252, annualize=True)` → `float`

`sortino(returns, *, target=0.0, periods_per_year=252, annualize=True)` → `float`

Point estimates. Returns nan for zero-variance input. Use the *_ci variants whenever you’d quote the number to anyone.

`sharpe_ci(returns, *, confidence=0.95, n_resamples=5000, seed=None, ...)` → `ConfidenceInterval`

`sortino_ci(returns, *, confidence=0.95, n_resamples=5000, seed=None, ...)` → `ConfidenceInterval`

Bootstrap percentile CIs. iid resampling with replacement; pair with ljung_box to confirm iid-ness first (autocorrelated series want a stationary bootstrap, not the default).

python

ci = hz.stats.sharpe_ci(rets, n_resamples=5000, seed=0)
# CI(0.5605, [-1.4679, 2.5510], 95%, n=5000)

A wide CI on a strong-looking Sharpe is the most common reason a backtest’s edge fails to replicate in production — quote the band, not the midpoint.

`jarque_bera(returns)` → `JarqueBeraResult`

Null: returns are normally distributed. Reject when p_value < alpha. Uses scipy.stats.chi2.sf if scipy is installed; otherwise a Numerical-Recipes continued-fraction implementation of the chi-squared survival function.

python

jb = hz.stats.jarque_bera(rets)
print(f"skew={jb.skew:.3f} kurtosis={jb.excess_kurtosis:.3f} p={jb.p_value:.4f}")

`ljung_box(returns, *, lags=10)` → `LjungBoxResult`

Null: returns are iid (no autocorrelation at lags 1..h). Reject when p_value < alpha. Run this before sharpe_ci — the iid bootstrap is only valid when serial correlation is weak.

`max_drawdown(returns)` → `DrawdownStats`

Peak-to-trough drawdown on the cumulative-return curve, plus the peak/trough/recovery indices and durations. recovery_index is None if the equity curve hasn’t surpassed the prior peak by the end of the series.

`returns_from_prices(prices)` → `numpy.ndarray`

Simple arithmetic returns from a price series. Length N → N-1.

Implied volatility

`BSInputs(spot, strike, time_to_expiry, *, risk_free_rate=0.0, dividend_yield=0.0, option_type="call")`

Inputs for a Black-Scholes-Merton extraction. time_to_expiry is in years (use dte / 365.25 or dte / 252 depending on convention). Validation: rejects non-positive spot/strike/T, rejects bad option_type.

`implied_vol(premium, inputs, *, tol=1e-8, max_iter=100, use_vollib=True)` → `IVResult`

Solves for the IV under BSM. Delegates to py_vollib when installed (faster); falls back to bisection on [1e-4, 5.0]. Refuses to solve if premium is below intrinsic value (the market is showing an arbitrage and forcing a number from a bad quote is a fiduciary hazard).

python

inputs = hz.stats.BSInputs(
    spot=180.0, strike=185.0, time_to_expiry=30/365,
    risk_free_rate=0.045, option_type="call",
)
r = hz.stats.implied_vol(premium=2.85, inputs=inputs)
print(r.iv, r.converged, r.iterations)

`iv_rank(current_iv, iv_history)` → `float`

Linear position in [0, 1] of current_iv between the 52-week (or whatever lookback the history is) min and max. nan if min == max.

`iv_percentile(current_iv, iv_history)` → `float`

Fraction of iv_history strictly less than current_iv. Note: this is not the same as iv_rank. The desk uses both.

`iv_hv_spread(implied, realized)` → `float`

implied − realized. Positive → options are “rich” vs recent realized; negative → “cheap”. Pair with realized_vol_annualized.

`realized_vol_annualized(prices, *, periods_per_year=252)` → `float`

Annualized standard deviation of log returns on a price series. For a rolling window, prefer horizon.features.equity.RealizedVol.

`iv_term_structure_slope(dtes, ivs)` → `float`

OLS slope of IV vs DTE. Positive = contango, negative = backwardation. nan if x collapses to a single value.

Prediction-market calibration

`brier_score(predictions, outcomes)` → `float`

Mean squared error between forecast and outcome. Lower is better. Perfectly calibrated and sharp → 0; uninformative prior of 0.5 on a balanced binary → 0.25.

`log_loss(predictions, outcomes, *, eps=1e-12)` → `float`

Binary cross-entropy. Proper scoring rule — preferred over Brier for distinguishing well-calibrated forecasters at the tails. Clipped at eps so a single dead-wrong prediction doesn’t blow up to inf.

`reliability_diagram(predictions, outcomes, *, n_bins=10)` → `ReliabilityDiagram`

Bucketed calibration diagnostic. miscalibration_area is the support-weighted L1 deviation from the identity line — one scalar you can trend over time.

`calibration_report(predictions, outcomes, *, n_bins=10)` → `CalibrationReport`

Brier + log loss + reliability diagram in one immutable struct. The shape you put in a tearsheet header.

python

rep = hz.stats.calibration_report(predictions, outcomes, n_bins=10)
print(rep.brier, rep.log_loss, rep.reliability.miscalibration_area)

`implied_probability(price, *, min_price=0.01, max_price=0.99)` → `float`

Clamped read of a binary-market YES price as P(YES). The clamp protects against on-the-edge quotes that would otherwise make Kelly blow up.

`kelly_fraction(p_yes, yes_price, *, cap=0.5)` → `float`

Signed Kelly fraction. Positive → YES side, negative → NO side, zero → no edge. cap defaults to half-Kelly (0.5), widely considered the prudent ceiling for a real book; pass cap=None for raw Kelly.

python

# Forecast 60% YES at a 55% market price
f = hz.stats.kelly_fraction(p_yes=0.60, yes_price=0.55)
# 0.111... = (0.60 - 0.55) / (1 - 0.55)

`convergence_score(yes_prices, outcome, *, weight="uniform")` → `float`

“Did this market actually price the outcome?” 1.0 means the market was always right, 0.5 is a coin flip, 0.0 means the market was always wrong. Use weight="linear" to weight ticks closer to resolution more heavily.

End-to-end example

python

import horizon as hz

# 1) Pull data with whichever provider
bars = list(hz.data.providers.alpaca(["AAPL"], start="2024-01-01").iter_bars())
prices = [b.close for b in bars]

# 2) One-line research workflow
rets = hz.stats.returns_from_prices(prices)
print(hz.stats.summary(rets))
print("Sharpe CI:", hz.stats.sharpe_ci(rets, seed=0))
print("Normality :", hz.stats.jarque_bera(rets))
print("IID check :", hz.stats.ljung_box(rets, lags=10))

For IV-on-real-quotes, see docs/venues/alpaca.mdx — Alpaca.options_chain pulls the snapshot, hz.stats.implied_vol extracts IV from a premium without an embedded Greek field, and hz.stats.iv_rank ranks it against history.

Result types

Type	Fields
`ReturnsSummary`	`n, mean_per_period, vol_per_period, annualized_return, annualized_vol, sharpe, sortino, max_drawdown, skew, excess_kurtosis, periods_per_year`
`ConfidenceInterval`	`point_estimate, lower, upper, confidence, n_resamples`
`JarqueBeraResult`	`statistic, p_value, skew, excess_kurtosis`
`LjungBoxResult`	`statistic, p_value, lags, autocorrelations`
`DrawdownStats`	`max_drawdown, peak_index, trough_index, recovery_index, duration_periods, recovery_periods`
`IVResult`	`iv, inputs, premium, iterations, converged`
`BSInputs`	`spot, strike, time_to_expiry, risk_free_rate, dividend_yield, option_type`
`ReliabilityDiagram`	`bin_edges, bin_centers, mean_predicted, observed_frequency, counts, miscalibration_area`
`CalibrationReport`	`n, brier, log_loss, reliability`

All result types are frozen dataclasses — immutable, hashable where possible, safe to pickle into a tearsheet or audit log.

Dependencies

Function	Optional dep	Behaviour without it
`jarque_bera`, `ljung_box`	`scipy`	Uses Numerical-Recipes chi-squared survival; results within numerical tolerance of scipy’s
`implied_vol`	`py_vollib`	Pure-numpy bisection on `[1e-4, 5.0]`; ~50 iterations to 1e-8
everything else	none	Pure numpy + stdlib

Install pip install -e ".[validate]" for scipy + sklearn, or pip install -e ".[options]" for py_vollib — both are recommended for production but not required.

At a glance

Returns inference

summary(returns, *, risk_free_rate=0.0, periods_per_year=252)

sharpe(returns, *, risk_free_rate=0.0, periods_per_year=252, annualize=True) → float

sortino(returns, *, target=0.0, periods_per_year=252, annualize=True) → float

sharpe_ci(returns, *, confidence=0.95, n_resamples=5000, seed=None, ...) → ConfidenceInterval

sortino_ci(returns, *, confidence=0.95, n_resamples=5000, seed=None, ...) → ConfidenceInterval

jarque_bera(returns) → JarqueBeraResult

ljung_box(returns, *, lags=10) → LjungBoxResult

max_drawdown(returns) → DrawdownStats

returns_from_prices(prices) → numpy.ndarray

Implied volatility

BSInputs(spot, strike, time_to_expiry, *, risk_free_rate=0.0, dividend_yield=0.0, option_type="call")

implied_vol(premium, inputs, *, tol=1e-8, max_iter=100, use_vollib=True) → IVResult

iv_rank(current_iv, iv_history) → float

iv_percentile(current_iv, iv_history) → float

iv_hv_spread(implied, realized) → float

realized_vol_annualized(prices, *, periods_per_year=252) → float

iv_term_structure_slope(dtes, ivs) → float

Prediction-market calibration

brier_score(predictions, outcomes) → float

log_loss(predictions, outcomes, *, eps=1e-12) → float

reliability_diagram(predictions, outcomes, *, n_bins=10) → ReliabilityDiagram

calibration_report(predictions, outcomes, *, n_bins=10) → CalibrationReport

implied_probability(price, *, min_price=0.01, max_price=0.99) → float

kelly_fraction(p_yes, yes_price, *, cap=0.5) → float

convergence_score(yes_prices, outcome, *, weight="uniform") → float

End-to-end example

Result types

Dependencies

`summary(returns, *, risk_free_rate=0.0, periods_per_year=252)`

`sharpe(returns, *, risk_free_rate=0.0, periods_per_year=252, annualize=True)` → `float`

`sortino(returns, *, target=0.0, periods_per_year=252, annualize=True)` → `float`

`sharpe_ci(returns, *, confidence=0.95, n_resamples=5000, seed=None, ...)` → `ConfidenceInterval`

`sortino_ci(returns, *, confidence=0.95, n_resamples=5000, seed=None, ...)` → `ConfidenceInterval`

`jarque_bera(returns)` → `JarqueBeraResult`

`ljung_box(returns, *, lags=10)` → `LjungBoxResult`

`max_drawdown(returns)` → `DrawdownStats`

`returns_from_prices(prices)` → `numpy.ndarray`

`BSInputs(spot, strike, time_to_expiry, *, risk_free_rate=0.0, dividend_yield=0.0, option_type="call")`

`implied_vol(premium, inputs, *, tol=1e-8, max_iter=100, use_vollib=True)` → `IVResult`

`iv_rank(current_iv, iv_history)` → `float`

`iv_percentile(current_iv, iv_history)` → `float`

`iv_hv_spread(implied, realized)` → `float`

`realized_vol_annualized(prices, *, periods_per_year=252)` → `float`

`iv_term_structure_slope(dtes, ivs)` → `float`

`brier_score(predictions, outcomes)` → `float`

`log_loss(predictions, outcomes, *, eps=1e-12)` → `float`

`reliability_diagram(predictions, outcomes, *, n_bins=10)` → `ReliabilityDiagram`

`calibration_report(predictions, outcomes, *, n_bins=10)` → `CalibrationReport`

`implied_probability(price, *, min_price=0.01, max_price=0.99)` → `float`

`kelly_fraction(p_yes, yes_price, *, cap=0.5)` → `float`

`convergence_score(yes_prices, outcome, *, weight="uniform")` → `float`