Metrics Math
How Sharpe, Sortino, CAGR, and max drawdown are computed
Horizon’s MetricsCollector accumulates per-tick snapshots during the backtest and computes summary metrics at the end. This page documents the exact formulas.
Per-tick snapshot
Every tick, MetricsCollector.snapshot() records:
@dataclass
class MetricsSnapshot:
    timestamp: datetime
    equity: float
    cash: float
    unrealized_pnl: float
    realized_pnl_cum: float
    gross_notional: float
    net_notional: float
    n_positions: int
    drawdown_pct: float
    peak_equity: float
Computed on the fly:
- peak_equity: running maximum of equity
- drawdown_pct: (peak_equity - equity) / peak_equity (clamped to ≥ 0)
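The two on-the-fly fields can be sketched as a standalone function. This is a minimal illustration rather than the collector's actual code: the name `running_drawdown` is hypothetical, and the guard for a non-positive peak is an assumption, since the page does not say how that case is handled.

```python
def running_drawdown(equity_curve):
    """Return one (peak_equity, drawdown_pct) pair per equity value."""
    pairs = []
    peak = float("-inf")
    for equity in equity_curve:
        # peak_equity is the running maximum of equity seen so far.
        peak = max(peak, equity)
        # Assumption: treat drawdown as 0.0 while the peak is not positive.
        dd = (peak - equity) / peak if peak > 0 else 0.0
        # Clamp to >= 0, matching the description above.
        pairs.append((peak, max(dd, 0.0)))
    return pairs


pairs = running_drawdown([100.0, 110.0, 99.0, 120.0])
for peak, dd in pairs:
    print(f"peak={peak:.2f} drawdown={dd:.2%}")
```

At the third value the peak stays at 110 while equity falls to 99, so the drawdown is 11 / 110 = 10%; it returns to zero as soon as a new peak is set.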
Returns
Period-over-period returns are computed from the equity curve:
def returns(self) -> list[float]:
    if len(self.snapshots) < 2:
        return []
    rets = []
    for prev, cur in zip(self.snapshots, self.snapshots[1:]):
        if prev.equity <= 0:
            rets.append(0.0)
            continue
        rets.append((cur.equity - prev.equity) / prev.equity)
    return rets
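These are simple returns, not log returns: each return is (equity_t - equity_{t-1}) / equity_{t-1}. If the previous equity is zero or negative, the return for that period is recorded as 0.0.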
Sharpe ratio
import math
mean = sum(rets) / len(rets)
var = sum((r - mean) ** 2 for r in rets) / max(len(rets) - 1, 1)  # sample variance (n - 1)
std = math.sqrt(var)
sharpe = (mean / std) * math.sqrt(periods_per_year)
- periods_per_year defaults to 252 (equity trading days)
- For crypto (24/7), use 365
- For monthly bars, use 12
- For hourly bars during NYSE hours, use 252 × 6.5 = 1638
Set via MetricsCollector(periods_per_year=...).
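To see how the annualization factor changes the result, the formula above can be wrapped in a standalone function. This is a sketch, not Horizon's implementation: the name `sharpe_ratio` is hypothetical, and returning 0.0 for fewer than two returns or for zero volatility is an assumption, since the page does not document those cases.

```python
import math


def sharpe_ratio(rets, periods_per_year=252):
    """Annualized Sharpe ratio from simple per-period returns."""
    if len(rets) < 2:
        return 0.0  # assumption: too few returns to estimate volatility
    mean = sum(rets) / len(rets)
    # Sample variance (n - 1 in the denominator), as in the formula above.
    var = sum((r - mean) ** 2 for r in rets) / max(len(rets) - 1, 1)
    std = math.sqrt(var)
    if std == 0:
        return 0.0  # assumption: avoid dividing by zero on a flat curve
    return (mean / std) * math.sqrt(periods_per_year)


daily = [0.01, -0.005, 0.007, 0.002, -0.003]
sharpe_equity = sharpe_ratio(daily, periods_per_year=252)
sharpe_crypto = sharpe_ratio(daily, periods_per_year=365)
print(f"252 periods: {sharpe_equity:.2f}")
print(f"365 periods: {sharpe_crypto:.2f}")
```

The same returns annualized with 365 instead of 252 periods give a Sharpe that is larger by a factor of √(365 / 252), which is why periods_per_year has to match the bar frequency of the data.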
Sortino ratio
Same as Sharpe, but the denominator uses downside-only volatility:
downside = [r for r in rets if r < 0]
if downside:
    dvar = sum(r * r for r in downside) / len(rets)  # denominator is ALL periods, not just downside
    dstd = math.sqrt(dvar)
    sortino = (mean / dstd) * math.sqrt(periods_per_year)
Sortino penalizes only downside volatility: upside volatility is “good” volatility and shouldn’t count against the strategy. For strategies with skewed upside, Sortino is more informative than Sharpe.
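The denominator uses len(rets) (not len(downside)). This is the standard Sortino definition from Sortino & Van der Meer (1991). Using len(downside) is a common bug: it divides the same sum of squares by a smaller count, which overstates downside deviation and therefore understates the ratio.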
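The effect of the denominator choice can be checked numerically. The sketch below is illustrative only: `sortino_ratio` and its `divide_by_downside_count` flag are hypothetical names, and returning NaN when there are no losing periods is an assumption, since the page does not document that case.

```python
import math


def sortino_ratio(rets, periods_per_year=252, divide_by_downside_count=False):
    """Annualized Sortino ratio with a zero target return."""
    downside = [r for r in rets if r < 0]
    if not rets or not downside:
        return float("nan")  # assumption: undefined without losing periods
    mean = sum(rets) / len(rets)
    # Documented form divides by all periods; the flag switches to the
    # downside-only count for comparison.
    n = len(downside) if divide_by_downside_count else len(rets)
    dstd = math.sqrt(sum(r * r for r in downside) / n)
    return (mean / dstd) * math.sqrt(periods_per_year)


rets = [0.02, -0.01, 0.015, -0.02]
documented = sortino_ratio(rets)
variant = sortino_ratio(rets, divide_by_downside_count=True)
print(f"len(rets) denominator:     {documented:.3f}")
print(f"len(downside) denominator: {variant:.3f}")
```

The two results differ by a factor of √(len(rets) / len(downside)), here √(4 / 2) = √2, so the choice of denominator is not a rounding detail.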
CAGR
years = n_ticks / periods_per_year
cagr = (equity_end / equity_start) ** (1 / years) - 1
Compound annual growth rate: the constant annual return that, compounded over the length of the backtest, takes equity_start to equity_end.
Example:
- Start: $100,000
- End: $130,000 after 504 bars (≈ 2 years)
- CAGR = (130,000 / 100,000)^(1/2) - 1 = 0.1402 = 14.02% per year
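The worked example can be reproduced with a few lines of Python. This is a sketch: the function name `cagr` and the 0.0 fallback for a non-positive starting equity or an empty run are assumptions, not documented behavior.

```python
def cagr(equity_start, equity_end, n_ticks, periods_per_year=252):
    """Compound annual growth rate from start/end equity and bar count."""
    years = n_ticks / periods_per_year
    if years <= 0 or equity_start <= 0:
        return 0.0  # assumption: no meaningful growth rate in these cases
    return (equity_end / equity_start) ** (1 / years) - 1


example = cagr(100_000, 130_000, n_ticks=504)
print(f"CAGR: {example:.2%}")  # prints "CAGR: 14.02%"
```

Note that the 30% total return over two years does not annualize to 15%: compounding 14.02% twice gives 1.1402² ≈ 1.30.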
Max drawdown
self._peak_equity = 0
for snap in snapshots:
    if snap.equity > self._peak_equity:
        self._peak_equity = snap.equity  # new running peak
    dd = (self._peak_equity - snap.equity) / self._peak_equity  # fraction below the peak
    self._max_dd_pct = max(self._max_dd_pct, dd)
Drawdown is always non-negative and is reported as a decimal fraction: 0.12 means a 12% drawdown.
Max drawdown duration
self._peak_index = index_of_peak  # updated whenever a new equity peak is recorded
self._max_dd_duration = current_index - self._peak_index  # updated when a new max drawdown is set
Counted in bars (snapshots), not wall-clock time. For a daily backtest, max_drawdown_duration_bars = 60 means 60 trading days.
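Putting the drawdown and duration bookkeeping together, one possible reading of the pseudocode is sketched below. The function name `max_drawdown_with_duration` is hypothetical, and the sketch assumes the duration is measured from the peak to the bar where the maximum drawdown was set (not to the later recovery), as the comment in the pseudocode suggests.

```python
def max_drawdown_with_duration(equity_curve):
    """Return (max_drawdown, duration_in_bars) for a list of equity values."""
    peak = 0.0
    peak_index = 0
    max_dd = 0.0
    max_dd_duration = 0
    for i, equity in enumerate(equity_curve):
        if equity > peak:
            # New running peak: remember where it occurred.
            peak = equity
            peak_index = i
        # Assumption: report 0.0 drawdown until a positive peak exists.
        dd = (peak - equity) / peak if peak > 0 else 0.0
        if dd > max_dd:
            # New worst drawdown: record bars elapsed since the peak.
            max_dd = dd
            max_dd_duration = i - peak_index
    return max_dd, max_dd_duration


curve = [100.0, 120.0, 114.0, 108.0, 125.0, 120.0]
max_dd, duration = max_drawdown_with_duration(curve)
print(f"max drawdown: {max_dd:.2%} over {duration} bars")
```

Here the worst drawdown is the fall from 120 to 108 (10%), reached two bars after the peak; the later dip from 125 to 120 is only 4% and does not replace it.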
Summary dataclass
@dataclass
class RunMetrics:
    equity_start: float
    equity_end: float
    peak_equity: float
    total_return: float  # (end - start) / start
    cagr: float
    sharpe: float
    sortino: float
    max_drawdown: float  # non-negative decimal
    max_drawdown_duration_bars: int
    n_ticks: int
    n_trades: int
    fees_paid: float
    hit_rate: float
    avg_trade_pnl: float
BacktestResult exposes a subset
Horizon’s BacktestResult exposes the most commonly used metrics:
result = hz.run(mode="backtest", ...)
result.sharpe # float
result.sortino # float
result.total_return # float (decimal)
result.max_drawdown # float (decimal, non-negative)
result.n_trades # int
result.equity_curve # list[tuple[datetime, float]]
result.trades # list[TradeRecord]
For deeper metrics (CAGR, drawdown duration, etc.), note that the current implementation doesn't expose the MetricsCollector on BacktestResult directly. For advanced metrics, compute them from the equity curve manually.
Computing additional metrics from the equity curve
import math
equity = [e for _, e in result.equity_curve]
# Returns
returns = [(equity[i] - equity[i-1]) / equity[i-1] for i in range(1, len(equity)) if equity[i-1] > 0]
# CAGR (assumes daily bars)
n_bars = len(equity)
years = n_bars / 252
cagr = (equity[-1] / equity[0]) ** (1 / years) - 1
# Calmar
calmar = cagr / abs(result.max_drawdown) if result.max_drawdown > 0 else 0
# Skew & kurtosis (needs scipy)
try:
    import scipy.stats
    skew = scipy.stats.skew(returns)
    kurt = scipy.stats.kurtosis(returns)  # excess (Fisher) kurtosis: 0.0 for a normal distribution
except ImportError:
    skew = kurt = float("nan")
print(f"CAGR: {cagr:+.2%}")
print(f"Calmar: {calmar:.2f}")
print(f"Skew: {skew:+.3f}")
print(f"Kurtosis: {kurt:+.3f}")
Using quantstats for richer metrics
With pip install quantstats:
import quantstats as qs
import pandas as pd
returns_series = pd.Series(
    [e for _, e in result.equity_curve],
    index=pd.DatetimeIndex([t for t, _ in result.equity_curve]),
).pct_change().dropna()  # period-over-period returns, not cumulative return since the start
qs.reports.html(returns_series, output="report.html")
This gives you a full HTML tearsheet with ~30 institutional-grade metrics.
Deflated Sharpe Ratio
For research-grade Sharpe correction (accounting for multiple testing, skew, kurtosis, sample size), see de Prado Validation. The native Sharpe ratio is the right starting point; DSR is the rigor you apply before deploying.