Metrics Math

How Sharpe, Sortino, CAGR, and max drawdown are computed

Horizon’s MetricsCollector accumulates per-tick snapshots during the backtest and computes summary metrics at the end. This page documents the exact formulas.

Per-tick snapshot

Every tick, MetricsCollector.snapshot() records:

python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MetricsSnapshot:
    timestamp: datetime
    equity: float
    cash: float
    unrealized_pnl: float
    realized_pnl_cum: float
    gross_notional: float
    net_notional: float
    n_positions: int
    drawdown_pct: float
    peak_equity: float

Computed on the fly:

  • peak_equity: running maximum of equity
  • drawdown_pct: (peak_equity - equity) / peak_equity, clamped to ≥ 0
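The two running fields can be sketched as a small standalone function. This is an illustration of the rule described above, not the collector's actual code; `track_drawdown` is a hypothetical name, and treating a non-positive peak as zero drawdown is an assumption of the sketch.

```python
def track_drawdown(equities):
    """Return one (peak_equity, drawdown_pct) pair per equity value."""
    peak = 0.0
    rows = []
    for equity in equities:
        peak = max(peak, equity)  # running maximum of equity
        if peak > 0:
            # Fraction below the running peak, clamped so it is never negative
            dd = max((peak - equity) / peak, 0.0)
        else:
            dd = 0.0  # assumption: no positive peak seen yet, report no drawdown
        rows.append((peak, dd))
    return rows


rows = track_drawdown([100.0, 110.0, 99.0, 120.0])
for peak, dd in rows:
    print(f"peak={peak:.1f} drawdown={dd:.2%}")
```

The third value (99.0) sits 10% below the running peak of 110.0; the fourth sets a new peak, so its drawdown returns to zero.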

Returns

Period-over-period returns are computed from the equity curve:

python
def returns(self) -> list[float]:
    if len(self.snapshots) < 2:
        return []
    rets = []
    for prev, cur in zip(self.snapshots, self.snapshots[1:]):
        if prev.equity <= 0:
            rets.append(0.0)
            continue
        rets.append((cur.equity - prev.equity) / prev.equity)
    return rets

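These are simple returns, not log returns: each one is (equity_t - equity_{t-1}) / equity_{t-1}. If the previous equity is zero or negative, the period's return is recorded as 0.0 rather than dividing by a non-positive base.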
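One practical consequence of using simple returns is that they compound multiplicatively back to the total return. The following check is a minimal sketch on made-up equity values (all positive, so the zero-return fallback never triggers):

```python
import math

# Hypothetical equity curve: +1%, -1%, +3% period returns
equities = [100_000.0, 101_000.0, 99_990.0, 102_989.7]

# Simple period-over-period returns
rets = [(cur - prev) / prev for prev, cur in zip(equities, equities[1:])]

# Compounding (1 + r) over every period recovers the total return
compounded = math.prod(1 + r for r in rets) - 1
total_return = (equities[-1] - equities[0]) / equities[0]

print(f"compounded:   {compounded:.6f}")
print(f"total_return: {total_return:.6f}")
```

Log returns would instead be summed and exponentiated; the two conventions should not be mixed within one calculation.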

Sharpe ratio

python
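import math

mean = sum(rets) / len(rets)
var = sum((r - mean) ** 2 for r in rets) / max(len(rets) - 1, 1)  # sample variance (n - 1)
std = math.sqrt(var)
sharpe = (mean / std) * math.sqrt(periods_per_year)

The mean is used as-is (no risk-free rate is subtracted), and the variance is the sample variance. periods_per_year is the annualization factor:

  • periods_per_year defaults to 252 (equity trading days)
  • For crypto (24/7), use 365
  • For monthly bars, use 12
  • For hourly bars during NYSE hours, use 252 × 6.5 = 1638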

Set via MetricsCollector(periods_per_year=...).
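To see how the annualization factor changes the result, the formula can be wrapped in a standalone function. This is a sketch under stated assumptions: `sharpe_ratio` is a hypothetical helper, and returning 0.0 for fewer than two returns or zero volatility is a choice made here, not documented Horizon behavior.

```python
import math


def sharpe_ratio(rets, periods_per_year=252):
    """Annualized Sharpe ratio of simple per-period returns (no risk-free rate)."""
    if len(rets) < 2:
        return 0.0  # assumption: too few returns to estimate volatility
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / max(len(rets) - 1, 1)
    std = math.sqrt(var)
    if std == 0:
        return 0.0  # assumption: flat returns, ratio treated as zero
    return (mean / std) * math.sqrt(periods_per_year)


daily = [0.01, -0.005, 0.007, 0.002, -0.003]  # made-up per-bar returns
print(f"252 periods/year: {sharpe_ratio(daily):.3f}")
print(f"365 periods/year: {sharpe_ratio(daily, periods_per_year=365):.3f}")
```

The same return series scores higher with 365 than with 252 because the ratio scales with the square root of periods_per_year, so the factor must match the bar frequency of the data.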

Sortino ratio

Same as Sharpe, but the denominator uses downside-only volatility:

python
import math

downside = [r for r in rets if r < 0]
if downside:
    # Denominator is ALL periods, not just the downside ones
    dvar = sum(r * r for r in downside) / len(rets)
    dstd = math.sqrt(dvar)
    sortino = (mean / dstd) * math.sqrt(periods_per_year)

Sortino penalizes only downside volatility: upside volatility is “good” volatility and shouldn’t count against the strategy. For strategies with positively skewed returns, Sortino is more informative than Sharpe.

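The denominator uses len(rets), not len(downside). This is the standard Sortino definition from Sortino & Van der Meer (1991): squared downside returns are averaged over all periods, with a target return of 0. Dividing by len(downside) instead is a common bug. Because len(downside) ≤ len(rets), it enlarges the downside deviation and so shrinks the magnitude of the ratio.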
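The effect of the denominator choice is easiest to see by computing both variants on the same data. The sketch below is illustrative only: `sortino_ratio` and its `divide_by_all` switch are hypothetical, and returning None when there are no negative returns is a choice made here (the snippet above simply leaves that case unset).

```python
import math


def sortino_ratio(rets, periods_per_year=252, divide_by_all=True):
    """Annualized Sortino ratio with a target return of 0."""
    if not rets:
        return None
    mean = sum(rets) / len(rets)
    downside = [r for r in rets if r < 0]
    if not downside:
        return None  # assumption: undefined without any negative return
    # divide_by_all=True averages over every period (the documented formula);
    # False reproduces the len(downside) variant for comparison only.
    n = len(rets) if divide_by_all else len(downside)
    dstd = math.sqrt(sum(r * r for r in downside) / n)
    return (mean / dstd) * math.sqrt(periods_per_year)


daily = [0.01, -0.005, 0.007, 0.002, -0.003]  # made-up per-bar returns
standard = sortino_ratio(daily)
variant = sortino_ratio(daily, divide_by_all=False)
print(f"len(rets) denominator:     {standard:.3f}")
print(f"len(downside) denominator: {variant:.3f}")
```

On this sample the same sum of squared downside returns is divided by 5 in one case and by 2 in the other, so the two results differ by a factor of √(5/2).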

CAGR

python
years = n_ticks / periods_per_year
cagr = (equity_end / equity_start) ** (1 / years) - 1

Compound annual growth rate: the constant yearly return that, compounded over the length of the backtest, would turn equity_start into equity_end.

Example:

  • Start: $100,000
  • End: $130,000 after 504 daily bars (2 years at periods_per_year = 252)
  • CAGR = (130,000 / 100,000)^(1/2) - 1 ≈ 0.1402, or 14.02% per year
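The worked example can be reproduced with a few lines of Python. `compute_cagr` is a hypothetical helper written for this page, and its handling of degenerate inputs (non-positive start, zero-length run, non-positive end) is an assumption rather than documented behavior.

```python
def compute_cagr(equity_start, equity_end, n_ticks, periods_per_year=252):
    """Compound annual growth rate from start/end equity and bar count."""
    years = n_ticks / periods_per_year
    if years <= 0 or equity_start <= 0:
        return 0.0  # assumption: growth rate is undefined, report zero
    if equity_end <= 0:
        return -1.0  # assumption: treat wiped-out equity as a total loss
    return (equity_end / equity_start) ** (1 / years) - 1


example = compute_cagr(100_000, 130_000, n_ticks=504)
print(f"{example:.2%}")  # 14.02%
```

Note that the total return over the two years is 30%, but the annualized figure is below 15% because the growth compounds.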

Max drawdown

python
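self._peak_equity = 0.0
self._max_dd_pct = 0.0
for snap in snapshots:
    if snap.equity > self._peak_equity:
        self._peak_equity = snap.equity
    if self._peak_equity > 0:  # skip until a positive peak exists (avoids dividing by zero)
        dd = (self._peak_equity - snap.equity) / self._peak_equity
        self._max_dd_pct = max(self._max_dd_pct, dd)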

Drawdown is always non-negative and is reported as a decimal fraction: 0.12 means a 12% drawdown.
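As a standalone illustration, the same loop can be run over a plain list of equity values. `max_drawdown` here is a hypothetical free function and the curve is made up; skipping bars until a positive peak exists is an assumption of the sketch.

```python
def max_drawdown(equities):
    """Largest peak-to-trough decline, as a non-negative decimal fraction."""
    peak = 0.0
    max_dd = 0.0
    for equity in equities:
        if equity > peak:
            peak = equity  # new running peak
        if peak > 0:
            max_dd = max(max_dd, (peak - equity) / peak)
    return max_dd


curve = [100_000, 105_000, 92_400, 98_000, 110_000, 104_500]
worst = max_drawdown(curve)
print(f"{worst:.2%}")  # 12.00%
```

The deepest decline is from 105,000 to 92,400 (12%). The later dip from 110,000 to 104,500 is only 5%, so it does not replace the maximum.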

Max drawdown duration

python
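# Whenever equity makes a new peak, remember that snapshot's index
self._peak_index = current_index
# Whenever a new max drawdown is set, record the bars elapsed since that peak
self._max_dd_duration = current_index - self._peak_index

Counted in bars (snapshots), not wall-clock time. For a daily backtest, max_drawdown_duration_bars = 60 means 60 trading days. Because the value is recorded at the moment the maximum drawdown is set, it measures the distance from the peak to the deepest point, not the time taken to recover.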
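Putting the two pieces together, a minimal sketch of the duration bookkeeping might look like the following. It follows the rule described above (duration is updated only when a new maximum drawdown is set); the function name and the sample curve are hypothetical.

```python
def max_drawdown_with_duration(equities):
    """Return (max_drawdown, duration_bars) for a sequence of equity values."""
    peak = 0.0
    peak_index = 0
    max_dd = 0.0
    duration = 0
    for i, equity in enumerate(equities):
        if equity > peak:
            peak, peak_index = equity, i  # new peak: remember where it occurred
        if peak > 0:
            dd = (peak - equity) / peak
            if dd > max_dd:
                max_dd = dd
                duration = i - peak_index  # bars since the peak that started this drawdown
    return max_dd, duration


curve = [100.0, 105.0, 101.0, 97.0, 92.4, 98.0, 110.0]
dd, bars = max_drawdown_with_duration(curve)
print(f"max drawdown {dd:.2%} over {bars} bars")
```

Here the peak is at index 1 and the trough at index 4, so the duration is 3 bars even though equity does not make a new high until index 6.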

Summary dataclass

python
from dataclasses import dataclass

@dataclass
class RunMetrics:
    equity_start: float
    equity_end: float
    peak_equity: float
    total_return: float                  # (end - start) / start
    cagr: float
    sharpe: float
    sortino: float
    max_drawdown: float                   # non-negative decimal
    max_drawdown_duration_bars: int
    n_ticks: int
    n_trades: int
    fees_paid: float
    hit_rate: float
    avg_trade_pnl: float

BacktestResult exposes a subset

Horizon’s BacktestResult exposes the most commonly used metrics:

python
result = hz.run(mode="backtest", ...)
result.sharpe           # float
result.sortino          # float
result.total_return     # float (decimal)
result.max_drawdown     # float (decimal, non-negative)
result.n_trades         # int
result.equity_curve     # list[tuple[datetime, float]]
result.trades           # list[TradeRecord]

Deeper metrics (CAGR, drawdown duration, etc.) live on the underlying MetricsCollector. In the current implementation, the collector isn't exposed on BacktestResult directly, so for advanced metrics, compute them from the equity curve manually.

Computing additional metrics from the equity curve

python
equity = [e for _, e in result.equity_curve]

# Simple period returns (periods with non-positive starting equity are skipped)
returns = [
    (cur - prev) / prev
    for prev, cur in zip(equity, equity[1:])
    if prev > 0
]

# CAGR (assumes daily bars, i.e. 252 periods per year)
n_bars = len(equity)
years = n_bars / 252
cagr = (equity[-1] / equity[0]) ** (1 / years) - 1

# Calmar
calmar = cagr / abs(result.max_drawdown) if result.max_drawdown > 0 else 0

# Skew & kurtosis (needs scipy)
try:
    import scipy.stats
    skew = scipy.stats.skew(returns)
    kurt = scipy.stats.kurtosis(returns)
except ImportError:
    skew = kurt = float("nan")

print(f"CAGR:     {cagr:+.2%}")
print(f"Calmar:   {calmar:.2f}")
print(f"Skew:     {skew:+.3f}")
print(f"Kurtosis: {kurt:+.3f}")

Using quantstats for richer metrics

With pip install quantstats:

python
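import quantstats as qs
import pandas as pd

# Build a timestamp-indexed equity series, then convert it to period returns.
# quantstats expects per-period returns, not cumulative return since the start.
equity_series = pd.Series(
    [e for _, e in result.equity_curve],
    index=[t for t, _ in result.equity_curve],
)
returns_series = equity_series.pct_change().dropna()
qs.reports.html(returns_series, output="report.html")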

This writes a full HTML tearsheet to report.html, covering roughly 30 performance and risk metrics.

Deflated Sharpe Ratio

For research-grade Sharpe correction (accounting for multiple testing, skew, kurtosis, sample size), see de Prado Validation. The native Sharpe ratio is the right starting point; DSR is the rigor you apply before deploying.
