Reading Results

How to interpret BacktestResult fields

What to look at first

Sharpe

Is it positive? A negative Sharpe means the strategy lost money on average. Below 0.5 is weak, 0.5–1.0 is OK, and 1.0–2.0 is good. Above 2.0 deserves scrutiny: check for overfitting.

Max drawdown

How much did the strategy lose from peak to trough? This is the worst historical experience. A common rule of thumb is to double it for a realistic estimate of the worst drawdown you might live through going forward.

n_trades

Did the strategy actually trade? Zero trades means no signals fired. Thousands of trades means high turnover, so transaction costs likely dominate the results.
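One quick way to check whether costs dominated is to compare total fees with the gross profit of the winning trades. The sketch below uses a minimal stand-in `Trade` dataclass carrying only the `realized_pnl` and `fee` fields used elsewhere on this page; `fee_drag` is a hypothetical helper, not part of Horizon, and whether Horizon reports `realized_pnl` before or after fees is an assumption you should verify.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    # Hypothetical stand-in for a Horizon trade record: only the fields used here.
    realized_pnl: float
    fee: float


def fee_drag(trades):
    """Total fees as a fraction of gross profit from winning trades."""
    gross_profit = sum(t.realized_pnl for t in trades if t.realized_pnl > 0)
    total_fees = sum(t.fee for t in trades)
    if gross_profit == 0:
        # No winners: any fee at all is pure drag
        return float("inf") if total_fees > 0 else 0.0
    return total_fees / gross_profit


trades = [Trade(10.0, 1.0), Trade(-5.0, 1.0), Trade(0.0, 0.5)]
print(f"Fee drag: {fee_drag(trades):.0%}")
```

A ratio approaching 1.0 means fees consumed most of what the winners earned. Assuming Horizon's trade objects expose the same two attributes, `fee_drag(result.trades)` should work unchanged.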

Equity curve shape

Does it grow steadily, or does it swing wildly? A smooth curve with a Sharpe of 1.5 is more robust than the same Sharpe achieved through one lucky streak.
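One common heuristic for "steady vs. lucky" is how well a straight line fits the log of the equity curve: steady compounding is a straight line in log space, while growth that arrives in a single burst fits poorly. The sketch below computes that R²; `log_equity_r2` is a hypothetical helper, not a `BacktestResult` field, and it assumes strictly positive equity.

```python
import math


def log_equity_r2(equity):
    """R² of a least-squares line fitted to log(equity) against bar index."""
    if len(equity) < 3:
        raise ValueError("need at least 3 equity points")
    # Log scale turns steady compounding into a straight line (requires equity > 0)
    ys = [math.log(e) for e in equity]
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), ys))
    syy = sum((y - y_mean) ** 2 for y in ys)
    if syy == 0:
        return float("nan")  # perfectly flat curve: R² is undefined
    return sxy * sxy / (sxx * syy)


steady = [100 * 1.01 ** i for i in range(50)]                       # same gain every bar
lumpy = [100.0] * 40 + [100 * 1.05 ** k for k in range(1, 11)]      # flat, then one streak
print(f"steady R²: {log_equity_r2(steady):.3f}")
print(f"lumpy R²:  {log_equity_r2(lumpy):.3f}")
```

With Horizon you would pass `[e for _, e in result.equity_curve]`. Values near 1.0 suggest the return was earned evenly; markedly lower values suggest it came from a few stretches.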

Sharpe interpretation

Sharpe = mean_return / std_return * √(periods_per_year)
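As a sketch, the formula can be applied directly to an equity series. This version assumes a zero risk-free rate (the formula above has no risk-free term), daily bars (`periods_per_year=252`), and the sample standard deviation; Horizon's own calculation may differ in those details, so small discrepancies are expected. `sharpe_from_equity` is a hypothetical helper.

```python
import math
import statistics


def sharpe_from_equity(equity, periods_per_year=252):
    """Annualized Sharpe of per-bar returns, with the risk-free rate taken as zero."""
    # Simple per-bar returns from consecutive equity values
    returns = [b / a - 1 for a, b in zip(equity, equity[1:])]
    if len(returns) < 2:
        return 0.0  # not enough data to estimate volatility
    std = statistics.stdev(returns)  # sample standard deviation
    if std == 0:
        return 0.0
    return statistics.mean(returns) / std * math.sqrt(periods_per_year)


equity = [100.0, 101.0, 100.0, 102.0, 103.0]
print(f"Sharpe: {sharpe_from_equity(equity):.2f}")
```

For a Horizon result, pass `[e for _, e in result.equity_curve]` and set `periods_per_year` to match your bar size.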

Rough categories:

Sharpe	Interpretation
> 3.0	Suspicious: overfitting or a look-ahead bug, unless you are an institutional hedge fund
2.0–3.0	Exceptional if real, rare
1.0–2.0	Good, deployable if confirmed out-of-sample
0.5–1.0	Marginal, may be noise
0.0–0.5	Weak, needs work
under 0.0	Unprofitable

Drawdown interpretation

max_drawdown is the worst peak-to-trough equity decline observed in the backtest.

DD	Interpretation
under 5%	Very conservative, probably not taking enough risk
5%–10%	Tight, good for risk-sensitive capital
10%–20%	Typical for deployed systematic strategies
20%–35%	Aggressive; many investors will pull their capital
> 35%	Survivable only if the alpha is large and confirmed out-of-sample
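The severity climbs quickly because losses compound asymmetrically: after a drawdown of d, the remaining equity must grow by d / (1 − d) just to get back to the prior peak. A 20% drawdown needs a 25% gain, a 35% drawdown about 54%, and a 50% drawdown 100%. A small sketch (`gain_to_recover` is a hypothetical helper):

```python
def gain_to_recover(drawdown):
    """Fractional gain needed to return to the prior peak after a drawdown."""
    if not 0 <= drawdown < 1:
        raise ValueError("drawdown must be in [0, 1)")
    return drawdown / (1 - drawdown)


for dd in (0.05, 0.10, 0.20, 0.35, 0.50):
    print(f"{dd:.0%} drawdown needs a {gain_to_recover(dd):.1%} gain to recover")
```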

Calmar ratio

Calmar = annualized return (CAGR) / |max_drawdown|

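Higher is better. Like Sharpe, it measures return per unit of risk, but the risk term is the single worst drawdown rather than volatility, so one deep drawdown drags it down even when day-to-day returns are smooth: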

Calmar	Quality
> 3	Exceptional
1.5–3	Good
0.5–1.5	Marginal
under 0.5	Poor

Not computed natively by Horizon’s BacktestResult, but you can compute it:

python

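# Assumes one equity point per trading day (252 per year); change 252 for other bar sizes.
# Assumes max_drawdown is a fraction (e.g. 0.12 for a 12% drawdown).
start_equity = result.equity_curve[0][1]
end_equity = result.equity_curve[-1][1]
# N equity points span N-1 periods; the floor avoids huge exponents on very short runs
years = max((len(result.equity_curve) - 1) / 252, 0.01)
cagr = (end_equity / start_equity) ** (1 / years) - 1
# Calmar is undefined when the backtest never drew down
calmar = cagr / abs(result.max_drawdown) if result.max_drawdown != 0 else float("nan")
print(f"Calmar: {calmar:.2f}")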

Per-trade statistics

The trade log gives a second, per-trade view of the strategy:

python

trades = result.trades
if trades:
    # Keep only trades that realized P&L (typically the closing side of a round trip)
    closes = [t for t in trades if t.realized_pnl != 0]
    wins = [t.realized_pnl for t in closes if t.realized_pnl > 0]
    losses = [t.realized_pnl for t in closes if t.realized_pnl < 0]

    gross_win = sum(wins)
    gross_loss = abs(sum(losses))

    hit_rate = len(wins) / len(closes) if closes else 0.0
    avg_win = gross_win / len(wins) if wins else 0.0
    avg_loss = -gross_loss / len(losses) if losses else 0.0  # negative, like realized_pnl
    # With no losing trades the profit factor is unbounded, not zero
    profit_factor = gross_win / gross_loss if gross_loss else float("inf")

    print(f"Hit rate:       {hit_rate:.1%}")
    print(f"Average win:    ${avg_win:+.2f}")
    print(f"Average loss:   ${avg_loss:+.2f}")
    print(f"Profit factor:  {profit_factor:.2f}")
    print(f"Total fees:     ${sum(t.fee for t in trades):.2f}")

Expected properties

Hit rate: 40–60% is normal. Below 30% or above 70% is unusual and worth a closer look.
Average win vs. average loss: a low-hit-rate strategy needs winners larger than its losers to be profitable; near a 50% hit rate, the two can be roughly equal in size.
Profit factor: gross wins / gross losses. > 1.5 is good, > 2.0 is great, under 1.0 is a losing strategy.
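Hit rate and win/loss size only make sense together. Per-trade expectancy combines them as hit_rate × avg_win + (1 − hit_rate) × avg_loss, with avg_loss negative as in the per-trade snippet above. Setting expectancy to zero gives the breakeven hit rate |avg_loss| / (avg_win + |avg_loss|). A minimal sketch; both helper names are hypothetical and the figures are illustrative, not from a real backtest:

```python
def expectancy(hit_rate, avg_win, avg_loss):
    """Expected P&L per closed trade; avg_loss is negative."""
    return hit_rate * avg_win + (1 - hit_rate) * avg_loss


def breakeven_hit_rate(avg_win, avg_loss):
    """Hit rate at which expectancy is exactly zero."""
    loss = abs(avg_loss)
    return loss / (avg_win + loss)


# A 35% hit rate is still profitable when winners are three times the size of losers
print(f"Expectancy:         {expectancy(0.35, 300.0, -100.0):.2f}")  # 40.00
print(f"Breakeven hit rate: {breakeven_hit_rate(300.0, -100.0):.1%}")  # 25.0%
```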

Per-strategy attribution

If you ran multiple strategies, attribute P&L by strategy_id:

python

from collections import Counter, defaultdict

pnl_by_strategy = defaultdict(float)
trade_count = Counter()

for t in result.trades:
    pnl_by_strategy[t.strategy_id] += t.realized_pnl
    trade_count[t.strategy_id] += 1

for strategy, pnl in sorted(pnl_by_strategy.items(), key=lambda x: -x[1]):
    print(f"{strategy:20s} P&L=${pnl:+.2f}  trades={trade_count[strategy]}")

Equity curve analysis

Drawdown curve

python

equity = [e for _, e in result.equity_curve]
peak = equity[0]
peaks = []      # running peak at each bar
drawdowns = []  # fractional decline from the running peak at each bar
for eq in equity:
    peak = max(peak, eq)
    peaks.append(peak)
    drawdowns.append((peak - eq) / peak if peak > 0 else 0.0)

worst_dd = max(drawdowns)
worst_dd_idx = drawdowns.index(worst_dd)
worst_ts, worst_eq = result.equity_curve[worst_dd_idx]
print(f"Worst drawdown: {worst_dd:.2%} on {worst_ts}")
# Use the peak in force at the trough, not the final running peak
print(f"  Peak equity before: ${peaks[worst_dd_idx]:.2f}")
print(f"  Equity at trough:   ${worst_eq:.2f}")

Recovery time

python

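# Find the longest drawdown duration: the most consecutive bars spent below a prior peak.
# Uses `equity` from the drawdown-curve snippet above.
peak = equity[0]  # reset: the previous loop left `peak` at the all-time high
longest_dd_bars = 0
current = 0
for eq in equity:
    peak = max(peak, eq)
    if eq < peak:
        current += 1
        longest_dd_bars = max(longest_dd_bars, current)
    else:
        current = 0
print(f"Longest drawdown: {longest_dd_bars} bars")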
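The longest stretch below a peak is not necessarily the recovery from the *deepest* drawdown. The sketch below locates the worst drawdown and finds the first bar at which equity regained the prior peak; `worst_drawdown_recovery` is a hypothetical helper, and it assumes strictly positive equity.

```python
def worst_drawdown_recovery(equity):
    """Return (peak_idx, trough_idx, recovery_idx) for the deepest drawdown.

    recovery_idx is None if equity never regained the prior peak before the
    data ended. Returns None if equity never fell below a prior peak.
    """
    peak_idx = 0
    worst_depth, worst_peak, worst_trough = 0.0, 0, 0
    for i, eq in enumerate(equity):
        if eq >= equity[peak_idx]:
            peak_idx = i  # new (or equal) high-water mark
        depth = (equity[peak_idx] - eq) / equity[peak_idx]  # assumes equity > 0
        if depth > worst_depth:
            worst_depth, worst_peak, worst_trough = depth, peak_idx, i
    if worst_depth == 0:
        return None
    # Scan forward from the trough for the first bar back at or above the peak
    for j in range(worst_trough + 1, len(equity)):
        if equity[j] >= equity[worst_peak]:
            return worst_peak, worst_trough, j
    return worst_peak, worst_trough, None


print(worst_drawdown_recovery([100, 110, 99, 105, 111, 108]))  # (1, 2, 4)
```

With a Horizon result, pass `[e for _, e in result.equity_curve]` and map the returned indices back to timestamps with `result.equity_curve[i][0]`. The difference `recovery_idx - trough_idx` is the number of bars the recovery took.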

Red flags

