Reading Results
How to interpret BacktestResult fields
What to look at first
Sharpe
Is it positive? A negative Sharpe means the strategy lost money on average. Below 0.5 is weak, 0.5-1.0 is marginal, 1.0-2.0 is good. Above 2.0 is rare: check for overfitting or look-ahead bias before trusting it.
Max drawdown
How much did the strategy lose from peak to trough? This is the worst loss observed in the backtest, not the worst loss possible. As a rule of thumb, double it for a realistic estimate of the worst drawdown you would live through in the future.
n_trades
Did the strategy actually trade? Zero trades means no signals fired. Thousands of trades means high turnover, where transaction costs likely dominate the result.
Equity curve shape
Does it grow steadily, or does it have wild swings? A smooth curve with a Sharpe of 1.5 is more robust than a Sharpe of 1.5 achieved by one lucky streak.
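One rough way to test for a "lucky streak" is to ask how much of the net gain came from a handful of bars. The sketch below is an illustration only: `top_k_gain_share` is a hypothetical helper, not part of Horizon, and it assumes the equity curve has been reduced to a plain list of equity values.

```python
def top_k_gain_share(equity, k=5):
    """Share of the net gain contributed by the k largest single-bar gains.

    Returns None when there is no net gain to attribute.
    Values can exceed 1.0 when losses elsewhere offset the best bars.
    """
    if len(equity) < 2:
        return None
    changes = [b - a for a, b in zip(equity, equity[1:])]
    net_gain = equity[-1] - equity[0]
    if net_gain <= 0:
        return None
    best = sorted(changes, reverse=True)[:k]
    return sum(best) / net_gain


# Synthetic curves for illustration (not real backtest output)
steady = [100 + i for i in range(101)]        # gains 1 per bar
streaky = [100] * 50 + [200] + [200] * 50     # one jump, flat otherwise

print(top_k_gain_share(steady, k=5))
print(top_k_gain_share(streaky, k=1))
```

A share close to 1.0 for a small `k` suggests the result depends on a few bars and deserves a closer look.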
Sharpe interpretation
Sharpe = mean_return / std_return * √(periods_per_year)
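The formula above can be sketched in plain Python. This is a minimal illustration, not Horizon's implementation: `sharpe_from_equity` is a hypothetical helper, the risk-free rate is assumed to be zero, and the equity values are assumed to be positive and sampled at a fixed interval. The result may differ slightly from the value reported by BacktestResult.

```python
import math
import statistics


def sharpe_from_equity(equity, periods_per_year=252):
    """Annualized Sharpe ratio from equally spaced equity values.

    Assumes a zero risk-free rate and strictly positive equity.
    """
    # Simple per-period returns between consecutive equity values
    returns = [b / a - 1 for a, b in zip(equity, equity[1:])]
    if len(returns) < 2:
        return 0.0  # not enough data for a standard deviation
    std_return = statistics.stdev(returns)  # sample standard deviation
    if std_return == 0:
        return 0.0  # constant returns: the ratio is undefined
    mean_return = statistics.mean(returns)
    return mean_return / std_return * math.sqrt(periods_per_year)
```

To apply it to a backtest, pass the equity values from the curve, for example `sharpe_from_equity([e for _, e in result.equity_curve])`, and set `periods_per_year` to match the bar frequency (252 for daily bars).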
Rough categories:
| Sharpe | Interpretation |
|---|---|
| > 3.0 | Suspicious: overfit, look-ahead bug, or institutional hedge fund |
| 2.0–3.0 | Exceptional if real, rare |
| 1.0–2.0 | Good, deployable if confirmed out-of-sample |
| 0.5–1.0 | Marginal, may be noise |
| 0.0–0.5 | Weak, needs work |
| under 0.0 | Unprofitable |
Drawdown interpretation
max_drawdown is the worst peak-to-trough equity decline observed in the backtest.
| DD | Interpretation |
|---|---|
| under 5% | Very conservative, probably not taking enough risk |
| 5%–10% | Tight, good for risk-sensitive capital |
| 10%–20% | Typical for deployed systematic strategies |
| 20%–35% | Aggressive, investors will leave |
| > 35% | Survivable only if your alpha is large and confirmed out-of-sample |
Calmar ratio
Calmar = total_return (annualized) / |max_drawdown|
Higher is better. Like Sharpe, it is a return-to-risk ratio, but it measures risk by the single worst drawdown rather than by return volatility:
| Calmar | Quality |
|---|---|
| > 3 | Exceptional |
| 1.5–3 | Good |
| 0.5–1.5 | Marginal |
| under 0.5 | Poor |
Not computed natively by Horizon’s BacktestResult, but you can compute it:
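```python
# Assumes daily bars (252 per year), equity_curve entries of (timestamp, equity),
# and max_drawdown expressed as a fraction (0.15 for 15%).
years = len(result.equity_curve) / 252
start_equity = result.equity_curve[0][1]
end_equity = result.equity_curve[-1][1]
cagr = (end_equity / start_equity) ** (1 / max(years, 0.01)) - 1
calmar = cagr / abs(result.max_drawdown) if result.max_drawdown != 0 else 0
print(f"Calmar: {calmar:.2f}")
```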
Per-trade statistics
Trade log analysis gives you a second picture of the strategy:
```python
trades = result.trades
if trades:
    # Only trades that realized P&L count as closes
    closes = [t for t in trades if t.realized_pnl != 0]
    wins = [t for t in closes if t.realized_pnl > 0]
    losses = [t for t in closes if t.realized_pnl < 0]
    gross_win = sum(t.realized_pnl for t in wins)
    gross_loss = sum(t.realized_pnl for t in losses)

    hit_rate = len(wins) / len(closes) if closes else 0
    avg_win = gross_win / max(len(wins), 1)
    avg_loss = gross_loss / max(len(losses), 1)
    if losses:
        profit_factor = abs(gross_win / gross_loss)
    else:
        # No losing trades: profit factor is unbounded, not zero
        profit_factor = float("inf") if wins else 0.0

    print(f"Hit rate: {hit_rate:.1%}")
    print(f"Average win: ${avg_win:+.2f}")
    print(f"Average loss: ${avg_loss:+.2f}")
    print(f"Profit factor: {profit_factor:.2f}")
    print(f"Total fees: ${sum(t.fee for t in trades):.2f}")
```
Expected properties
- Hit rate: 40-60% is normal. Below 30% or above 70% is unusual.
- Average win vs average loss: winners must be bigger than losers for low-hit-rate strategies. At a ~50% hit rate, equal-sized wins and losses only break even before fees, so winners still need to be somewhat larger.
- Profit factor: gross wins / gross losses. > 1.5 is good, > 2.0 is great, under 1.0 is a losing strategy.
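The relationship between hit rate and win/loss size can be made concrete with a little arithmetic. The sketch below is illustrative only: `expectancy` and `breakeven_hit_rate` are hypothetical helpers, not part of Horizon. It assumes every closed trade is either a win or a loss, ignores fees, and takes `avg_loss` as a negative number, as the snippet above produces it.

```python
def expectancy(hit_rate, avg_win, avg_loss):
    """Expected P&L per closed trade, before fees (avg_loss is negative)."""
    return hit_rate * avg_win + (1 - hit_rate) * avg_loss


def breakeven_hit_rate(avg_win, avg_loss):
    """Hit rate at which expectancy is exactly zero, before fees."""
    return abs(avg_loss) / (avg_win + abs(avg_loss))


# Illustrative numbers, not backtest output
print(f"Break-even hit rate: {breakeven_hit_rate(200, -100):.1%}")
print(f"Expectancy at 40% hit rate: ${expectancy(0.40, 200, -100):+.2f}")
```

With winners twice the size of losers, the strategy only needs to win about one trade in three to break even, which is why a low hit rate is acceptable when the average win is large.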
Per-strategy attribution
If you ran multiple strategies, attribute P&L by strategy_id:
```python
from collections import Counter, defaultdict

pnl_by_strategy = defaultdict(float)
trade_count = Counter()
for t in result.trades:
    pnl_by_strategy[t.strategy_id] += t.realized_pnl
    trade_count[t.strategy_id] += 1

# Print strategies from most to least profitable
for strategy, pnl in sorted(pnl_by_strategy.items(), key=lambda x: -x[1]):
    print(f"{strategy:20s} P&L=${pnl:+.2f} trades={trade_count[strategy]}")
```
Equity curve analysis
Drawdown curve
```python
equity = [e for _, e in result.equity_curve]
peak = equity[0]
drawdowns = []
peaks = []  # running peak at each bar, so the peak before the trough is recoverable
for eq in equity:
    peak = max(peak, eq)
    peaks.append(peak)
    drawdowns.append((peak - eq) / peak if peak > 0 else 0)

worst_dd_idx = drawdowns.index(max(drawdowns))
worst_ts, worst_eq = result.equity_curve[worst_dd_idx]
print(f"Worst drawdown: {max(drawdowns):.2%} on {worst_ts}")
print(f" Peak equity before: ${peaks[worst_dd_idx]:.2f}")
print(f" Equity at trough: ${worst_eq:.2f}")
```
Recovery time
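```python
# Find the longest drawdown duration (bars spent below a prior peak).
# Reuses `equity` from the drawdown curve snippet above.
peak = equity[0]  # reset the running peak before scanning
longest_dd_bars = 0
current = 0
for eq in equity:
    peak = max(peak, eq)
    if eq < peak:
        current += 1
        longest_dd_bars = max(longest_dd_bars, current)
    else:
        current = 0
print(f"Longest drawdown: {longest_dd_bars} bars")
```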
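The bar count above does not say when the longest stretch happened. The following is a minimal sketch that also reports its start and end timestamps. It assumes the equity curve is a list of `(timestamp, equity)` pairs, as the earlier snippets suggest; `longest_underwater` is a hypothetical helper, not part of Horizon.

```python
def longest_underwater(curve):
    """Return (peak_ts, last_ts, bars) for the longest stretch below a prior peak.

    curve is a list of (timestamp, equity) pairs.
    Returns (None, None, 0) if the curve never falls below a prior peak.
    """
    best = (None, None, 0)
    if not curve:
        return best
    peak_ts, peak = curve[0]
    bars = 0
    for ts, eq in curve:
        if eq >= peak:
            # New high (or full recovery): the underwater stretch ends here
            peak_ts, peak, bars = ts, eq, 0
        else:
            bars += 1
            if bars > best[2]:
                best = (peak_ts, ts, bars)
    return best


# Synthetic curve with integer timestamps, for illustration only
sample_curve = [(0, 100), (1, 90), (2, 95), (3, 101),
                (4, 100), (5, 99), (6, 98), (7, 102)]
start, end, bars = longest_underwater(sample_curve)
print(f"Longest drawdown: {bars} bars, from {start} to {end}")
```

With real data, call it as `longest_underwater(result.equity_curve)`. If the stretch is still open at the end of the backtest, the strategy never recovered within the test window.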