Reading Results

How to interpret BacktestResult fields

What to look at first

Sharpe

Is it positive? A negative Sharpe means the strategy lost money on average. Below 0.5 is weak, 0.5–1.0 is OK, and 1.0–2.0 is good. Above 2.0 is rare and deserves suspicion; check for overfitting before trusting it.

Max drawdown

How much did the strategy lose from peak to trough? This is the worst loss in the historical sample, not a ceiling on future losses. A common rule of thumb is to double it for a realistic estimate of the worst drawdown you'd live through in the future.

n_trades

Did the strategy actually trade? Zero trades means no signals fired. Thousands of trades means high turnover, so transaction costs likely dominate the results.

Equity curve shape

Does it grow steadily, or does it swing wildly? A smooth curve with a Sharpe of 1.5 is more robust than the same Sharpe achieved through one lucky streak.
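One rough way to tell the two cases apart is to ask what fraction of the net gain came from the few best bars. The sketch below uses a hypothetical helper, `top_k_share`, which is not part of Horizon. It takes a plain list of equity values, such as `[e for _, e in result.equity_curve]`, assuming `equity_curve` holds `(timestamp, equity)` pairs as in the other snippets on this page. The choice of `k` is arbitrary.

```python
from typing import Sequence


def top_k_share(equity: Sequence[float], k: int = 5) -> float:
    """Fraction of the net equity gain contributed by the k best bars.

    Near (or above) 1.0 means a handful of bars produced essentially all
    of the profit; it can exceed 1.0 when the remaining bars lost money.
    """
    if len(equity) < 2:
        raise ValueError("need at least two equity points")
    # Absolute P&L of each bar
    changes = [b - a for a, b in zip(equity, equity[1:])]
    net_gain = equity[-1] - equity[0]
    if net_gain <= 0:
        return float("nan")  # undefined when the strategy made no money
    best = sorted(changes, reverse=True)[:k]
    return sum(best) / net_gain


smooth = [100 + i for i in range(101)]                   # +1 on every bar
streaky = [100.0] * 95 + [110, 130, 160, 180, 195, 200]  # flat, then a burst
print(f"{top_k_share(smooth):.2f}")   # 0.05
print(f"{top_k_share(streaky):.2f}")  # 0.95
```

Both curves gain the same 100 in total, but in the streaky one five bars account for 95% of it.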

Sharpe interpretation

Sharpe = mean_return / std_return * √(periods_per_year)
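A minimal sketch of this formula in Python follows. It makes three assumptions: simple per-period returns, the sample standard deviation (`statistics.stdev`), and no risk-free rate. Horizon's own computation may differ on any of these. `sharpe_ratio` and `returns_from_equity` are hypothetical helpers, not Horizon APIs.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def returns_from_equity(equity: Sequence[float]) -> List[float]:
    """Simple per-period returns from consecutive equity values."""
    return [b / a - 1 for a, b in zip(equity, equity[1:])]


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe: mean_return / std_return * sqrt(periods_per_year).

    Uses the sample standard deviation and subtracts no risk-free rate.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    sd = stdev(returns)
    if sd == 0:
        return float("nan")  # constant returns: the ratio is undefined
    return mean(returns) / sd * math.sqrt(periods_per_year)


equity = [100.0, 101.0, 100.5, 102.0, 102.5]
print(f"Sharpe: {sharpe_ratio(returns_from_equity(equity)):.2f}")
```

Set `periods_per_year` to match the bar size: 252 for daily bars, 52 for weekly, and so on.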

Rough categories:

| Sharpe | Interpretation |
|---|---|
| > 3.0 | Suspicious: overfitting, a look-ahead bug, or an institutional hedge fund's edge |
| 2.0–3.0 | Exceptional if real; rare |
| 1.0–2.0 | Good; deployable if confirmed out-of-sample |
| 0.5–1.0 | Marginal; may be noise |
| 0.0–0.5 | Weak; needs work |
| < 0.0 | Unprofitable |
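To apply the table in a script, a lookup like the following works. `interpret_sharpe` is a hypothetical helper. The table's ranges share their endpoints, so sending a boundary value to the higher bucket is an assumption made here, not something the table specifies.

```python
import math


def interpret_sharpe(sharpe: float) -> str:
    """Map an annualized Sharpe ratio to the rough categories in the table.

    Boundary values go to the higher bucket (e.g. exactly 1.0 -> "Good").
    """
    if math.isnan(sharpe):
        return "Undefined"
    if sharpe >= 3.0:
        return "Suspicious"
    if sharpe >= 2.0:
        return "Exceptional if real"
    if sharpe >= 1.0:
        return "Good"
    if sharpe >= 0.5:
        return "Marginal"
    if sharpe >= 0.0:
        return "Weak"
    return "Unprofitable"


for s in (3.4, 1.2, 0.3, -0.4):
    print(f"{s:+.1f}: {interpret_sharpe(s)}")
```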

Drawdown interpretation

max_drawdown is the worst peak-to-trough equity decline observed in the backtest.

| Max drawdown | Interpretation |
|---|---|
| < 5% | Very conservative; probably not taking enough risk |
| 5%–10% | Tight; good for risk-sensitive capital |
| 10%–20% | Typical for deployed systematic strategies |
| 20%–35% | Aggressive; many investors will leave |
| > 35% | Survivable only if your alpha is large and confirmed |
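Here is a minimal sketch of the peak-to-trough definition, applied to a plain list of equity values. `max_drawdown` is a hypothetical helper that returns a positive fraction of the running peak. Horizon's `result.max_drawdown` may use a different sign convention (the Calmar snippet on this page takes its `abs()`), so compare magnitudes.

```python
from typing import Sequence


def max_drawdown(equity: Sequence[float]) -> float:
    """Worst peak-to-trough decline as a positive fraction of the running peak."""
    if not equity:
        raise ValueError("empty equity curve")
    peak = equity[0]
    worst = 0.0
    for eq in equity:
        peak = max(peak, eq)  # highest equity seen so far
        if peak > 0:
            worst = max(worst, (peak - eq) / peak)
    return worst


print(f"{max_drawdown([100, 120, 90, 130, 104]):.0%}")  # 25%
```

The 120 → 90 drop (25%) is worse than the later 130 → 104 drop (20%), even though the second is larger in absolute terms relative to the starting equity.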

Calmar ratio

Calmar = annualized return (CAGR) / |max_drawdown|

Higher is better. Like Sharpe, it is a return-to-risk ratio, but it measures risk by the single worst drawdown rather than by the volatility of returns:

| Calmar | Quality |
|---|---|
| > 3 | Exceptional |
| 1.5–3 | Good |
| 0.5–1.5 | Marginal |
| < 0.5 | Poor |

Not computed natively by Horizon’s BacktestResult, but you can compute it:

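python
# Assumes daily bars (252 per year) and equity_curve entries of (timestamp, equity)
curve = result.equity_curve
years = (len(curve) - 1) / 252  # number of bar intervals, converted to years
cagr = (curve[-1][1] / curve[0][1]) ** (1 / max(years, 0.01)) - 1
# Calmar is undefined with no drawdown; report NaN rather than a misleading 0
calmar = cagr / abs(result.max_drawdown) if result.max_drawdown != 0 else float("nan")
print(f"Calmar: {calmar:.2f}")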
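The snippet above needs a finished backtest. The self-contained variant below takes a plain list of equity values, so it can be sanity-checked on toy data. It computes the drawdown itself and exposes the bar frequency as a parameter instead of hard-coding 252. `calmar_ratio` is a hypothetical helper, and it assumes strictly positive equity.

```python
from typing import Sequence


def calmar_ratio(equity: Sequence[float], periods_per_year: int = 252) -> float:
    """CAGR divided by the max peak-to-trough drawdown (both as fractions)."""
    if len(equity) < 2:
        raise ValueError("need at least two equity points")
    years = (len(equity) - 1) / periods_per_year
    cagr = (equity[-1] / equity[0]) ** (1 / years) - 1
    peak, mdd = equity[0], 0.0
    for eq in equity:
        peak = max(peak, eq)  # running high-water mark
        mdd = max(mdd, (peak - eq) / peak)
    return cagr / mdd if mdd > 0 else float("nan")


# Four quarterly bars = one year: +25% CAGR against a 25% drawdown
print(f"{calmar_ratio([100, 120, 90, 110, 125], periods_per_year=4):.2f}")  # 1.00
```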

Per-trade statistics

Trade log analysis gives you a second picture of the strategy:

python
trades = result.trades
if trades:
    closes = [t for t in trades if t.realized_pnl != 0]
    wins = [t for t in closes if t.realized_pnl > 0]
    losses = [t for t in closes if t.realized_pnl < 0]

    hit_rate = len(wins) / len(closes) if closes else 0
    avg_win = sum(t.realized_pnl for t in wins) / max(len(wins), 1)
    avg_loss = sum(t.realized_pnl for t in losses) / max(len(losses), 1)
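    gross_win = sum(t.realized_pnl for t in wins)
    gross_loss = sum(t.realized_pnl for t in losses)
    if losses:
        profit_factor = abs(gross_win / gross_loss)
    else:
        # No losing trades: profit factor is unbounded if there were any wins
        profit_factor = float("inf") if wins else 0.0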

    print(f"Hit rate:       {hit_rate:.1%}")
    print(f"Average win:    ${avg_win:+.2f}")
    print(f"Average loss:   ${avg_loss:+.2f}")
    print(f"Profit factor:  {profit_factor:.2f}")
    print(f"Total fees:     ${sum(t.fee for t in trades):.2f}")

Expected properties

  • Hit rate: 40–60% is normal. Below 30% or above 70% is unusual.
  • Average win vs. average loss: low-hit-rate strategies need winners noticeably bigger than losers. Near a 50% hit rate, wins and losses can be of similar size, but the average win must still exceed the average loss to cover fees.
  • Profit factor: gross wins / gross losses. Above 1.5 is good, above 2.0 is great, and below 1.0 is a losing strategy.
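The link between hit rate and win/loss size comes from setting expected P&L per trade to zero: `p * avg_win + (1 - p) * avg_loss = 0`, where `avg_loss` is negative as in the snippet above. Solving for `p` gives the break-even hit rate before fees. `breakeven_hit_rate` is a hypothetical helper that sketches this.

```python
def breakeven_hit_rate(avg_win: float, avg_loss: float) -> float:
    """Hit rate at which expected P&L per trade is zero, before fees.

    Solves p * avg_win + (1 - p) * avg_loss = 0 with avg_loss < 0.
    """
    if avg_win <= 0 or avg_loss >= 0:
        raise ValueError("avg_win must be > 0 and avg_loss must be < 0")
    return -avg_loss / (avg_win - avg_loss)


for win, loss in [(100.0, -100.0), (200.0, -100.0), (50.0, -100.0)]:
    rate = breakeven_hit_rate(win, loss)
    print(f"avg win {win:+.0f}, avg loss {loss:+.0f}: break-even hit rate {rate:.1%}")
```

For example, a strategy whose average win is twice its average loss breaks even at a one-third hit rate. Equal-sized wins and losses need more than 50% once fees are included.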

Per-strategy attribution

If you ran multiple strategies, attribute P&L by strategy_id:

python
from collections import Counter, defaultdict

pnl_by_strategy = defaultdict(float)
trade_count = Counter()

for t in result.trades:
    pnl_by_strategy[t.strategy_id] += t.realized_pnl
    trade_count[t.strategy_id] += 1

for strategy, pnl in sorted(pnl_by_strategy.items(), key=lambda x: -x[1]):
    print(f"{strategy:20s} P&L=${pnl:+.2f}  trades={trade_count[strategy]}")

Equity curve analysis

Drawdown curve

python
equity = [e for _, e in result.equity_curve]
peak = equity[0]
drawdowns = []
for eq in equity:
    peak = max(peak, eq)
    drawdowns.append((peak - eq) / peak if peak > 0 else 0)

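worst_dd_idx = drawdowns.index(max(drawdowns))
worst_ts, worst_eq = result.equity_curve[worst_dd_idx]
# The running `peak` ends at the all-time high; recompute the peak that preceded the trough
peak_before = max(equity[: worst_dd_idx + 1])
print(f"Worst drawdown: {max(drawdowns):.2%} on {worst_ts}")
print(f"  Peak equity before: ${peak_before:.2f}")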
print(f"  Equity at trough:   ${worst_eq:.2f}")

Recovery time

python
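# Find the longest drawdown duration (reuses `equity` from the drawdown snippet)
longest_dd_bars = 0
current = 0
peak = equity[0]  # reset: the previous loop left `peak` at the all-time high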
for eq in equity:
    peak = max(peak, eq)
    if eq < peak:
        current += 1
        longest_dd_bars = max(longest_dd_bars, current)
    else:
        current = 0
print(f"Longest drawdown: {longest_dd_bars} bars")
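The loop above reports how long the worst stretch lasted, but not whether equity ever got back to its prior peak. The self-contained sketch below reports both. `longest_underwater` is a hypothetical helper that works on a plain list of equity values. When two stretches tie for longest, it describes the first one.

```python
from typing import Sequence, Tuple


def longest_underwater(equity: Sequence[float]) -> Tuple[int, bool]:
    """Bars in the longest stretch below a prior peak, and whether equity
    later returned to (or above) that peak."""
    if not equity:
        raise ValueError("empty equity curve")
    peak = equity[0]
    current = 0
    longest = 0
    recovered = True
    for eq in equity:
        if eq >= peak:
            if current > 0 and current == longest:
                recovered = True  # the longest stretch just ended at a new peak
            peak = eq
            current = 0
        else:
            current += 1
            if current > longest:
                longest = current
                recovered = False  # the longest stretch is still open
    return longest, recovered


print(longest_underwater([100, 90, 95, 101, 99, 98, 97, 96]))  # (4, False)
print(longest_underwater([100, 90, 80, 85, 100, 99]))          # (3, True)
```

An unrecovered stretch at the end of the backtest is a lower bound on the real recovery time, since the drawdown was still in progress when the data ran out.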

Red flags

Next