Walk-Forward
Rolling train/test windows with optional hyperparameter retuning
WalkForward runs rolling train/test windows across history. For each window, it trains or tunes parameters on the train period, evaluates on the test period, then advances by step. This is more robust than a single out-of-sample (OOS) split because you get N separate OOS evaluations instead of one.
Import
python
from horizon.validate import WalkForward
Signature
python
WalkForward(
    train: str = "2y",
    test: str = "3m",
    step: str = "3m",
    retune_params: list[str] | None = None,
    tuner: Any = None,
    thresholds: dict[str, float] | None = None,
)
train (str): Duration of each training window, e.g. "2y", "6m", "1w".
test (str): Duration of each test window.
step (str): How far to advance the window between iterations.
retune_params (list[str]): Parameter names to retune on each train window. Requires a `tuner`.
tuner (Any): Hyperparameter tuner (e.g., Optuna). Optional.
How it works
Each window advances by step. Test periods don’t overlap as long as step >= test. When step equals test, the stitched test periods form a continuous “as if traded live” equity curve; when step is larger than test, the curve has gaps between test periods.
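The window arithmetic can be sketched in plain Python. This is a minimal illustration, not Horizon's implementation: `add_duration` and `walk_forward_windows` are hypothetical helpers, "m" is assumed to mean calendar months, and the history end is treated as an exclusive bound.

```python
import calendar
from datetime import date, timedelta


def add_duration(d: date, spec: str) -> date:
    """Shift a date by a spec such as "2y", "3m", "1w" or "10d" (assumed format)."""
    n, unit = int(spec[:-1]), spec[-1]
    if unit == "d":
        return d + timedelta(days=n)
    if unit == "w":
        return d + timedelta(weeks=n)
    if unit in ("m", "y"):
        months = n * (12 if unit == "y" else 1)
        year, month0 = divmod(d.year * 12 + (d.month - 1) + months, 12)
        # Clamp the day so that e.g. Jan 31 + 1m lands on the last day of February.
        day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
        return date(year, month0 + 1, day)
    raise ValueError(f"unknown duration unit in {spec!r}")


def walk_forward_windows(start, end, train="2y", test="3m", step="3m"):
    """Return (train_start, test_start, test_end) tuples; `end` is exclusive."""
    windows = []
    train_start = start
    while True:
        test_start = add_duration(train_start, train)  # train period ends here
        test_end = add_duration(test_start, test)
        if test_end > end:  # not enough history left for a full test period
            break
        windows.append((train_start, test_start, test_end))
        train_start = add_duration(train_start, step)
    return windows


windows = walk_forward_windows(date(2018, 1, 1), date(2025, 1, 1))
for train_start, test_start, test_end in windows[:3]:
    print(f"train {train_start} .. {test_start} | test {test_start} .. {test_end}")
```

With step equal to test, each test period starts exactly where the previous one ended, which is what makes the stitched curve continuous.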
Usage
python
import horizon as hz  # assumed alias for the hz.* names used below
from horizon.validate import WalkForward

# MyStrategy, my_universe and Equity are assumed to be defined or imported elsewhere.
wf = WalkForward(
    train="2y",
    test="3m",
    step="3m",
)
result = wf.run(
    strategy=MyStrategy,
    backtest=hz.BacktestConfig(
        start="2018-01-01",
        end="2024-12-31",
        initial_cash_usd=100_000,
    ),
    universe=my_universe,
    asset_classes=[Equity],
)
print(f"Aggregate Sharpe: {result.aggregate_sharpe:+.3f}")
print("Windows:")
for i, w in enumerate(result.windows):
    print(f"  {i}: {w.test_start} - {w.test_end}, Sharpe={w.sharpe:+.3f}")
# Stitched test equity curve
equity_curve = result.aggregate_equity_curve
# Per-window diagnostics
print(f"Worst window Sharpe: {result.worst_window_sharpe():.3f}")
print(f"Per-window Sharpes: {result.per_window_sharpe}")
Result fields
python
@dataclass
class WalkForwardResult(ValidationResult):
    windows: list[WalkForwardWindow]
    aggregate_sharpe: float
    aggregate_drawdown: float
    aggregate_cagr: float
    aggregate_equity_curve: Any
    param_evolution: dict[str, list[float]]

    @property
    def per_window_sharpe(self) -> list[float]: ...

    @property
    def per_window_drawdown(self) -> list[float]: ...

    def worst_window_sharpe(self) -> float: ...
Thresholds
python
wf = WalkForward(
    train="2y", test="3m", step="3m",
    thresholds={
        "aggregate_sharpe_min": 0.8,
        "min_window_sharpe": 0.0,  # no losing windows
    },
)
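To make the intent of these two keys concrete, here is a minimal sketch of how such checks could be evaluated. It assumes each key is a lower bound on the corresponding metric; `check_thresholds` is a hypothetical helper, not part of the Horizon API, and the library may evaluate thresholds differently.

```python
def check_thresholds(aggregate_sharpe, per_window_sharpe, thresholds):
    """Return {threshold_name: passed} for the two keys used above (assumed semantics)."""
    observed = {
        "aggregate_sharpe_min": aggregate_sharpe,        # Sharpe of the stitched OOS curve
        "min_window_sharpe": min(per_window_sharpe),     # worst single test window
    }
    # Each threshold is treated as a floor: the observed value must be at or above it.
    return {name: observed[name] >= floor for name, floor in thresholds.items()}


checks = check_thresholds(
    aggregate_sharpe=1.1,
    per_window_sharpe=[0.4, -0.2, 0.9],
    thresholds={"aggregate_sharpe_min": 0.8, "min_window_sharpe": 0.0},
)
print(checks)
```

In this example the aggregate check passes while the per-window check fails, because one test window has a negative Sharpe.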
With hyperparameter tuning
python
from horizon.validate import WalkForward
wf = WalkForward(
    train="2y",
    test="3m",
    step="3m",
    retune_params=["lookback", "entry_threshold"],
    # tuner=optuna_tuner,  # planned
)
On each train window, the tuner re-selects the best parameters. The test window evaluates with those re-tuned params. This is the strongest test for “is my strategy robust to parameter drift?”
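The retune-then-evaluate loop can be illustrated with a self-contained toy. Everything here is an assumption made for illustration: the synthetic price series, the `momentum_returns` strategy, the grid search standing in for a real tuner, and windows measured in bars rather than calendar durations. It is not how Horizon wires in a tuner.

```python
import random
import statistics


def sharpe(returns):
    """Mean over standard deviation of per-bar returns (not annualised)."""
    sd = statistics.pstdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0


def momentum_returns(prices, lookback, start, stop):
    """Toy strategy: hold long over [t, t+1] when prices[t] > prices[t - lookback]."""
    out = []
    for t in range(max(start, lookback), stop - 1):
        position = 1.0 if prices[t] > prices[t - lookback] else 0.0
        out.append(position * (prices[t + 1] / prices[t] - 1.0))
    return out


def walk_forward_retune(prices, grid, train_len, test_len, step):
    """Re-pick `lookback` on each train window, then score it on the next test window."""
    param_evolution, test_sharpes = [], []
    train_start = 0
    while train_start + train_len + test_len <= len(prices):
        test_start = train_start + train_len
        test_stop = test_start + test_len
        # Tune using train-window returns only, so the test window stays unseen.
        best = max(
            grid,
            key=lambda lb: sharpe(momentum_returns(prices, lb, train_start, test_start)),
        )
        param_evolution.append(best)
        test_sharpes.append(sharpe(momentum_returns(prices, best, test_start, test_stop)))
        train_start += step
    return param_evolution, test_sharpes


# Synthetic random-walk prices with a fixed seed, for illustration only.
rng = random.Random(0)
prices = [100.0]
for _ in range(999):
    prices.append(prices[-1] * (1.0 + rng.gauss(0.0003, 0.01)))

grid = [5, 20, 60]
param_evolution, test_sharpes = walk_forward_retune(
    prices, grid, train_len=500, test_len=100, step=100
)
print("lookback chosen per window:", param_evolution)
print("test Sharpe per window:", [round(s, 3) for s in test_sharpes])
```

If the chosen value jumps around the grid from window to window, that is the instability `param_evolution` is meant to expose.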
Status
Why walk-forward is better than single OOS
- Multiple OOS periods: if a single OOS period happens to be lucky or unlucky, walk-forward averages it out
- Parameter drift visible: param_evolution shows how optimal params change over time; if they’re unstable, the strategy is likely overfit
- Realistic simulation: mimics how you’d actually retune a live system every N months
- Statistical power: 10 × 3m OOS periods give you 10 non-overlapping samples for significance testing
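As a rough illustration of the significance-testing point, the per-window Sharpes can be fed into a one-sample t-statistic against a mean of zero. The values below are made up, `sharpe_t_stat` is a hypothetical helper, and the test treats windows as independent, which is only approximately true when train periods overlap.

```python
import math
import statistics


def sharpe_t_stat(per_window_sharpe):
    """One-sample t-statistic for the hypothesis that the mean window Sharpe is zero."""
    n = len(per_window_sharpe)
    mean = statistics.mean(per_window_sharpe)
    std_err = statistics.stdev(per_window_sharpe) / math.sqrt(n)  # sample std / sqrt(n)
    return mean / std_err


# Hypothetical per-window Sharpes from ten 3-month test periods.
sharpes = [0.5, 1.2, -0.3, 0.8, 1.0, 0.2, 0.9, -0.1, 0.6, 0.7]
t_stat = sharpe_t_stat(sharpes)
print(f"n={len(sharpes)}, mean={statistics.mean(sharpes):.2f}, t={t_stat:.2f}")
```

With ten windows there are nine degrees of freedom, so the statistic would be compared against a two-sided 5% critical value of about 2.26. A single OOS split gives no such estimate at all.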