Walk-Forward

Rolling train/test windows with optional hyperparameter retuning

WalkForward runs rolling train/test windows across history. For each window, it trains (and optionally tunes parameters) on the train period, evaluates on the test period, then advances by step. This is much more robust than a single out-of-sample (OOS) split because you get N separate OOS evaluations instead of one.

Import

python
from horizon.validate import WalkForward

Signature

python
WalkForward(
    train: str = "2y",
    test: str = "3m",
    step: str = "3m",
    retune_params: list[str] | None = None,
    tuner: Any = None,
    thresholds: dict[str, float] | None = None,
)

train (str)
Duration of each training window, e.g. "2y", "6m", "1w".
test (str)
Duration of each test window.
step (str)
How far to advance the window between iterations.
retune_params (list[str])
Parameter names to retune on each train window. Requires a `tuner`.
tuner (Any)
Hyperparameter tuner (e.g., Optuna). Optional.

How it works

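Each window advances by step. Test periods don’t overlap as long as step >= test. When step == test they are also contiguous, so the stitched test periods give you a continuous “as if traded live” equity curve. With step > test, the stitched curve has gaps between test periods.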
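
As a rough illustration of the stepping logic, the sketch below generates window boundaries using only the standard library. It is not the library's implementation: `add_duration` and `walk_forward_windows` are hypothetical helpers, intervals are assumed to be half-open, `end` is treated as exclusive, and a final window that would run past `end` is assumed to be dropped.

```python
import calendar
from datetime import date, timedelta


def add_duration(d: date, spec: str) -> date:
    """Add a duration such as "2y", "6m", "1w" or "10d" to a date (hypothetical parser)."""
    n, unit = int(spec[:-1]), spec[-1]
    if unit == "d":
        return d + timedelta(days=n)
    if unit == "w":
        return d + timedelta(weeks=n)
    if unit not in ("m", "y"):
        raise ValueError(f"unsupported duration: {spec!r}")
    months = n * 12 if unit == "y" else n
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day so that e.g. Jan 31 + 1m lands on the last day of February.
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def walk_forward_windows(start: date, end: date, train="2y", test="3m", step="3m"):
    """Return (train_start, test_start, test_end) tuples; all intervals are half-open."""
    windows = []
    train_start = start
    while True:
        test_start = add_duration(train_start, train)  # train period ends here
        test_end = add_duration(test_start, test)
        if test_end > end:  # assumption: drop a partial final window
            break
        windows.append((train_start, test_start, test_end))
        train_start = add_duration(train_start, step)
    return windows


windows = walk_forward_windows(date(2018, 1, 1), date(2025, 1, 1))
for train_start, test_start, test_end in windows[:3]:
    print(f"train {train_start} to {test_start}, test {test_start} to {test_end}")
```

With step equal to test, each test period starts exactly where the previous one ended, which is what makes the stitched curve continuous.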

Usage

python
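import horizon as hz  # assumed import alias for hz.BacktestConfig
from horizon.validate import WalkForward

# MyStrategy, my_universe, and Equity are assumed to be defined or imported elsewhere.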

wf = WalkForward(
    train="2y",
    test="3m",
    step="3m",
)

result = wf.run(
    strategy=MyStrategy,
    backtest=hz.BacktestConfig(
        start="2018-01-01",
        end="2024-12-31",
        initial_cash_usd=100_000,
    ),
    universe=my_universe,
    asset_classes=[Equity],
)

print(f"Aggregate Sharpe: {result.aggregate_sharpe:+.3f}")
print("Windows:")
for i, w in enumerate(result.windows):
    print(f"  {i}: {w.test_start} - {w.test_end}, Sharpe={w.sharpe:+.3f}")

# Stitched test equity curve
equity_curve = result.aggregate_equity_curve

# Per-window summaries
print(f"Worst window Sharpe: {result.worst_window_sharpe():.3f}")
print(f"Per-window Sharpes: {result.per_window_sharpe}")

Result fields

python
@dataclass
class WalkForwardResult(ValidationResult):
    windows: list[WalkForwardWindow]
    aggregate_sharpe: float
    aggregate_drawdown: float
    aggregate_cagr: float
    aggregate_equity_curve: Any
    param_evolution: dict[str, list[float]]

    @property
    def per_window_sharpe(self) -> list[float]: ...
    @property
    def per_window_drawdown(self) -> list[float]: ...

    def worst_window_sharpe(self) -> float: ...
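
To make the aggregate fields more concrete, here is a minimal sketch of how a stitched equity curve and its maximum drawdown could be derived from per-window test returns. This is an assumption about what `aggregate_equity_curve` and `aggregate_drawdown` represent, not the library's actual computation; `stitch_equity` and `max_drawdown` are hypothetical helpers and the return values are made up for illustration.

```python
def stitch_equity(window_returns, initial_cash=100_000.0):
    """Compound per-window test returns, in order, into a single equity curve."""
    equity = [initial_cash]
    for returns in window_returns:
        for r in returns:
            # Equity carries over from the end of one test window to the next.
            equity.append(equity[-1] * (1.0 + r))
    return equity


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Two hypothetical test windows with per-period returns.
equity = stitch_equity([[0.10, -0.05], [0.02]])
print(f"Final equity: {equity[-1]:.2f}")
print(f"Max drawdown: {max_drawdown(equity):.2%}")
```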

Thresholds

python
wf = WalkForward(
    train="2y", test="3m", step="3m",
    thresholds={
        "aggregate_sharpe_min": 0.8,
        "min_window_sharpe": 0.0,      # no losing windows
    },
)

With hyperparameter tuning

python
from horizon.validate import WalkForward

wf = WalkForward(
    train="2y",
    test="3m",
    step="3m",
    retune_params=["lookback", "entry_threshold"],
    # tuner=optuna_tuner,   # planned
)

On each train window, the tuner re-selects the best parameters, and the test window is then evaluated with those re-tuned params. This is among the strongest tests of “is my strategy robust to parameter drift?”
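
The retune-then-evaluate loop can be sketched without any tuner library. The example below is a toy under stated assumptions, not the library's behaviour: it uses a plain grid search in place of a tuner, a made-up momentum rule, index-based windows instead of calendar durations, and synthetic returns. All function names are hypothetical.

```python
import random
from statistics import mean, stdev


def sharpe(returns):
    """Per-period Sharpe ratio (not annualized); 0.0 when undefined."""
    if len(returns) < 2:
        return 0.0
    sd = stdev(returns)
    return mean(returns) / sd if sd > 0 else 0.0


def strategy_returns(returns, lookback, lo, hi):
    """Toy momentum rule: hold at t if the previous `lookback` returns sum to > 0.

    Requires lo >= lookback so that a full lookback history is available.
    """
    out = []
    for t in range(lo, hi):
        position = 1.0 if sum(returns[t - lookback:t]) > 0 else 0.0
        out.append(position * returns[t])
    return out


def walk_forward_retune(returns, train, test, step, candidates):
    """Pick the best lookback on each train window, then score it on the next test window."""
    param_evolution, oos_returns = [], []
    start = max(candidates)  # warm-up so every candidate has enough history
    while start + train + test <= len(returns):
        train_lo, train_hi = start, start + train
        best = max(
            candidates,
            key=lambda lb: sharpe(strategy_returns(returns, lb, train_lo, train_hi)),
        )
        param_evolution.append(best)
        # Only the test slice contributes to the out-of-sample record.
        oos_returns.extend(strategy_returns(returns, best, train_hi, train_hi + test))
        start += step
    return param_evolution, oos_returns


random.seed(0)
synthetic = [random.gauss(0.0005, 0.01) for _ in range(1000)]
chosen, oos = walk_forward_retune(synthetic, train=500, test=60, step=60, candidates=[5, 20, 60])
print("Chosen lookback per window:", chosen)
print(f"OOS Sharpe (per period): {sharpe(oos):+.3f}")
```

The list of chosen parameters plays the role that `param_evolution` plays in the result: if the selected value jumps around from window to window, the tuned optimum is not stable.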

Why walk-forward is better than single OOS

  • Multiple OOS periods: if a single OOS period happens to be lucky or unlucky, walk-forward averages it out
  • Parameter drift visible: param_evolution shows how optimal params change over time; if they’re unstable, the strategy is likely overfit
  • Realistic simulation: mimics how you’d actually retune a live system every N months
  • Statistical power: 10 × 3m OOS periods give you 10 roughly independent samples for significance testing (train windows overlap, so independence is only approximate)
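
As one example of the significance testing mentioned above, the sketch below computes a one-sample t-statistic for the hypothesis that the mean per-window Sharpe is zero. It treats the windows as independent samples, which is only an approximation, and `t_statistic` is a hypothetical helper rather than part of the library. The input values are made up.

```python
import math
from statistics import mean, stdev


def t_statistic(per_window_sharpes):
    """One-sample t-statistic for H0: mean per-window Sharpe == 0."""
    n = len(per_window_sharpes)
    if n < 2:
        raise ValueError("need at least two windows")
    sd = stdev(per_window_sharpes)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("zero variance across windows; t-statistic is undefined")
    return mean(per_window_sharpes) / (sd / math.sqrt(n))


# Hypothetical per-window Sharpe ratios, e.g. from result.per_window_sharpe.
example = [0.5, 1.0, 1.5]
print(f"t = {t_statistic(example):.3f} with {len(example) - 1} degrees of freedom")
```

Compare the statistic against a Student's t distribution with n - 1 degrees of freedom; with only a handful of windows the critical values are large, so a high average Sharpe alone does not establish significance.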
