Thresholds
User-driven pass/fail criteria for validation tests
Every validation test accepts an optional thresholds dict. When you provide it, the result gains a .passed boolean and a threshold_checks dict showing which thresholds were met.
Horizon doesn’t have an opinion about what’s acceptable. You declare the criteria; the framework just does the math.
The contract
python
class ValidationTest(ABC):
def __init__(self, thresholds: dict[str, float] | None = None):
self.thresholds = thresholds or {}
Pass thresholds at construction:
python
from horizon.validate import Bootstrap
bs = Bootstrap(
metrics=["sharpe"],
n_samples=1000,
thresholds={
"sharpe_ci_lo_min": 0.5,
"sharpe_median_min": 1.0,
},
)
result = bs.run(returns=my_returns)
if result.passed:
print("All thresholds met")
Threshold key naming
Each test documents its supported threshold keys. The general pattern is {metric}_{criterion}_{bound}:
sharpe_ci_lo_min: lower CI bound of Sharpe must be at least this valuesharpe_median_min: bootstrap median of Sharpe must be at least this valueoos_sharpe_min: OOS Sharpe must be at least this valueis_oos_sharpe_ratio_max: IS/OOS Sharpe ratio must not exceed this valueaggregate_sharpe_min: stitched walk-forward Sharpe must be at least this valuemin_window_sharpe: worst walk-forward window Sharpe must be at least this value
Result fields
python
@dataclass
class ValidationResult:
test_name: str
thresholds: dict[str, float] # user-supplied thresholds
threshold_checks: dict[str, bool] # per-threshold pass/fail
metadata: dict[str, Any]
@property
def passed(self) -> bool:
"""True iff all user thresholds are met."""
Inspecting the checks
python
result = bs.run(returns=my_returns)
for check_name, passed in result.threshold_checks.items():
icon = "✓" if passed else "✗"
print(f"{icon} {check_name}")
Combining multiple tests
python
from horizon.validate import Bootstrap, WalkForward
# Run two tests with different thresholds
bs = Bootstrap(
metrics=["sharpe"],
thresholds={"sharpe_ci_lo_min": 0.5},
).run(returns=my_returns)
wf = WalkForward(
train="2y", test="3m",
thresholds={"aggregate_sharpe_min": 0.8, "min_window_sharpe": 0.0},
).run(strategy=MyStrategy, backtest=my_bt, ...)
# User's own gate
deploy_worthy = bs.passed and wf.passed
if deploy_worthy:
print("All validation passed, safe to promote")
else:
print("Validation failed, need more work")
Why user-driven thresholds?
Because “is a Sharpe of 1.2 good enough?” depends on:
- Your capital base
- Your investor expectations
- The asset class you’re trading
- How many strategy variants you tested
- Your confidence in the backtest assumptions
A framework that has an opinion about this is either too strict (blocks everything) or too lax (lets overfits through). Horizon returns the numbers; you decide the meaning.