Synthetic Data

`SyntheticGBM` and `SyntheticRegimes` are seeded, deterministic price generators.

SyntheticGBM

Seeded geometric Brownian motion. Same seed → bit-identical price paths every run.

python
from datetime import datetime, timedelta

from horizon.data import SyntheticGBM

data = SyntheticGBM(
    market_ids=["AAPL", "MSFT", "NVDA"],
    start=datetime(2023, 1, 2),
    n_bars=252,
    mu=0.08,                       # annualized drift
    sigma=0.20,                    # annualized vol
    periods_per_year=252,
    seed=42,
    initial_price=100.0,
    step=timedelta(days=1),
)
market_ids: list[str]
Market ids to generate paths for. Each gets its own deterministic path (seeded with `seed + i`).

n_bars: int
Number of bars to generate.

mu: float
Annualized drift.

sigma: float
Annualized volatility.

seed: int
RNG seed. Same seed = same paths.
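
The per-market seeding rule (`seed + i`) can be illustrated with the standard library's `random.Random`. This is a minimal sketch, not the library's code: the helper name `market_rngs` is hypothetical, and it assumes that `i` is the zero-based position of the market in `market_ids` and that a `random.Random`-style generator is used.

```python
import random

def market_rngs(market_ids, seed):
    """Hypothetical helper: one RNG per market, seeded with seed + i."""
    return {mid: random.Random(seed + i) for i, mid in enumerate(market_ids)}

# Two independent constructions with the same base seed.
rngs_a = market_rngs(["AAPL", "MSFT", "NVDA"], seed=42)
rngs_b = market_rngs(["AAPL", "MSFT", "NVDA"], seed=42)

# First standard-normal draw per market: reproducible across runs,
# but different from one market to the next.
draws_a = {mid: rng.gauss(0, 1) for mid, rng in rngs_a.items()}
draws_b = {mid: rng.gauss(0, 1) for mid, rng in rngs_b.items()}
```

Under this scheme, adding a market to the end of the list leaves the paths of the existing markets unchanged, because their seeds do not move.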

How it works

python
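from math import exp, sqrt

dt = 1 / periods_per_year
drift = (mu - 0.5 * sigma**2) * dt   # per-bar drift of the log price
diff = sigma * sqrt(dt)              # per-bar volatility

for t in range(n_bars):
    z = rng.gauss(0, 1)              # standard normal draw from the seeded RNG
    price *= exp(drift + diff * z)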

This is classic GBM in discrete time: each bar multiplies the price by a lognormal factor.
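
The update rule can be wrapped in a standalone function to check two properties: determinism under a fixed seed, and the noise-free limit. This is a sketch under assumptions, not Horizon's implementation: `gbm_path` is a hypothetical name, and `random.Random` stands in for whatever RNG the library uses.

```python
import math
import random

def gbm_path(n_bars, mu, sigma, periods_per_year=252, seed=0, initial_price=100.0):
    """Illustrative discrete-time GBM path (hypothetical helper)."""
    rng = random.Random(seed)
    dt = 1 / periods_per_year
    drift = (mu - 0.5 * sigma ** 2) * dt
    diff = sigma * math.sqrt(dt)
    price = initial_price
    prices = []
    for _ in range(n_bars):
        z = rng.gauss(0, 1)
        price *= math.exp(drift + diff * z)
        prices.append(price)
    return prices

# Same seed, same arguments: the two paths are identical element by element.
path_1 = gbm_path(252, mu=0.08, sigma=0.20, seed=42)
path_2 = gbm_path(252, mu=0.08, sigma=0.20, seed=42)

# Sanity check: with sigma=0 the noise vanishes and the price grows as exp(mu * years).
no_noise = gbm_path(252, mu=0.08, sigma=0.0)
```

Because every step multiplies by an exponential, prices stay strictly positive regardless of the draws.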

Example

python
from horizon.data import SyntheticGBM

# Bullish trend
up = SyntheticGBM(["A"], n_bars=252, mu=0.30, sigma=0.15, seed=1)

# Neutral
flat = SyntheticGBM(["A"], n_bars=252, mu=0.0, sigma=0.20, seed=1)

# Bearish
down = SyntheticGBM(["A"], n_bars=252, mu=-0.20, sigma=0.25, seed=1)
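
For a rough sense of where each scenario lands, the GBM closed forms give the expected and median price after one year (252 bars at 252 periods per year). The sketch below assumes `initial_price=100.0` and `periods_per_year=252` apply when omitted, which this page does not confirm; `terminal_stats` is a hypothetical helper, not part of `horizon.data`.

```python
import math

def terminal_stats(mu, sigma, years=1.0, initial_price=100.0):
    """Expected and median GBM price after `years` (closed form, no simulation)."""
    expected = initial_price * math.exp(mu * years)
    median = initial_price * math.exp((mu - 0.5 * sigma ** 2) * years)
    return expected, median

# (mu, sigma) pairs taken from the three scenarios above.
scenarios = {"up": (0.30, 0.15), "flat": (0.0, 0.20), "down": (-0.20, 0.25)}
stats = {name: terminal_stats(mu, sigma) for name, (mu, sigma) in scenarios.items()}

for name, (expected, median) in stats.items():
    print(f"{name}: expected={expected:.2f} median={median:.2f}")
```

Note that the "neutral" scenario has an expected price equal to the starting price, but its median sits slightly below it because of the `-0.5 * sigma**2` term. Any single seeded path can land well away from either figure.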

SyntheticRegimes

Programmable regime shifts, good for stress testing.

python
from horizon.data import SyntheticRegimes

data = SyntheticRegimes(
    market_ids=["A"],
    n_bars=300,
    regimes=[
        (0.40, 0.20, 0.15),     # 40% uptrend
        (0.30, 0.00, 0.30),     # 30% chop
        (0.30, -0.30, 0.40),    # 30% crash
    ],
    seed=7,
)

Each tuple is `(fraction_of_n_bars, mu, sigma)`. The fractions should sum to approximately 1.0.
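
One plausible way to turn regime fractions into whole bar counts is to round each fraction of `n_bars` and let the last regime absorb any rounding remainder. This is an assumption about reasonable behavior, not confirmed behavior of `SyntheticRegimes`; `regime_bar_counts` is a hypothetical helper.

```python
def regime_bar_counts(n_bars, regimes):
    """Hypothetical helper: bars per regime; the last regime absorbs rounding error."""
    counts = [round(frac * n_bars) for frac, _mu, _sigma in regimes[:-1]]
    counts.append(n_bars - sum(counts))
    return counts

regimes = [
    (0.40, 0.20, 0.15),   # uptrend
    (0.30, 0.00, 0.30),   # chop
    (0.30, -0.30, 0.40),  # crash
]
counts = regime_bar_counts(300, regimes)
```

With this rule the counts always add up to `n_bars`, even when the fractions sum to slightly more or less than 1.0.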

Use cases

  • Test strategy behavior across bull / neutral / bear regimes
  • Verify that the risk layer bounds losses in crash regimes
  • Compare strategy performance under different vol environments

Example: stress test

python
from horizon.data import SyntheticRegimes

data = SyntheticRegimes(
    market_ids=["A", "B", "C"],
    n_bars=500,
    regimes=[
        (0.20, 0.30, 0.15),   # 100 bars up
        (0.60, -0.40, 0.35),  # 300 bars crash
        (0.20, 0.10, 0.20),   # 100 bars recovery
    ],
    seed=42,
)

result = hz.run(
    mode="backtest",
    data_source=data,
    strategies=[...],
    risk=RiskProfile.conservative(),   # tight risk
    ...
)

# Verify that the crash was bounded by risk layers
assert result.max_drawdown < 0.30   # conservative profile should keep DD < 30%
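
The assertion treats `result.max_drawdown` as a fraction of peak equity (0.30 means 30%). How Horizon computes it is not shown on this page; under the common peak-to-trough definition it can be sketched as follows, with `max_drawdown` as a hypothetical standalone helper and `curve` as made-up illustrative data.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest equity seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst

# Illustrative equity curve: peak of 120, later trough of 80.
curve = [100.0, 120.0, 90.0, 110.0, 80.0, 130.0]
dd = max_drawdown(curve)
```

Here the worst decline runs from 120 down to 80, one third of the peak, so this curve would fail a 30% drawdown bound even though it ends at a new high.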

Tests

python
from horizon.data import SyntheticGBM, SyntheticRegimes


def test_deterministic_with_seed():
    s1 = SyntheticGBM(["A"], n_bars=50, seed=1)
    s2 = SyntheticGBM(["A"], n_bars=50, seed=1)
    bars1 = list(s1.iter_bars())
    bars2 = list(s2.iter_bars())
    assert [b.price for b in bars1] == [b.price for b in bars2]

def test_crash_regime_declines():
    s = SyntheticRegimes(
        market_ids=["A"],
        n_bars=300,
        regimes=[(1.0, -0.50, 0.30)],
        seed=42,
    )
    bars = list(s.iter_bars())
    assert bars[-1].price < bars[0].price
