ML anomaly detection (v0.4)

Unsupervised Isolation Forest anomaly detection on market-state features. Complements the rule-based detectors by catching unknown patterns.

The rule-based detectors in detectors know the exact shape of each manipulation (spoofing, layering, quote-stuffing, etc.) and fire on KNOWN patterns. The ML anomaly detectors take the opposite posture: they learn what “normal” looks like, then flag statistical outliers regardless of whether the outlier matches a named pattern.

Think of it as a safety net. The rules catch the patterns you designed them for; the anomaly layer catches everything else.

v0.4 status

| Method | Status | Paper |
|---|---|---|
| `IsolationForestDetector` | Shipped (v0.4.0) | Liu, Ting, Zhou (2008) |
| `AutoencoderDetector` | Shipped (v0.4.1), requires the `[flow-ml]` extras | Dixon, Halperin, Bilokon (2020) |

IsolationForestDetector ships in the base horizon[flow] install, because scikit-learn is already a dependency there. The autoencoder needs torch, which comes with the [flow-ml] extras.

IsolationForestDetector

Unsupervised anomaly detection via random-partition trees. For each feature vector, the forest measures the average depth needed to isolate it across many random trees; anomalies isolate quickly (shallow depth), normal points take longer.
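The random-partition idea can be sketched in pure Python. This is a minimal illustration written under our own assumptions. It is not the detector's code, which delegates to sklearn. The names `c`, `build_tree`, `path_length`, and `TinyIsolationForest` are hypothetical. The score is the paper's s(x) = 2^(-E[h(x)]/c(ψ)), where values near 1 mean anomalous. sklearn's `score_samples` returns the negation of this quantity.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points.

    Normalises path lengths, and credits leaves that still hold several
    points when the height limit stops the tree early.
    """
    if n > 2:
        harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n-1)
        return 2.0 * harmonic - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree: random feature, random threshold in its range."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # a constant feature cannot be split
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for unresolved leaves."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, feature, split, left, right = tree
    child = left if x[feature] < split else right
    return path_length(child, x, depth + 1)


class TinyIsolationForest:
    """Hypothetical toy forest; the real detector uses sklearn's IsolationForest."""

    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, X):
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(X))  # assumes len(X) >= 2
        max_depth = math.ceil(math.log2(self.psi))  # height limit from the paper
        self.trees = [
            build_tree(rng.sample(X, self.psi), 0, max_depth, rng)  # subsample without replacement
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, x):
        """s(x) = 2^(-E[h(x)] / c(psi)); shallow isolation pushes s toward 1."""
        mean_h = sum(path_length(t, x) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c(self.psi))


data_rng = random.Random(42)
normal = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(500)]
forest = TinyIsolationForest(seed=7).fit(normal)
print(forest.score([0.0] * 5))                    # typical state: lower score
print(forest.score([8.0, -8.0, 8.0, 8.0, -8.0]))  # extreme joint move: higher score
```

The extreme vector sits outside the range that every split threshold is drawn from. It is therefore separated from the bulk within a few splits, and that short path is exactly what the score rewards.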

Why this algorithm.

Scale-invariant by design (random partitions don’t need z-scoring).
Scales linearly in sample count.
No distance metric or density kernel required.
Handles the mixed-magnitude market-state features (bps, fractional returns, volatility) without manual normalization.
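The scale-invariance point comes from how a split threshold is drawn: uniformly between the feature's observed min and max. A positive rescaling plus shift of one feature moves min, max, and threshold together, so every point lands on the same side. The sketch below checks this for a single split. `split_left` is a hypothetical helper. The uniform draw `u` is passed in explicitly so both runs share it. The invariance holds for per-feature affine maps, not for nonlinear transforms such as log.

```python
import random


def split_left(points, feature, u):
    """Indices sent left by one isolation-tree split on `feature`.

    The threshold is min + u * (max - min): a uniform draw over the
    feature's observed range, with the draw u in [0, 1) supplied by the caller.
    """
    values = [p[feature] for p in points]
    lo, hi = min(values), max(values)
    threshold = lo + u * (hi - lo)
    return [i for i, v in enumerate(values) if v < threshold]


rng = random.Random(0)
fractional = [[rng.gauss(0, 0.001), rng.gauss(0, 0.02)] for _ in range(50)]
# Same data with feature 0 re-expressed in bps and shifted (an affine map).
in_bps = [[p[0] * 1e4 + 5.0, p[1]] for p in fractional]

draws = [rng.random() for _ in range(20)]
same = all(
    split_left(fractional, f, u) == split_left(in_bps, f, u)
    for f in (0, 1)
    for u in draws
)
print(same)  # True: the partition does not depend on the feature's units
```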

Feature basis

The detector scores a 5-dimensional market-state vector. This is a deliberately tight subset of what PolicyFeatureExtractor produces. The features were picked so that typical market-distress events (flash moves, liquidity droughts, wash pumps) move several of them together:

| Feature | What it captures |
|---|---|
| `spread_bps` | Top-of-book spread width, a health indicator |
| `depth_imbalance` | Book asymmetry `(bid_5 - ask_5) / (bid_5 + ask_5)` |
| `mid_return_5s` | Short-horizon price drift |
| `mid_return_1m` | Medium-horizon drift |
| `realized_vol_5m` | Rolling log-return volatility |

Dropping time-metadata features (book snapshot age, time-of-day) and the noisier actor-scoped features keeps the forest focused on state. A flash event moves spread, return, and vol together → shallow isolation depth → strong anomaly score. A one-off weird spread doesn’t.
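For intuition about what the five inputs look like, here is a hypothetical computation from raw top-of-book and mid-price data. PolicyFeatureExtractor's exact definitions are not given here, so the following are all assumptions: the spread is in bps of mid, returns are fractional mid-to-mid, `bid_5`/`ask_5` are summed depth over the top five levels, and realized vol is the sample standard deviation of log returns. The helper names are illustrative only.

```python
import math
from statistics import stdev


def spread_bps(best_bid: float, best_ask: float) -> float:
    """Top-of-book spread as basis points of the mid price (assumed definition)."""
    mid = (best_bid + best_ask) / 2.0
    return (best_ask - best_bid) / mid * 1e4


def depth_imbalance(bid_5: float, ask_5: float) -> float:
    """(bid_5 - ask_5) / (bid_5 + ask_5); 0 when both sides are empty."""
    total = bid_5 + ask_5
    return 0.0 if total == 0 else (bid_5 - ask_5) / total


def mid_return(mid_now: float, mid_then: float) -> float:
    """Fractional mid-price return over the horizon."""
    return mid_now / mid_then - 1.0


def realized_vol(mids: list[float]) -> float:
    """Sample std of consecutive log returns in the window (assumed definition)."""
    log_rets = [math.log(b / a) for a, b in zip(mids, mids[1:])]
    return stdev(log_rets) if len(log_rets) >= 2 else 0.0


def market_state_vector(best_bid, best_ask, bid_5, ask_5,
                        mid_5s_ago, mid_1m_ago, mids_5m):
    """Hypothetical helper returning the 5 features in the table's order."""
    mid = (best_bid + best_ask) / 2.0
    return [
        spread_bps(best_bid, best_ask),
        depth_imbalance(bid_5, ask_5),
        mid_return(mid, mid_5s_ago),
        mid_return(mid, mid_1m_ago),
        realized_vol(mids_5m),
    ]


vec = market_state_vector(99.5, 100.5, 300.0, 100.0,
                          mid_5s_ago=100.0, mid_1m_ago=98.0,
                          mids_5m=[100.0, 100.5, 99.8, 100.0])
print(vec)
```

The magnitudes are mixed: about 100 for the spread in bps, 0.5 for imbalance, and around 1e-2 or smaller for returns and vol. This is the situation in which the forest's scale invariance matters.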

Lifecycle

```text
┌─────────────────┐   ┌──────────────────────┐   ┌──────────────────┐   ┌───────────────┐
│ burn-in         │ → │ fit                  │ → │ score            │ → │ cooldown      │
│ (collect N      │   │ (sklearn             │   │ (each event      │   │ (per market,  │
│  state vectors) │   │  IsolationForest     │   │  after fit gets  │   │  60s default) │
│                 │   │  .fit(X))            │   │  a -decision     │   │               │
│                 │   │                      │   │  score)          │   │               │
└─────────────────┘   └──────────────────────┘   └──────────────────┘   └───────────────┘
                                                          │                     ↑
                                                          └── emits finding ────┘
                                                              above threshold → cooldown set
```

Burn-in. The detector buffers the first burn_in_events market-state samples per market. Default 500. Only scoring-eligible events (trades, fills, book snapshots) count. Quote updates are absorbed into the feature extractor but don’t advance the burn-in counter.
Fit. When the buffer hits the threshold, sklearn’s IsolationForest trains on it.
Score. Each subsequent event is scored with -decision_function. sklearn offsets decision_function so that 0 is the predict-anomaly boundary implied by the contamination prior and negative values are anomalous. Negating it makes higher scores mean more anomalous. A finding fires when the score exceeds score_threshold. The default of 0.02 sits just past the contamination boundary, so mild anomalies trigger without the bar being conservative.
Cooldown. After a finding, further findings on the same market are suppressed for score_cooldown_s (default 60s). Stops a persisting anomaly from emitting hundreds of duplicate findings.
Refit. Every refit_every_events (default 5,000), the model is re-trained on the current buffer. Lets the definition of “normal” drift with the market’s evolving regime. Set to 0 to never refit.
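The lifecycle steps above can be sketched as a small per-market state machine. This is a hypothetical illustration, not the detector's implementation. `LifecycleSketch`, `_MarketState`, and the injected `fit_fn` (which takes the buffer and returns a scoring function) are names we made up. The sketch assumes the caller passes only scoring-eligible events. It also keeps the buffer unbounded for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]
ScoreFn = Callable[[Vector], float]          # higher = more anomalous
FitFn = Callable[[List[Vector]], ScoreFn]    # hypothetical: buffer -> scorer


@dataclass
class _MarketState:
    buffer: List[Vector] = field(default_factory=list)  # unbounded, for simplicity
    score_fn: Optional[ScoreFn] = None
    events_since_fit: int = 0
    cooldown_until: float = float("-inf")


class LifecycleSketch:
    """Per-market burn-in -> fit -> score -> cooldown, with periodic refit."""

    def __init__(self, fit_fn: FitFn, burn_in_events=500, score_threshold=0.02,
                 score_cooldown_s=60.0, refit_every_events=5_000):
        self.fit_fn = fit_fn
        self.burn_in_events = burn_in_events
        self.score_threshold = score_threshold
        self.score_cooldown_s = score_cooldown_s
        self.refit_every_events = refit_every_events
        self.markets: Dict[str, _MarketState] = {}

    def on_event(self, market_id: str, vector: Vector, now: float) -> Optional[float]:
        """Feed one scoring-eligible event; return its score if it becomes a finding."""
        st = self.markets.setdefault(market_id, _MarketState())
        st.buffer.append(vector)

        if st.score_fn is None:  # burn-in
            if len(st.buffer) >= self.burn_in_events:
                st.score_fn = self.fit_fn(list(st.buffer))  # fit; this event is not scored
            return None

        score = st.score_fn(vector)  # score
        st.events_since_fit += 1
        if self.refit_every_events and st.events_since_fit >= self.refit_every_events:
            st.score_fn = self.fit_fn(list(st.buffer))  # refit on the current buffer
            st.events_since_fit = 0

        if score > self.score_threshold and now >= st.cooldown_until:
            st.cooldown_until = now + self.score_cooldown_s  # start cooldown
            return score
        return None


fits = []

def fit_abs(buffer):
    """Toy 'model': records the fit size and scores by |first feature|."""
    fits.append(len(buffer))
    return lambda v: abs(v[0])


det = LifecycleSketch(fit_abs, burn_in_events=3, score_threshold=1.0,
                      score_cooldown_s=60.0, refit_every_events=0)
events = [(0, 0.0), (1, 0.0), (2, 0.0), (3, 5.0), (10, 5.0), (70, 5.0), (71, 0.1)]
results = [det.on_event("AAPL", [x], float(t)) for t, x in events]
print(results)  # [None, None, None, 5.0, None, 5.0, None]
```

The event at t=10 is still anomalous but falls inside the 60s cooldown, so only the t=3 and t=70 events become findings.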

Usage

```python
from horizon.flow import make_default_engine
from horizon.flow.anomaly import IsolationForestDetector
from horizon.flow.config import FlowConfig

engine = make_default_engine(
    venue_name="alpaca",
    store_path="flow.db",
)
iso = IsolationForestDetector(FlowConfig())
engine.add_detector(iso)

# Now ingest events as usual. The detector burns in silently, then
# emits MarketAnomaly findings above threshold.
```

Optional: prefit on historical data

When you have a clean historical window, skip burn-in by pre-fitting:

```python
from horizon.flow.anomaly import IsolationForestDetector
from horizon.flow.anomaly.isolation_forest import _MARKET_FEATURES
from horizon.flow.config import FlowConfig
from horizon.flow.policy.features import PolicyFeatureExtractor

# Build feature vectors from a recorded-feed replay.
# `recorded_feed` is your own iterable of recorded AAPL events.
feat = PolicyFeatureExtractor()
vectors = []
for ev in recorded_feed:
    # Every event updates the extractor's state...
    feat.observe(ev)
    # ...but only scoring-eligible events produce a training vector.
    if ev.event_kind.value in ("trade.tape", "order.filled", "book.snapshot"):
        state = feat.featurize(actor_id="__mkt__", market_id=ev.market_id, now=ev.timestamp)
        vectors.append([float(state.get(n, 0.0)) for n in _MARKET_FEATURES])

iso = IsolationForestDetector(FlowConfig())
iso.prefit(market_id="AAPL", feature_vectors=vectors)

# First live event gets scored immediately: no burn-in wait.
# `engine` is the engine built in the usage example above.
engine.add_detector(iso)
```

Output

Every finding is categorized as AnomalyCategory.MarketAnomaly and carries:

score. The anomaly score (-decision_function). Higher = more anomalous.
severity. Low (below 2× threshold), Medium (2–5× threshold), High (5×+ threshold).
confidence. 1 - exp(-10·score). With the default threshold of 0.02, this gives ≈0.18 at the threshold, ≈0.86 at 10× threshold, and ≈0.95 at score ≈ 0.3 (15× threshold).
evidence.features. The full 5-feature vector that drove the score. A reviewer opens this to see what state was flagged.
evidence.raw_decision_score. Sklearn’s underlying decision_function value (negative = anomalous).
evidence.threshold. The threshold in effect, for reproducibility.
evidence.n_burn_in_samples. How much training data the forest had.
citation. "Liu, Ting, Zhou 2008. Isolation Forest".
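The severity bands and the confidence curve reduce to a few lines. This is a sketch of the stated rules only. The function names are hypothetical, and assigning exact 2× and 5× ratios to the higher band is our assumption about the boundaries.

```python
import math


def severity(score: float, threshold: float) -> str:
    """Low below 2x threshold, Medium for 2-5x, High at 5x and above (boundaries assumed)."""
    ratio = score / threshold
    if ratio >= 5.0:
        return "High"
    if ratio >= 2.0:
        return "Medium"
    return "Low"


def confidence(score: float) -> float:
    """1 - exp(-10 * score), as stated for the finding's confidence field."""
    return 1.0 - math.exp(-10.0 * score)


threshold = 0.02
for score in (0.03, 0.06, 0.12, 0.3):
    print(score, severity(score, threshold), round(confidence(score), 3))
```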

Tuning

The thresholds that matter:

burn_in_events: int = 500. More samples = cleaner “normal” but slower time-to-first-detection. For a slow venue (Polymarket), 500 is ~8 minutes of activity. Equities: 500 is 10–30 seconds. Adjust up if the market transitions regimes within the burn window.
score_threshold: float = 0.02. Sklearn’s offset makes 0 the predict-anomaly boundary. 0.02 is sensitive (catches mild anomalies); raise to 0.05 for a strict production bar that only fires on clear outliers.
score_cooldown_s: float = 60.0. A single persistent anomaly shouldn't produce 100 findings. 60s is a good default for equity / crypto cadence; raise to 300s for slower venues.
contamination: float = 0.05. Sklearn’s prior on the fraction of training data that’s anomalous. Keep around 0.05 unless you’re training on data you know contains many anomalies.
refit_every_events: int = 5_000. Long-running deployments should refit periodically so the “normal” adapts. Set to 0 to pin the first fit forever.
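How contamination and score_threshold interact: when contamination is a float, sklearn sets the fitted `offset_` to the contamination-th percentile of `score_samples` over the training data, and `decision_function` is `score_samples - offset_`. The pure-Python sketch below mirrors that arithmetic. It uses made-up training scores and hypothetical helper names, plus numpy's default linear-interpolation percentile. Roughly a `contamination` fraction of training points then land past the 0 boundary, and score_threshold is measured beyond it.

```python
def percentile(values, q):
    """Linear-interpolation percentile (numpy's default method), q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fit_offset(train_raw_scores, contamination):
    """Mirror of sklearn's offset_ for float contamination."""
    return percentile(train_raw_scores, 100.0 * contamination)


def detector_score(raw_score, offset):
    """-(decision_function) = offset_ - score_samples; positive = past the boundary."""
    return offset - raw_score


# Made-up score_samples values (sklearn: lower = more anomalous).
train = [-0.40 - 0.002 * i for i in range(100)]
offset = fit_offset(train, contamination=0.05)
flagged = sum(detector_score(s, offset) > 0.0 for s in train)
flagged_at_threshold = sum(detector_score(s, offset) > 0.02 for s in train)
print(round(offset, 4), flagged, flagged_at_threshold)  # -0.5881 5 0
```

Raising contamination moves the boundary into the bulk of the training scores, so more points cross 0. Raising score_threshold demands a larger margin past that boundary.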

When to prefer this vs. the rule-based detectors

ML anomaly is a complement, not a replacement.

Use the rule-based detectors when you know what pattern you’re hunting. Spoofing, layering, wash, iceberg. They’re high-precision, citation-traceable, and their thresholds are interpretable.
Use the anomaly detector when you want to catch whatever’s out of the ordinary. Regime shifts, novel manipulation techniques, unusual liquidity vacuums. It’s lower-specificity per finding; the evidence feature vector is what the reviewer actually reads.

In production: enable both. The rule detectors trip first on known patterns; the anomaly detector is the catch-all for the long tail.

What it does NOT do

Not a classifier. There’s no “the market is being spoofed” output. A finding says “the state is statistically unusual”. The reviewer interprets why.
Not deterministic across library upgrades. sklearn's IsolationForest draws its row subsamples and random splits from numpy's RNG. We seed it with FlowConfig.seed, but a numpy or scikit-learn version bump can change the random stream. For bit-level reproducibility, run prefit on a committed feature matrix and pin the library versions.
Not suitable for sub-second anomalies on its own. The 60s cooldown + 5-minute realized-vol feature mean the detector cares about sustained anomalies, not microbursts. For microsecond-scale detection use the rule-based quote-stuffing or spoofing detectors.

AutoencoderDetector (v0.4.1, shipped)

Dixon, Halperin, Bilokon (2020) reconstruction-error anomaly on a higher-dimensional market-state vector than IsolationForest consumes. Lives under the [flow-ml] extras (torch).

Method. A symmetric MLP encoder-decoder is fit on a burn-in window of normalized market-state feature vectors. At inference, each new vector is encoded + decoded; the squared reconstruction error is the anomaly score. Normal regimes reconstruct accurately (low error); anomalies that don’t lie on the learned manifold produce high error.
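The scoring half of that method can be sketched without torch. We assume per-feature z-score normalization fitted on the burn-in window, because the normalization scheme is not spelled out above. `reconstruct` stands in for the trained encoder-decoder. All names here are hypothetical.

```python
from statistics import fmean, pstdev


def fit_normalizer(burn_in):
    """Per-feature mean/std from the burn-in window (z-scoring is an assumption)."""
    cols = list(zip(*burn_in))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # guard against constant features
    return means, stds


def normalize(x, means, stds):
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def reconstruction_mse(z, reconstruct):
    """Anomaly score: mean squared error between z and its reconstruction."""
    z_hat = reconstruct(z)
    return fmean([(a - b) ** 2 for a, b in zip(z, z_hat)])


means, stds = fit_normalizer([[1.0, 10.0], [3.0, 30.0]])
z = normalize([4.0, 40.0], means, stds)            # -> [2.0, 2.0]
collapsed = lambda v: [0.0] * len(v)               # stand-in decoder that outputs the mean
print(reconstruction_mse(z, collapsed))            # 4.0, above the 1.5 default threshold
print(reconstruction_mse(z, lambda v: list(v)))    # 0.0 for a perfect reconstruction
```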

Feature basis. Wider than IsolationForest's: 9 features, including OFI at three horizons, spread, depth imbalance, multi-horizon mid returns, realized vol, and hour-of-day. IsolationForest's score gets diluted when one feature is out of distribution while many others are in distribution. The autoencoder's MLP nonlinearity captures interactions between features. An anomaly that moves several features in a correlated way therefore produces high reconstruction error, even when each feature individually looks plausible.
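A two-feature toy case shows why per-feature checks can miss an off-manifold state. This sketch swaps in a linear projection onto a 1-D subspace, which is what a linear autoencoder with a 1-unit bottleneck learns, in place of the MLP. It assumes two normalized features that normally move together. Both test points have |z| = 2 on every feature, yet only one lies on the learned manifold.

```python
import math


def project_onto(u, z):
    """Encode to the 1-D latent (dot with unit vector u), then decode back."""
    latent = sum(a * b for a, b in zip(u, z))
    return [latent * a for a in u]


def mse(z, z_hat):
    return sum((a - b) ** 2 for a, b in zip(z, z_hat)) / len(z)


# Normal regime: the two features co-move, so the manifold is the diagonal z1 == z2.
u = [1 / math.sqrt(2), 1 / math.sqrt(2)]

on_manifold = [2.0, 2.0]    # large but consistent move
off_manifold = [2.0, -2.0]  # same per-feature size, broken relationship
err_on = mse(on_manifold, project_onto(u, on_manifold))
err_off = mse(off_manifold, project_onto(u, off_manifold))
print(err_on, err_off)
```

The MLP version generalizes this idea from a flat subspace to a curved manifold.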

Usage.

```python
from horizon.flow import make_default_engine
from horizon.flow.anomaly import AutoencoderDetector
from horizon.flow.config import FlowConfig

engine = make_default_engine(venue_name="alpaca", store_path="flow.db")
ae = AutoencoderDetector(FlowConfig())
engine.add_detector(ae)

# ... now ingest events as usual. AE runs alongside the rule-based
# detectors and IsolationForest; each emits findings under its own
# detector_name.
```

Config knobs (FlowConfig.detectors.autoencoder):

| Field | Default | Purpose |
|---|---|---|
| `burn_in_events` | 500 | Scoring-eligible events before first fit. |
| `hidden_dim` | 16 | Hidden-layer width. |
| `latent_dim` | 4 | Bottleneck width. Smaller = stronger regularization. |
| `epochs` | 30 | Training epochs per fit. |
| `batch_size` | 32 | Mini-batch size during training. |
| `learning_rate` | 1e-3 | Adam LR. |
| `score_threshold` | 1.5 | MSE units on normalized features. Lower = more sensitive. |
| `score_cooldown_s` | 60.0 | Per-market duplicate suppression. |
| `refit_every_events` | 5,000 | Periodic refit; 0 = never. |

Output. Same AnomalyFinding shape as IsolationForest, with:

detector_name = "autoencoder"
score = reconstruction MSE on normalized input
evidence.reconstruction_error = same scalar
evidence.latent_dim, evidence.hidden_dim = architecture metadata

When to run both.

Run IsolationForest and the Autoencoder simultaneously. They catch different patterns:

IsolationForest flags univariate / simple multivariate outliers. Fast, low training cost, interpretable via feature vector.
Autoencoder flags nonlinear structural anomalies. Multi-feature regime shifts where each feature individually looks OK but their combination is off-manifold.

A production deployment runs both; a finding from either is worth a reviewer’s look. The AnomalyFinding.detector_name field distinguishes them for filtering.

Paper reference. Dixon, M., Halperin, I., Bilokon, P. (2020). Machine Learning in Finance: From Theory to Practice, Chapter 12. This detector implements the chapter's canonical autoencoder anomaly framing: a symmetric MLP, MSE loss, and a reconstruction-error threshold.

Determinism. Seeded via FlowConfig.seed through torch.manual_seed, so the same input and config give the same fit. torch.manual_seed also seeds the CUDA generators, but some GPU kernels are nondeterministic, so GPU runs may differ slightly from CPU. Pin to CPU-only for bit-level reproducibility.

Citations

Liu, F. T., Ting, K. M., Zhou, Z.-H. (2008). "Isolation Forest." ICDM 2008, pp. 413–422.
Dixon, M., Halperin, I., Bilokon, P. (2020). Machine Learning in Finance: From Theory to Practice. Springer.