ML anomaly detection (v0.4)
Unsupervised Isolation Forest anomaly detection on market-state features. Complements the rule-based detectors by catching unknown patterns.
The rule-based detectors in detectors know the exact shape of each manipulation (spoofing, layering, quote-stuffing, etc.) and fire on KNOWN patterns. The ML anomaly detectors take the opposite posture: they learn what “normal” looks like, then flag statistical outliers regardless of whether the outlier matches a named pattern.
Think of it as a safety net. The rules catch the patterns you designed them for; the anomaly layer catches everything else.
v0.4 status
| Method | Status | Paper |
|---|---|---|
| IsolationForestDetector | Shipped (v0.4.0) | Liu, Ting, Zhou (2008) |
| AutoencoderDetector | v0.4.1 target | Dixon, Halperin, Bilokon (2020). [flow-ml] extras. |
IsolationForest fits in the base horizon[flow] install (sklearn is already there). The autoencoder waits for torch under the [flow-ml] extras.
IsolationForestDetector
Unsupervised anomaly detection via random-partition trees. For each feature vector, the forest measures the average depth needed to isolate it across many random trees: anomalies isolate quickly (shallow depth), while normal points take longer.
Why this algorithm.
- Scale-invariant by design (random partitions don’t need z-scoring).
- Scales linearly in sample count.
- No distance metric or density kernel required.
- Handles the mixed-magnitude market-state features (bps, fractional returns, volatility) without manual normalization.
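A minimal sketch of the idea, using scikit-learn directly rather than the detector's internals. The synthetic data, feature magnitudes, and variable names are illustrative assumptions. The three columns sit on very different scales and are passed to the forest unnormalized.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic "normal" market state: three features on very different scales
# (illustrative only; not the detector's real feature set).
normal = np.column_stack([
    rng.normal(5.0, 1.0, 1000),     # e.g. a spread in bps
    rng.normal(0.0, 1e-4, 1000),    # e.g. a fractional return
    rng.normal(0.2, 0.02, 1000),    # e.g. a volatility estimate
])

forest = IsolationForest(n_estimators=100, random_state=0).fit(normal)

typical = np.array([[5.0, 0.0, 0.2]])
outlier = np.array([[25.0, 5e-3, 0.6]])   # far out on every feature

# score_samples is lower for points that isolate at a shallow average depth,
# so the outlier should score below the typical point.
typical_score = forest.score_samples(typical)[0]
outlier_score = forest.score_samples(outlier)[0]
print(f"typical={typical_score:.3f} outlier={outlier_score:.3f}")
```

No scaler appears anywhere in the sketch: each split compares a single feature against a threshold drawn from that feature's own range, so the column magnitudes never interact.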
Feature basis
The detector scores a 5-dimensional market-state vector, a deliberately tight subset of what PolicyFeatureExtractor produces, picked so that typical market-distress events (flash moves, liquidity drought, wash pumps) move several features together:
| Feature | What it captures |
|---|---|
| spread_bps | Top-of-book spread width, a health indicator |
| depth_imbalance | Book asymmetry: (bid_5 - ask_5) / (bid_5 + ask_5) |
| mid_return_5s | Short-horizon price drift |
| mid_return_1m | Medium-horizon drift |
| realized_vol_5m | Rolling log-return volatility |
Dropping time-metadata features (book snapshot age, time-of-day) and the noisier actor-scoped features keeps the forest focused on state. A flash event moves spread, return, and vol together → shallow isolation depth → strong anomaly score. A one-off weird spread doesn’t.
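For concreteness, two of the features in the table can be sketched as below. The depth_imbalance formula is the one given in the table. The spread_bps definition (spread relative to mid, in basis points), the reading of bid_5 / ask_5 as resting size near the top of the book, and the empty-book fallback are assumptions, not confirmed details of PolicyFeatureExtractor.

```python
def spread_bps(best_bid: float, best_ask: float) -> float:
    """Top-of-book spread in basis points of the mid price (assumed definition)."""
    mid = (best_bid + best_ask) / 2.0
    return (best_ask - best_bid) / mid * 1e4


def depth_imbalance(bid_5: float, ask_5: float) -> float:
    """(bid_5 - ask_5) / (bid_5 + ask_5), in [-1, 1].

    Returns 0.0 for an empty book (an assumption made for this sketch).
    """
    total = bid_5 + ask_5
    return 0.0 if total == 0 else (bid_5 - ask_5) / total


# A 10-cent spread on a ~$100 mid, with three times more size bid than offered.
print(spread_bps(99.95, 100.05))
print(depth_imbalance(3000, 1000))
```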
Lifecycle
┌──────────┐   ┌─────────────────┐   ┌─────────────┐   ┌──────────┐
│ burn-in  │ → │ fit             │ → │ score       │ → │ cooldown │
│ (collect │   │ (sklearn        │   │ (each event │   │ (per     │
│ N state  │   │ IsolationForest │   │ after fit   │   │ market,  │
│ vectors) │   │ .fit(X))        │   │ gets a      │   │ 60s      │
│          │   │                 │   │ -decision   │   │ default) │
│          │   │                 │   │ score)      │   │          │
└──────────┘   └─────────────────┘   └─────────────┘   └──────────┘
                                            │                 ↑
                                            └── emits finding ┘
                                                above threshold
                                                → cooldown set
- Burn-in. The detector buffers the first burn_in_events market-state samples per market (default 500). Only scoring-eligible events (trades, fills, book snapshots) count; quote updates are absorbed into the feature extractor but don’t advance the burn-in counter.
- Fit. When the buffer hits the threshold, sklearn’s IsolationForest trains on it.
- Score. Each subsequent event gets a score of -decision_function. Sklearn offsets decision_function so that 0 is the “predict-anomaly” boundary aligned with the contamination prior. We threshold against score_threshold (default 0.02, just past the contamination boundary, so anomalies trigger without the bar being overly conservative).
- Cooldown. After a finding, further findings on the same market are suppressed for score_cooldown_s (default 60s). This stops a persisting anomaly from emitting hundreds of duplicate findings.
- Refit. Every refit_every_events (default 5,000), the model is re-trained on the current buffer, which lets the definition of “normal” drift with the market’s evolving regime. Set to 0 to never refit.
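The lifecycle above can be sketched as a small state machine. This is a hypothetical stand-in, not the shipped IsolationForestDetector: the class name, the unbounded per-market buffer, and the choice not to score the event that completes burn-in are all assumptions made for illustration. Only the parameter names and defaults come from the text.

```python
from collections import defaultdict

import numpy as np
from sklearn.ensemble import IsolationForest


class IsolationForestLifecycle:
    """Hypothetical sketch of the burn-in -> fit -> score -> cooldown loop."""

    def __init__(self, burn_in_events=500, score_threshold=0.02,
                 score_cooldown_s=60.0, refit_every_events=5_000,
                 contamination=0.05, seed=0):
        self.burn_in_events = burn_in_events
        self.score_threshold = score_threshold
        self.score_cooldown_s = score_cooldown_s
        self.refit_every_events = refit_every_events
        self.contamination = contamination
        self.seed = seed
        self.buffers = defaultdict(list)     # market_id -> feature vectors
        self.models = {}                     # market_id -> fitted forest
        self.since_fit = defaultdict(int)    # events scored since the last fit
        self.last_finding = {}               # market_id -> timestamp (seconds)

    def _fit(self, market_id):
        X = np.asarray(self.buffers[market_id])
        self.models[market_id] = IsolationForest(
            contamination=self.contamination, random_state=self.seed).fit(X)
        self.since_fit[market_id] = 0

    def on_event(self, market_id, vector, now):
        """Return the anomaly score if a finding should be emitted, else None."""
        buf = self.buffers[market_id]
        buf.append(list(vector))
        if market_id not in self.models:
            # Burn-in: keep buffering until there is enough data to fit.
            if len(buf) >= self.burn_in_events:
                self._fit(market_id)
            return None
        self.since_fit[market_id] += 1
        if self.refit_every_events and self.since_fit[market_id] >= self.refit_every_events:
            self._fit(market_id)             # periodic refit on the current buffer
        model = self.models[market_id]
        score = -float(model.decision_function([list(vector)])[0])
        if score <= self.score_threshold:
            return None
        last = self.last_finding.get(market_id)
        if last is not None and now - last < self.score_cooldown_s:
            return None                      # still cooling down: suppress duplicate
        self.last_finding[market_id] = now
        return score


rng = np.random.default_rng(0)
det = IsolationForestLifecycle(burn_in_events=200)
for t in range(200):
    det.on_event("AAPL", rng.normal(0, 1, 5), now=float(t))   # silent burn-in

first = det.on_event("AAPL", [10.0] * 5, now=200.0)    # extreme state: emitted
second = det.on_event("AAPL", [10.0] * 5, now=210.0)   # inside the 60s cooldown
third = det.on_event("AAPL", [10.0] * 5, now=300.0)    # cooldown has expired
print(first, second, third)
```

Cooldown state is keyed by market, so a persisting anomaly on one market does not suppress findings on another.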
Usage
from horizon.flow import make_default_engine
from horizon.flow.anomaly import IsolationForestDetector
from horizon.flow.config import FlowConfig
engine = make_default_engine(
    venue_name="alpaca",
    store_path="flow.db",
)
iso = IsolationForestDetector(FlowConfig())
engine.add_detector(iso)
# Now ingest events as usual. The detector burns in silently, then
# emits MarketAnomaly findings above threshold.
Optional: prefit on historical data
When you have a clean historical window, skip burn-in by pre-fitting:
from horizon.flow.anomaly import IsolationForestDetector
from horizon.flow.anomaly.isolation_forest import _MARKET_FEATURES
from horizon.flow.config import FlowConfig
from horizon.flow.policy.features import PolicyFeatureExtractor

# Build feature vectors from a recorded-feed replay.
# `recorded_feed` is your own iterable of recorded events (here assumed to
# cover the AAPL market only); `engine` is the one built under Usage.
feat = PolicyFeatureExtractor()
vectors = []
for ev in recorded_feed:
    feat.observe(ev)
    # Only scoring-eligible events contribute a training vector.
    if ev.event_kind.value in ("trade.tape", "order.filled", "book.snapshot"):
        state = feat.featurize(actor_id="__mkt__", market_id=ev.market_id, now=ev.timestamp)
        vectors.append([float(state.get(n, 0.0)) for n in _MARKET_FEATURES])

iso = IsolationForestDetector(FlowConfig())
iso.prefit(market_id="AAPL", feature_vectors=vectors)
# First live event gets scored immediately: no burn-in wait.
engine.add_detector(iso)
Output
Every finding is categorized AnomalyCategory.MarketAnomaly and carries:
- score. The anomaly score (-decision_function). Higher = more anomalous.
- severity. Low (below 2× threshold), Medium (2–5× threshold), High (5×+ threshold).
- confidence. 1 - exp(-10·score), which reaches ~0.95 at a score of about 0.3 (15× the default threshold).
- evidence.features. The full 5-feature vector that drove the score. A reviewer opens this to see what state was flagged.
- evidence.raw_decision_score. Sklearn’s underlying decision_function value (negative = anomalous).
- evidence.threshold. The threshold in effect, for reproducibility.
- evidence.n_burn_in_samples. How much training data the forest had.
- citation. "Liu, Ting, Zhou 2008. Isolation Forest".
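The severity bands and the confidence formula listed above can be sketched as two small functions. The function names are hypothetical, and the handling of scores that land exactly on a band edge is an assumption. The bands and the formula are taken from the list.

```python
import math


def severity(score: float, threshold: float = 0.02) -> str:
    """Bucket a score by its ratio to the threshold in effect."""
    ratio = score / threshold
    if ratio < 2:
        return "Low"
    if ratio < 5:
        return "Medium"
    return "High"


def confidence(score: float) -> float:
    """1 - exp(-10 * score): rises quickly, then flattens as the score grows."""
    return 1.0 - math.exp(-10.0 * score)


for s in (0.03, 0.06, 0.3):
    print(f"score={s:.2f} severity={severity(s)} confidence={confidence(s):.3f}")
```

Severity is relative to the configured threshold while confidence depends only on the raw score, so raising score_threshold shifts the severity bands without changing the confidence attached to a given score.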
Tuning
The thresholds that matter:
- burn_in_events: int = 500. More samples = cleaner “normal” but slower time-to-first-detection. On a slow venue (Polymarket), 500 events is ~8 minutes of activity; on equities, 10–30 seconds. Adjust up if the market transitions regimes within the burn window.
- score_threshold: float = 0.02. Sklearn’s offset makes 0 the predict-anomaly boundary. 0.02 is sensitive (catches mild anomalies); raise to 0.05 for a strict production bar that only fires on clear outliers.
- score_cooldown_s: float = 60.0. The same event shouldn’t produce 100 findings. 60s is a good default for equity / crypto cadence; raise to 300s for slower venues.
- contamination: float = 0.05. Sklearn’s prior on the fraction of training data that’s anomalous. Keep around 0.05 unless you’re training on data you know contains many anomalies.
- refit_every_events: int = 5_000. Long-running deployments should refit periodically so the “normal” adapts. Set to 0 to pin the first fit forever.
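To see how contamination and score_threshold interact, the sketch below fits scikit-learn's IsolationForest on synthetic data and counts how many training points clear each bar. The data and the `results` dictionary are illustrative assumptions. The printed fractions depend on the data and library version, so none are quoted here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(2000, 5))   # synthetic training window

results = {}
for contamination in (0.01, 0.05, 0.10):
    forest = IsolationForest(contamination=contamination, random_state=0).fit(X)
    score = -forest.decision_function(X)           # higher = more anomalous
    at_boundary = float(np.mean(score > 0.0))      # sklearn's own predict() cut
    at_default = float(np.mean(score > 0.02))      # default score_threshold
    at_strict = float(np.mean(score > 0.05))       # stricter production bar
    results[contamination] = (at_boundary, at_default, at_strict)
    print(f"contamination={contamination:.2f} flagged: "
          f">0 {at_boundary:.3f}, >0.02 {at_default:.3f}, >0.05 {at_strict:.3f}")
```

The fraction flagged at the 0 boundary tracks contamination by construction, because sklearn places its offset at that percentile of the training scores. score_threshold then moves the bar a fixed distance past that boundary.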
When to prefer this vs. the rule-based detectors
ML anomaly is a complement, not a replacement.
- Use the rule-based detectors when you know what pattern you’re hunting. Spoofing, layering, wash, iceberg. They’re high-precision, citation-traceable, and their thresholds are interpretable.
- Use the anomaly detector when you want to catch whatever’s out of the ordinary. Regime shifts, novel manipulation techniques, unusual liquidity vacuums. It’s lower-specificity per finding; the evidence feature vector is what the reviewer actually reads.
In production: enable both. The rule detectors trip first on known patterns; the anomaly detector is the catch-all for the long tail.
What it does NOT do
- Not a classifier. There’s no “the market is being spoofed” output. A finding says “the state is statistically unusual”. The reviewer interprets why.
- Not deterministic across library upgrades. sklearn’s IsolationForest bootstrap sampling depends on numpy RNG; we seed with FlowConfig.seed, but a numpy minor version bump can shift the bootstrap order. Run prefit on a committed feature matrix, with library versions pinned, for bit-level reproducibility.
- Not suitable for sub-second anomalies on its own. The 60s cooldown + 5-minute realized-vol feature mean the detector cares about sustained anomalies, not microbursts. For microsecond-scale detection use the rule-based quote-stuffing or spoofing detectors.
AutoencoderDetector (v0.4.1, shipped)
Dixon, Halperin, Bilokon (2020) reconstruction-error anomaly on a higher-dimensional market-state vector than IsolationForest consumes. Lives under the [flow-ml] extras (torch).
Method. A symmetric MLP encoder-decoder is fit on a burn-in window of normalized market-state feature vectors. At inference, each new vector is encoded + decoded; the squared reconstruction error is the anomaly score. Normal regimes reconstruct accurately (low error); anomalies that don’t lie on the learned manifold produce high error.
Feature basis. Wider than IsolationForest’s. 9 features including OFI at three horizons, spread, depth imbalance, multi-horizon mid returns, realized vol, and hour-of-day. IsolationForest dilutes when many features are in-distribution while one is OOD; the autoencoder’s MLP nonlinearity captures interactions between features so an anomaly that moves several of them in a correlated way produces high reconstruction error even when each feature individually looks plausible.
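The reconstruction-error idea can be sketched without torch by training scikit-learn's MLPRegressor to reproduce its own input. This is a swapped-in stand-in for illustration, not the shipped torch model: the synthetic two-factor data and the ReLU activation are assumptions. The layer widths, Adam learning rate, batch size, and epoch count mirror the documented defaults.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic burn-in window: 9 features driven by 2 shared factors, so
# "normal" vectors lie near a low-dimensional manifold.
factors = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 9))
burn_in = factors @ loadings + 0.1 * rng.normal(size=(500, 9))

scaler = StandardScaler().fit(burn_in)
X = scaler.transform(burn_in)

# Symmetric encoder-decoder 9 -> 16 -> 4 -> 16 -> 9, trained to reproduce
# its input under squared-error loss with Adam.
autoencoder = MLPRegressor(hidden_layer_sizes=(16, 4, 16), solver="adam",
                           learning_rate_init=1e-3, batch_size=32,
                           max_iter=30, random_state=0)
autoencoder.fit(X, X)


def reconstruction_error(Z):
    """Per-row mean squared reconstruction error on normalized vectors."""
    Z = np.atleast_2d(Z)
    return np.mean((autoencoder.predict(Z) - Z) ** 2, axis=1)


train_err = reconstruction_error(X)

# A normalized vector whose alternating signs break the factor structure
# the burn-in data shares, placing it far off the learned manifold.
outlier_z = 6.0 * np.array([1, -1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
outlier_err = reconstruction_error(outlier_z)[0]

print(f"median train error={np.median(train_err):.3f} outlier error={outlier_err:.3f}")
print("flagged at score_threshold=1.5:", outlier_err > 1.5)
```

With only 30 epochs scikit-learn may emit a ConvergenceWarning. That does not matter for the sketch, because the 4-unit bottleneck cannot reproduce a direction the training data never used, whether or not the fit has fully converged.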
Usage.
from horizon.flow import make_default_engine
from horizon.flow.anomaly import AutoencoderDetector
from horizon.flow.config import FlowConfig
engine = make_default_engine(venue_name="alpaca", store_path="flow.db")
ae = AutoencoderDetector(FlowConfig())
engine.add_detector(ae)
# ... now ingest events as usual. AE runs alongside the rule-based
# detectors and IsolationForest; each emits findings under its own
# detector_name.
Config knobs (FlowConfig.detectors.autoencoder):
| Field | Default | Purpose |
|---|---|---|
| burn_in_events | 500 | Scoring-eligible events before first fit. |
| hidden_dim | 16 | Hidden-layer width. |
| latent_dim | 4 | Bottleneck width. Smaller = stronger regularization. |
| epochs | 30 | Training epochs per fit. |
| batch_size | 32 | Mini-batch size. |
| learning_rate | 1e-3 | Adam LR. |
| score_threshold | 1.5 | MSE units on normalized features. Lower = more sensitive. |
| score_cooldown_s | 60.0 | Per-market duplicate suppression. |
| refit_every_events | 5,000 | Periodic refit; 0 = never. |
Output. Same AnomalyFinding shape as IsolationForest, with:
- detector_name = "autoencoder"
- score = reconstruction MSE on normalized input
- evidence.reconstruction_error = same scalar
- evidence.latent_dim, evidence.hidden_dim = architecture metadata
When to run both.
Run IsolationForest and the Autoencoder simultaneously. They catch different patterns:
- IsolationForest flags univariate / simple multivariate outliers. Fast, low training cost, interpretable via feature vector.
- Autoencoder flags nonlinear structural anomalies. Multi-feature regime shifts where each feature individually looks OK but their combination is off-manifold.
A production deployment runs both; a finding from either is worth a reviewer’s look. The AnomalyFinding.detector_name field distinguishes them for filtering.
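Filtering on detector_name might look like the sketch below. The Finding dataclass is a hypothetical stand-in for AnomalyFinding with only the fields needed here. The market_id field and the "isolation_forest" name are assumptions; only "autoencoder" is confirmed above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical stand-in for AnomalyFinding; only the fields used here."""
    detector_name: str
    market_id: str
    score: float


def by_detector(findings):
    """Group findings by the detector that emitted them."""
    groups = defaultdict(list)
    for f in findings:
        groups[f.detector_name].append(f)
    return dict(groups)


def flagged_by_both(findings):
    """Markets where more than one detector emitted a finding."""
    detectors = defaultdict(set)
    for f in findings:
        detectors[f.market_id].add(f.detector_name)
    return sorted(m for m, names in detectors.items() if len(names) > 1)


findings = [
    Finding("isolation_forest", "AAPL", 0.09),   # detector name assumed
    Finding("autoencoder", "AAPL", 2.4),
    Finding("autoencoder", "MSFT", 1.8),
]
groups = by_detector(findings)
both = flagged_by_both(findings)
print(both)
```

The two detectors score on different scales (offset decision score vs. reconstruction MSE), so comparing or ranking findings only makes sense within one detector_name group.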
Paper reference. Dixon, M., Halperin, I., Bilokon, P. (2020). Machine Learning in Finance: From Theory to Practice, Chapter 12. This detector implements the chapter’s canonical autoencoder anomaly framing: a symmetric MLP, MSE loss, and a reconstruction-error threshold.
Determinism. Seeded via FlowConfig.seed through torch.manual_seed. Same input + same config → same fit. Torch’s CUDA RNG is not additionally seeded here, so GPU runs may differ slightly from CPU; pin to CPU-only for bit-level reproducibility.
Citations
- Liu, F. T., Ting, K. M., Zhou, Z.-H. (2008). “Isolation Forest.” ICDM 2008, 413-422.
- Dixon, M., Halperin, I., Bilokon, P. (2020). Machine Learning in Finance.