Institutional readiness
What guarantees the flow module gives you and how to verify them yourself before deploying.
This page covers the properties a fund-grade deployment of horizon.flow depends on: correctness, determinism, compliance traceability, robustness, and performance. For each property, you get the guarantee the module makes and the exact command that verifies it on your machine.
If you are evaluating whether to deploy, run the commands at the end and read the output. The module is designed so every claim on this page is reproducible.
1. Correctness: every method is peer-reviewed
Every detector, classifier, and feature extractor cites a paper or regulatory source. The full citation map:
| Subsystem | Method | Citation |
|---|---|---|
| Taxonomy | 6-category soft-label | Kirilenko, Kyle, Samadi, Tuzun (2017), JoF |
| Toxicity, VPIN | volume-bucket with sensitivity | Easley, López de Prado, O’Hara (2012); Andersen, Bondarenko (2014) critique |
| Toxicity, OFI | signed order flow imbalance | Cont, Kukanov, Stoikov (2014), Journal of Financial Econometrics |
| Toxicity, PIN | probability of informed trading | Easley, Kiefer, O’Hara, Paperman (1996) |
| Toxicity, Hawkes | multivariate self-excitation, branching ratio | Bacry, Mastromatteo, Muzy (2015); Filimonov, Sornette (2012) |
| Spoofing | imbalance-triggered plus fast-cancel | Lee, Eom, Park (2013) |
| Layering | stacked orders, common actor, bulk cancel | FINRA Rule 5210, CFTC spoofing guidance |
| Quote stuffing | message-rate spike with low fill | Egginton, Van Ness, Van Ness (2016) |
| Wash trade | on-chain same-origin plus Benford deviation | Cong, Li, Tang, Yang (2023), Management Science |
| Iceberg | depth persistence despite visible fills | Hautsch, Huang (2012); Esser, Mönch (2007) |
| Wallet clustering | common-input-ownership plus deposit reuse | Meiklejohn et al. (2013); Victor (2020) |
| Behavioral cluster | density-based, noise-labeled | HDBSCAN (Campello et al. 2013); Tumminello et al. (2012) |
| Network cluster | modularity maximization | Louvain (Blondel et al. 2008) |
| Temporal cluster | shape-based k-means | k-Shape (Paparrizos, Gravano 2015) |
| Shadow policy | DT, GBDT, SHAP | Pomerleau (1988); Chen, Guestrin (2016); Lundberg, Lee (2017) |
| IRL | linear-reward MaxEnt | Ziebart, Maas, Bagnell, Dey (2008) |
| GAIL, AIRL | offline discriminator variants | Ho, Ermon (2016); Fu, Luo, Levine (2018) |
| ML anomaly | Isolation Forest plus LOB autoencoder | Liu, Ting, Zhou (2008); Dixon, Halperin, Bilokon (2020) |
Per-detector citations and thresholds live on the detectors, toxicity, and policy pages.
How you verify it
Run the paper-reproduction suite:
pip install 'horizon[flow,flow-irl,flow-ml]'
pytest tests/flow/test_paper_reproduction.py -v
You should see tests reproducing:
- VPIN producing a monotonic toxicity curve on an informed-trading regime.
- Kirilenko taxonomy recovering the right labels on a 6-archetype synthetic market at 90% or better.
- Spoofing heuristic catching injected imbalance-triggered cancels.
- MaxEnt IRL recovering a known reward vector on a gridworld benchmark.
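For intuition about what the VPIN reproduction test checks, here is a minimal sketch of the volume-bucket calculation. It is not the module's implementation. It assumes every trade already carries a buy or sell sign (the original paper uses bulk volume classification instead), and the function name `vpin` and its parameters are hypothetical.

```python
from collections import deque
from typing import Iterable, List, Tuple

Trade = Tuple[float, int]  # (volume, side) with side = +1 buy, -1 sell


def vpin(trades: Iterable[Trade], bucket_volume: float, window: int) -> List[float]:
    """Return one VPIN value per closed bucket once `window` buckets exist."""
    imbalances: deque = deque(maxlen=window)  # |buy - sell| for recent buckets
    out: List[float] = []
    buy = sell = 0.0
    for volume, side in trades:
        remaining = volume
        while remaining > 0:
            # Fill the current bucket; spill any excess into the next one.
            take = min(remaining, bucket_volume - (buy + sell))
            if side > 0:
                buy += take
            else:
                sell += take
            remaining -= take
            if buy + sell >= bucket_volume:  # bucket full: close it
                imbalances.append(abs(buy - sell))
                buy = sell = 0.0
                if len(imbalances) == window:
                    out.append(sum(imbalances) / (window * bucket_volume))
    return out


balanced = [(1.0, +1), (1.0, -1)] * 50  # even two-sided flow
one_sided = [(1.0, +1)] * 100           # purely buyer-initiated flow
print(vpin(balanced, bucket_volume=10.0, window=5)[-1])   # 0.0
print(vpin(one_sided, bucket_volume=10.0, window=5)[-1])  # 1.0
```

The two extremes bracket the metric: balanced flow scores 0 and fully one-sided flow scores 1, which is the direction of the monotonic curve the test asserts.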
2. Correctness: end-to-end behavioral audits
Two runnable scripts exercise the full module end to end. Both print a PASS / FAIL table per layer and exit non-zero on any failure.
Wallet-exposing venues (Polymarket, Hyperliquid)
python examples/flow_behavioral_audit.py
Covers 21 layers: ingestion, actor profiling, taxonomy classification, all six manipulation detectors, wallet clustering, shadow policy fitting, audit-chain integrity, and store WORM enforcement. Expected output ends with ALL LAYERS PASS.
Anonymous-tape venues (equities, options, most perps)
python examples/flow_anon_behavioral_audit.py
Covers 13 layers exercising graceful degradation: when a venue does not expose per-order wallets, the module aggregates into anon_{market}_{window} pseudo-actors and still runs every market-level detector (VPIN, OFI, iceberg, quote-stuffing). Expected output ends with ALL LAYERS PASS.
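The pseudo-actor aggregation can be pictured with a short sketch. How the module encodes the window component of anon_{market}_{window} is not specified here, so the integer window index, the 60-second window, and the function names below are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW_SECONDS = 60  # assumed window length


def pseudo_actor_id(market: str, ts: float, window_seconds: int = WINDOW_SECONDS) -> str:
    """Build an anon_{market}_{window} key from a market ID and a timestamp."""
    window_index = int(ts // window_seconds)
    return f"anon_{market}_{window_index}"


def aggregate(events: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Sum traded size per pseudo-actor; events are (market, ts, size)."""
    totals: Dict[str, float] = defaultdict(float)
    for market, ts, size in events:
        totals[pseudo_actor_id(market, ts)] += size
    return dict(totals)


events = [("ES", 5.0, 2.0), ("ES", 59.0, 3.0), ("ES", 61.0, 1.0)]
print(aggregate(events))  # {'anon_ES_0': 5.0, 'anon_ES_1': 1.0}
```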
Both scripts are reproducible and suitable for CI gates.
3. Determinism: same input, same output
Every stochastic component takes an explicit seed, propagated from FlowConfig:
- Actor profiling samples features with cfg.seed.
- Clustering (HDBSCAN, k-Shape, Louvain) uses random_state=cfg.seed.
- IRL, GAIL, AIRL set torch.manual_seed and np.random.seed before training.
- Shadow-policy sklearn estimators take random_state=cfg.policy.rng_seed.
- Machine-learning anomaly detectors (Isolation Forest, autoencoder) use the same pattern.
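The seeding pattern in the list above can be sketched with the standard library alone. `SketchConfig` is a hypothetical stand-in for FlowConfig, and `sample_features` is an illustrative name, not a module function. The point is that each stochastic step builds its own RNG from the configured seed instead of relying on global state.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchConfig:  # hypothetical stand-in for FlowConfig
    seed: int = 0


def sample_features(cfg: SketchConfig, n: int) -> list:
    """Draw n pseudo-random feature values from an RNG owned by this call."""
    rng = random.Random(cfg.seed)  # local RNG: no hidden global state
    return [rng.random() for _ in range(n)]


cfg = SketchConfig(seed=42)
first = sample_features(cfg, 5)
second = sample_features(cfg, 5)
assert first == second  # same seed, same draws
assert sample_features(SketchConfig(seed=43), 5) != first  # new seed, new draws
```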
How you verify it
pytest tests/flow/test_institutional_validation.py -v -k determinism
The test generates 1000 events with a fixed seed, runs them through the engine twice, and asserts both runs produce bit-identical findings and actor profiles.
For your own regression tests, horizon flow replay reads a recorded event stream and re-runs it through a fresh engine. The replay is byte-deterministic given the same config and seed. See CLI.
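One way to assert byte-level equality in your own regression tests is to hash a canonical encoding of each run's findings and compare the digests. The sketch below assumes findings can be represented as JSON-serializable dicts. `run_engine` is a hypothetical placeholder for a replay, not a module API.

```python
import hashlib
import json
import random


def run_engine(events: list, seed: int) -> list:
    """Hypothetical stand-in for an engine run: returns a list of findings."""
    rng = random.Random(seed)
    return [
        {"event_id": e["id"], "score": round(rng.random(), 6)}
        for e in events
        if e["size"] > 100
    ]


def digest(findings: list) -> str:
    """Hash a canonical JSON encoding so equal findings give equal bytes."""
    payload = json.dumps(findings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


events = [{"id": i, "size": (i * 37) % 250} for i in range(1000)]
assert digest(run_engine(events, seed=7)) == digest(run_engine(events, seed=7))
```

Storing the digest of a known-good run alongside the recorded stream turns any later drift into a one-line test failure.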
4. Compliance traceability
Every finding lives in two places at once: a hash-chained audit log and a queryable flow store.
Hash-chained audit log
Findings emit through five append-only AuditCategory members (FlowAnomaly, ActorProfiled, ClusterAssigned, PolicyInferred, BotDetected). This is the same hash chain used for order and execution records. AuditChain.verify() detects any post-hoc modification.
Usage:
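from horizon.audit import AuditChain, SQLiteSink

# Open the audit database and wrap it in a chain.
sink = SQLiteSink("audit.db")
chain = AuditChain(sink)

# verify() detects any post-hoc modification of the chain.
result = chain.verify()
assert result.ok, f"chain broken at seq {result.first_gap_seq}"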
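To show why a hash chain exposes post-hoc edits, here is a minimal, self-contained sketch. It is not the SDK's AuditChain: the record layout, the genesis value, and the helper names are assumptions chosen for illustration.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # assumed previous-hash value for the first record


def record_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous record's hash together with this record's payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: List[dict], payload: dict) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev_hash,
                  "hash": record_hash(prev_hash, payload)})


def first_broken_index(chain: List[dict]) -> Optional[int]:
    """Return the index of the first record that fails verification, else None."""
    prev_hash = GENESIS
    for i, rec in enumerate(chain):
        expected = record_hash(prev_hash, rec["payload"])
        if rec["prev"] != prev_hash or rec["hash"] != expected:
            return i
        prev_hash = rec["hash"]
    return None


chain: List[dict] = []
for seq in range(3):
    append(chain, {"seq": seq, "category": "FlowAnomaly"})
assert first_broken_index(chain) is None

chain[1]["payload"]["category"] = "edited"  # simulate post-hoc tampering
assert first_broken_index(chain) == 1
```

Because each hash covers the previous one, an attacker who edits a record would also have to recompute every later hash, which is what an externally anchored chain head is meant to prevent.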
Append-only flow store
SQLiteFlowStore installs SQLite triggers that reject any UPDATE or DELETE against the anomalies table. The store supports append and read, never modify. See store for the schema.
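The trigger mechanism can be reproduced with the standard sqlite3 module. The sketch below is not the store's actual schema: the column set and trigger names are assumptions, and only the anomalies table name comes from this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE anomalies (
        id INTEGER PRIMARY KEY,
        category TEXT NOT NULL,
        message TEXT NOT NULL
    );
    CREATE TRIGGER anomalies_no_update BEFORE UPDATE ON anomalies
    BEGIN
        SELECT RAISE(ABORT, 'anomalies is append-only');
    END;
    CREATE TRIGGER anomalies_no_delete BEFORE DELETE ON anomalies
    BEGIN
        SELECT RAISE(ABORT, 'anomalies is append-only');
    END;
    """
)
conn.execute(
    "INSERT INTO anomalies (category, message) VALUES (?, ?)",
    ("spoofing", "fast cancel after imbalance"),
)


def is_rejected(sql: str) -> bool:
    """Return True if SQLite aborts the statement."""
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError:
        return True
    return False


assert is_rejected("UPDATE anomalies SET message = 'x' WHERE id = 1")
assert is_rejected("DELETE FROM anomalies")
assert conn.execute("SELECT COUNT(*) FROM anomalies").fetchone()[0] == 1
```

Triggers guard against writes that go through SQLite. They do not stop someone with file access from replacing the database file, which is why the hash chain and immutable backups matter as well.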
Every finding is traceable
Each finding carries:
- A machine-readable category (spoofing, layering, wash_trade, and so on).
- A human-readable message for reports.
- The underlying MarketEvent IDs that triggered it, so a reviewer can trace a finding back to raw tape.
- The config commit or checksum active when the finding was produced, so thresholds in effect at the time are recoverable.
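As a mental model, the four fields above could be carried by a record like the one below. `FindingSketch` and its attribute names are hypothetical and do not describe the module's AnomalyFinding type. The checksum scheme (SHA-256 over canonical JSON) and the threshold value are likewise assumptions.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FindingSketch:  # hypothetical; not the module's AnomalyFinding
    category: str               # machine-readable, e.g. "spoofing"
    message: str                # human-readable text for reports
    event_ids: Tuple[str, ...]  # raw-tape events that triggered the finding
    config_checksum: str        # identifies the thresholds in effect


def config_checksum(config: dict) -> str:
    """Checksum a config dict via canonical JSON (one possible scheme)."""
    body = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


cfg = {"spoofing": {"cancel_ms": 250}}  # illustrative threshold only
finding = FindingSketch(
    category="spoofing",
    message="3 fast cancels after bid imbalance",
    event_ids=("ev-101", "ev-102", "ev-103"),
    config_checksum=config_checksum(cfg),
)

# A reviewer can confirm which config produced the finding.
assert finding.config_checksum == config_checksum({"spoofing": {"cancel_ms": 250}})
assert finding.config_checksum != config_checksum({"spoofing": {"cancel_ms": 500}})
```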
How you verify it
pytest tests/flow/test_institutional_validation.py -v -k "chain or worm or audit"
This runs three checks: hash-chain integrity under a 1000-finding load; the WORM trigger attack surface (direct UPDATE, DELETE by PK, mass DELETE, and nested-transaction attempts are all rejected); and end-to-end audit-log completeness (every store record has a matching hash-chained audit event).
5. Robustness
Malformed input tolerance
The engine accepts NaN and Inf fields, missing attributes, out-of-range values, and random byte strings in actor IDs without crashing the host process. Bad events are either ignored or raise a typed exception the caller can catch.
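The "ignore or raise a typed exception" contract can be sketched as follows. `MalformedEventError`, the field names, and the range rule are hypothetical. The module's real exception types and validation rules may differ.

```python
import math


class MalformedEventError(ValueError):
    """Hypothetical typed exception for events that fail validation."""


def validate_event(event: dict) -> dict:
    """Return the event unchanged if usable, else raise MalformedEventError."""
    for field in ("price", "size"):
        value = event.get(field)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise MalformedEventError(f"{field} missing or not numeric")
        if not math.isfinite(value):
            raise MalformedEventError(f"{field} is NaN or Inf")
        if value <= 0:
            raise MalformedEventError(f"{field} out of range: {value}")
    return event


def ingest_all(events: list) -> tuple:
    """Process good events and count rejected ones instead of crashing."""
    accepted, rejected = [], 0
    for event in events:
        try:
            accepted.append(validate_event(event))
        except MalformedEventError:
            rejected += 1
    return accepted, rejected


good, bad = ingest_all([
    {"price": 0.52, "size": 100},
    {"price": float("nan"), "size": 100},  # NaN field
    {"size": 10},                          # missing attribute
    {"price": 0.4, "size": float("inf")},  # Inf field
])
assert len(good) == 1 and bad == 3
```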
Feed degradation
Backward-jumping timestamps, gaps longer than the rolling-window size, and duplicate events are all tolerated. The engine records the gap through the existing LiveFeed.on_gap hook but does not abort.
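A tolerant ingest front end along these lines might look like the sketch below. `TolerantFeed` is hypothetical. Its `on_gap` callback only mimics the idea of the LiveFeed.on_gap hook, whose real signature is not shown on this page.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Set, Tuple


class TolerantFeed:
    """Hypothetical wrapper: drops duplicates, reports gaps, never aborts."""

    def __init__(self, max_gap: float, on_gap: Callable[[float, float], None],
                 dedupe_size: int = 10_000) -> None:
        self.max_gap = max_gap
        self.on_gap = on_gap
        self.last_ts: Optional[float] = None
        self._recent: Deque[str] = deque(maxlen=dedupe_size)
        self._seen: Set[str] = set()

    def accept(self, event_id: str, ts: float) -> bool:
        """Return True if the event should be processed."""
        if event_id in self._seen:
            return False  # duplicate: skip silently
        if len(self._recent) == self._recent.maxlen:
            self._seen.discard(self._recent[0])  # forget the oldest ID
        self._recent.append(event_id)
        self._seen.add(event_id)
        if self.last_ts is not None and ts - self.last_ts > self.max_gap:
            self.on_gap(self.last_ts, ts)  # record the gap, keep going
        # Backward-jumping timestamps are accepted; the high-water mark stays.
        self.last_ts = ts if self.last_ts is None else max(self.last_ts, ts)
        return True


gaps: List[Tuple[float, float]] = []
feed = TolerantFeed(max_gap=5.0, on_gap=lambda a, b: gaps.append((a, b)))
stream = [("a", 1.0), ("a", 1.0), ("b", 0.5), ("c", 10.0)]
results = [feed.accept(eid, ts) for eid, ts in stream]
assert results == [True, False, True, True]
assert gaps == [(1.0, 10.0)]
```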
Concurrency
A single threading.Lock serializes the full ingest path. Parallel observers (audit subscriber, live feed handler, replay source) cannot interleave state updates.
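The single-lock design can be illustrated with a toy engine. `SerializedEngine` is hypothetical. The point is only that every state update happens inside one `with lock:` block, so concurrent producers cannot interleave partial updates.

```python
import threading


class SerializedEngine:
    """Hypothetical engine whose ingest path is guarded by one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.event_count = 0
        self.volume = 0.0

    def ingest(self, size: float) -> None:
        with self._lock:  # only one caller mutates state at a time
            self.event_count += 1
            self.volume += size


engine = SerializedEngine()


def producer(n: int) -> None:
    for _ in range(n):
        engine.ingest(1.0)


threads = [threading.Thread(target=producer, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert engine.event_count == 40_000
assert engine.volume == 40_000.0
```

The trade-off is throughput: one lock caps ingest at what a single core can process.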
How you verify it
pytest tests/flow/test_institutional_validation.py -v -k "robust or feed"
6. Performance
Typical numbers on a single core of an M-series laptop (run the benchmark on your own hardware to get your own numbers):
| Metric | Value |
|---|---|
| Ingest throughput | ~21,000 events/sec (about 48 μs per event) |
| Per-detector latency | 0.5 to 1.0 μs each |
| Flow store write rate | ~12,000 findings/sec |
| ML training (500 events) | ~140 ms for Isolation Forest or autoencoder |
| Memory growth | under 10 MB over 30,000 events (bounded deques) |
Headroom for v0.1 target rates (low thousands of events per second peak across Polymarket, Kalshi, Hyperliquid) is roughly 10x.
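The per-event latency and the headroom figure follow from the throughput by simple arithmetic. The sketch below takes 2,000 events per second as one reading of "low thousands"; that peak value is an assumption, so substitute your own.

```python
throughput = 21_000  # events per second, from the table above
peak_target = 2_000  # assumed reading of "low thousands" per second

latency_us = 1_000_000 / throughput  # microseconds spent per event
headroom = throughput / peak_target  # multiple of the assumed peak rate

print(f"{latency_us:.1f} us per event")  # 47.6 us per event
print(f"{headroom:.1f}x headroom")       # 10.5x headroom
```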
How you measure it on your hardware
python examples/flow_performance_benchmark.py
The script prints a table of all the above metrics.
7. Interface stability
Pinned public surface:
- horizon.flow.api: actor_profile, anomalies, cluster_of, shadow_policy.
- horizon.flow.events: MarketEvent, AnomalyFinding, ActorProfile, WalletCluster, PolicyModel, plus the enums.
- horizon.flow.config.FlowConfig: pinned; additions are new optional fields.
- horizon.flow.store.SQLiteFlowStore: pinned constructor and methods.
- horizon.flow.risk_integration.FlowAnomalyCheck: pinned constructor and check method.
Breaking changes to any of the above require a major version bump and a migration note in the roadmap. Additive changes ship in minor versions.
8. Retention
The module does not auto-delete anything. Two artifacts need a retention policy:
- SQLiteFlowStore (the flow store). Append-only findings, cluster assignments, policy models. For 17a-4-equivalent WORM retention, back up the SQLite file and its WAL to immutable storage after each run.
- AuditLog (the hash chain). Same retention as the rest of the SDK’s audit log; see professionals/retention.
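As one possible way to produce a consistent copy for immutable storage, the sketch below uses Python's sqlite3 online backup API, which reads committed data through SQLite itself and so yields a single self-contained file. This is an alternative to copying the database file and its WAL by hand, not the module's prescribed procedure. The file names and the `backup_store` helper are hypothetical.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def backup_store(src_path: Path, dest_path: Path) -> str:
    """Copy a SQLite database consistently and return the copy's SHA-256."""
    src = sqlite3.connect(str(src_path))
    dest = sqlite3.connect(str(dest_path))
    try:
        src.backup(dest)  # online backup of all committed data
    finally:
        dest.close()
        src.close()
    return hashlib.sha256(dest_path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    src_path = Path(tmp) / "flow.db"       # hypothetical store location
    dest_path = Path(tmp) / "flow.bak.db"  # copy to ship to immutable storage

    # Build a tiny stand-in database to back up.
    conn = sqlite3.connect(str(src_path))
    conn.execute("CREATE TABLE anomalies (id INTEGER PRIMARY KEY, category TEXT)")
    conn.execute("INSERT INTO anomalies (category) VALUES ('spoofing')")
    conn.commit()
    conn.close()

    checksum = backup_store(src_path, dest_path)

    # Confirm the copy is readable and complete.
    copy = sqlite3.connect(str(dest_path))
    rows = copy.execute("SELECT category FROM anomalies").fetchall()
    copy.close()

assert rows == [("spoofing",)]
```

Recording the returned checksum next to the uploaded file lets a later audit confirm the stored copy has not changed.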
On-chain data is never stored beyond what you ingest. Wallet addresses are pseudonymous. No PII linkage happens inside this module.
9. Known limitations
Honest list of things this module does not do, so you can plan around them:
- Cross-venue wallet attribution is deferred. Identifying the same entity on Polymarket and Hyperliquid is a v1.0, opt-in feature. v0.1 does per-venue attribution only.
- On-chain rate limits matter. Public Polygon RPC throttles during busy markets. Use a paid endpoint (Alchemy, QuickNode, or self-hosted) in production. Configure via the standard web3 provider.
- GAIL and AIRL are offline only. No market simulator exists, so on-policy variants do not apply. The offline discriminator-as-reward formulation is the scoped design, documented in policy.
- IRL is compute-heavy. Multi-minute fits on large trajectories. Kept opt-in behind [flow-irl]; the default policy path is the shadow-policy tree plus GBDT.
- Single-lock concurrency. Fine at v0.1 target rates. Per-market sharding is a v1.0 item if you need 100k events per second.
- Synthetic realism. Synthetic generators may miss patterns real bots use. Use recorded-day fixtures as the realism anchor for your integration tests.
10. Run the full validation locally
The full pre-deployment check:
# 1. Install with all extras.
pip install 'horizon[flow,flow-irl,flow-ml]'
# 2. Run the flow test suite.
pytest tests/flow/ -x -q
# 3. Run both end-to-end behavioral audits.
python examples/flow_behavioral_audit.py
python examples/flow_anon_behavioral_audit.py
# 4. Benchmark performance on your hardware.
python examples/flow_performance_benchmark.py
# 5. Confirm the core SDK test suite is unaffected.
pytest tests/ -x -q --ignore=tests/flow
All five steps must succeed before cutting a release.
Next steps
- For the operational side (how to run this continuously against live feeds), see deployment.
- For the compliance-reviewer summary, see compliance-memo.
- For how each detector works and what it catches, see detectors.