Institutional readiness

What guarantees the flow module gives you and how to verify them yourself before deploying.

This page covers the properties a fund-grade deployment of horizon.flow depends on: correctness, determinism, compliance traceability, robustness, and performance. For each property, you get the guarantee the module makes and the exact command that verifies it on your machine.

If you are evaluating whether to deploy, run the commands in section 10 and read the output. The module is designed so that every claim on this page is reproducible.

1. Correctness: every method cites a published source

Every detector, classifier, and feature extractor cites a paper or regulatory source. The full citation map:

Subsystem	Method	Citation
Taxonomy	6-category soft-label	Kirilenko, Kyle, Samadi, Tuzun (2017), JoF
Toxicity, VPIN	volume-bucket with sensitivity	Easley, López de Prado, O’Hara (2012); Andersen, Bondarenko (2014) critique
Toxicity, OFI	signed order flow imbalance	Cont, Kukanov, Stoikov (2014), Journal of Financial Econometrics
Toxicity, PIN	probability of informed trading	Easley, Kiefer, O’Hara, Paperman (1996)
Toxicity, Hawkes	multivariate self-excitation, branching ratio	Bacry, Mastromatteo, Muzy (2015); Filimonov, Sornette (2012)
Spoofing	imbalance-triggered plus fast-cancel	Lee, Eom, Park (2013)
Layering	stacked orders, common actor, bulk cancel	FINRA Rule 5210, CFTC spoofing guidance
Quote stuffing	message-rate spike with low fill	Egginton, Van Ness, Van Ness (2016)
Wash trade	on-chain same-origin plus Benford deviation	Cong, Li, Tang, Yang (2023), Management Science
Iceberg	depth persistence despite visible fills	Hautsch, Huang (2012); Esser, Mönch (2007)
Wallet clustering	common-input-ownership plus deposit reuse	Meiklejohn et al. (2013); Victor (2020)
Behavioral cluster	density-based, noise-labeled	HDBSCAN (Campello et al. 2013); Tumminello et al. (2012)
Network cluster	modularity maximization	Louvain (Blondel et al. 2008)
Temporal cluster	shape-based k-means	k-Shape (Paparrizos, Gravano 2015)
Shadow policy	DT, GBDT, SHAP	Pomerleau (1988); Chen, Guestrin (2016); Lundberg, Lee (2017)
IRL	linear-reward MaxEnt	Ziebart, Maas, Bagnell, Dey (2008)
GAIL, AIRL	offline discriminator variants	Ho, Ermon (2016); Fu, Luo, Levine (2018)
ML anomaly	Isolation Forest plus LOB autoencoder	Liu, Ting, Zhou (2008); Dixon, Halperin, Bilokon (2020)

Per-detector citations and thresholds live in detectors, toxicity, and policy.

How you verify it

Run the paper-reproduction suite:

bash

pip install 'horizon[flow,flow-irl,flow-ml]'
pytest tests/flow/test_paper_reproduction.py -v

You should see tests reproducing:

VPIN producing a monotonic toxicity curve on an informed-trading regime.
Kirilenko taxonomy recovering at least 90% of the correct labels on a 6-archetype synthetic market.
Spoofing heuristic catching injected imbalance-triggered cancels.
MaxEnt IRL recovering a known reward vector on a gridworld benchmark.
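
To make the first check concrete, here is a minimal VPIN sketch. It assumes trades arrive already signed and with integer volumes (the original paper uses bulk volume classification instead), and the names `vpin` and `informed_flow` are hypothetical. It illustrates the volume-bucket idea only and is not the code horizon.flow runs.

```python
import random
from typing import Iterable, List, Tuple

Trade = Tuple[int, int]  # (volume, side) with side = +1 for buys, -1 for sells


def vpin(trades: Iterable[Trade], bucket_volume: int, window: int) -> List[float]:
    """Rolling VPIN: mean of |buy - sell| / bucket_volume over the last `window` buckets."""
    if bucket_volume <= 0 or window <= 0:
        raise ValueError("bucket_volume and window must be positive")
    imbalances: List[float] = []
    series: List[float] = []
    buy = sell = 0
    for volume, side in trades:
        remaining = volume
        while remaining > 0:
            # Fill the current bucket; a large trade can spill into the next one.
            take = min(remaining, bucket_volume - (buy + sell))
            if side > 0:
                buy += take
            else:
                sell += take
            remaining -= take
            if buy + sell == bucket_volume:
                imbalances.append(abs(buy - sell) / bucket_volume)
                buy = sell = 0
                if len(imbalances) >= window:
                    series.append(sum(imbalances[-window:]) / window)
    return series


def informed_flow(n: int, informed_share: float, seed: int) -> List[Trade]:
    """Synthetic tape: informed traders always buy, everyone else picks a side at random."""
    rng = random.Random(seed)
    trades: List[Trade] = []
    for _ in range(n):
        side = 1 if rng.random() < informed_share else rng.choice((1, -1))
        trades.append((1, side))
    return trades


if __name__ == "__main__":
    for share in (0.0, 0.4, 0.8):
        series = vpin(informed_flow(5000, share, seed=7), bucket_volume=50, window=20)
        print(f"informed share {share:.1f}: mean VPIN {sum(series) / len(series):.3f}")
```

On this synthetic tape the mean VPIN should rise as the informed share rises, which is the kind of monotonic curve the reproduction test looks for.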

2. Correctness: end-to-end behavioral audits

Two runnable scripts exercise the full module end to end. Both print a PASS / FAIL table per layer and exit non-zero on any failure.

Wallet-exposing venues (Polymarket, Hyperliquid)

bash

python examples/flow_behavioral_audit.py

Covers 21 layers: ingestion, actor profiling, taxonomy classification, all six manipulation detectors, wallet clustering, shadow policy fitting, audit-chain integrity, and WORM (write-once, read-many) enforcement in the store. Expected output ends with ALL LAYERS PASS.

Anonymous-tape venues (equities, options, most perps)

bash

python examples/flow_anon_behavioral_audit.py

Covers 13 layers exercising graceful degradation: when a venue does not expose per-order wallets, the module aggregates into anon_{market}_{window} pseudo-actors and still runs every market-level detector (VPIN, OFI, iceberg, quote-stuffing). Expected output ends with ALL LAYERS PASS.

Both scripts are deterministic and report failure through their exit code, so they can serve as CI gates.
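
One way to wire the two audits into a CI job is a small wrapper that checks both the exit code and the documented final line. The sketch below assumes only what this page states (non-zero exit on failure, output ending with ALL LAYERS PASS); `audit_passed` and `main` are hypothetical helper names.

```python
import subprocess
import sys
from typing import Sequence

EXPECTED_TAIL = "ALL LAYERS PASS"  # final line documented for both audit scripts


def audit_passed(cmd: Sequence[str]) -> bool:
    """Run one audit command; pass only on exit code 0 and the expected last line."""
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    lines = [line for line in proc.stdout.splitlines() if line.strip()]
    tail_ok = bool(lines) and EXPECTED_TAIL in lines[-1]
    return proc.returncode == 0 and tail_ok


def main() -> int:
    """Run both audits and return a process exit code for the CI runner."""
    scripts = [
        "examples/flow_behavioral_audit.py",
        "examples/flow_anon_behavioral_audit.py",
    ]
    results = {script: audit_passed([sys.executable, script]) for script in scripts}
    for script, ok in results.items():
        print(f"{'PASS' if ok else 'FAIL'}  {script}")
    return 0 if all(results.values()) else 1
```

Call `sys.exit(main())` from the CI entry point so a failing layer fails the job.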

3. Determinism: same input, same output

Every stochastic component takes an explicit seed, propagated from FlowConfig:

Actor profiling samples features with cfg.seed.
Clustering (HDBSCAN, k-Shape, Louvain) uses random_state=cfg.seed.
IRL, GAIL, AIRL set torch.manual_seed and np.random.seed before training.
Shadow-policy sklearn estimators take random_state=cfg.policy.rng_seed.
Machine-learning anomaly detectors (Isolation Forest, autoencoder) are seeded from the config in the same way.

How you verify it

bash

pytest tests/flow/test_institutional_validation.py -v -k determinism

The test generates 1000 events with a fixed seed, runs them through the engine twice, and asserts both runs produce bit-identical findings and actor profiles.
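
The same run-twice-and-compare idea can be sketched with a toy pipeline. Everything here is a stand-in: `ToyConfig` plays the role of FlowConfig, `run_engine` is a made-up stochastic step, and `digest` is one possible way to compare outputs byte for byte. The point is the pattern: each run builds its own RNG from the config seed and never touches global random state.

```python
import hashlib
import json
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical stand-in for FlowConfig; only the seed matters here."""
    seed: int = 42


def make_events(n: int, seed: int) -> List[Dict[str, int]]:
    """Generate a fixed synthetic event stream from a seed."""
    rng = random.Random(seed)
    return [{"ts": i, "size": rng.randint(1, 100)} for i in range(n)]


def run_engine(events: List[Dict[str, int]], cfg: ToyConfig) -> List[Dict[str, object]]:
    """Toy stochastic pipeline: subsample events with an RNG owned by this run."""
    rng = random.Random(cfg.seed)  # local RNG, so concurrent runs cannot disturb it
    sampled = [e for e in events if rng.random() < 0.1]
    return [{"ts": e["ts"], "category": "large_trade"} for e in sampled if e["size"] > 90]


def digest(findings: List[Dict[str, object]]) -> str:
    """Hash a canonical JSON encoding so equal findings give equal bytes."""
    blob = json.dumps(findings, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


if __name__ == "__main__":
    events = make_events(1000, seed=7)
    first = digest(run_engine(events, ToyConfig()))
    second = digest(run_engine(events, ToyConfig()))
    assert first == second, "runs diverged"
    print("deterministic:", first[:16])
```

Comparing digests rather than Python objects also catches differences in ordering or float formatting that an equality check on sets would hide.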

For your own regression tests, horizon flow replay reads a recorded event stream and re-runs it through a fresh engine. The replay is byte-deterministic given the same config and seed. See CLI.

4. Compliance traceability

Every finding lives in two places at once: a hash-chained audit log and a queryable flow store.

Hash-chained audit log

Findings are appended to the audit log under five AuditCategory members (FlowAnomaly, ActorProfiled, ClusterAssigned, PolicyInferred, BotDetected). This is the same hash chain used for order and execution records. AuditChain.verify() detects any post-hoc modification.

Usage:

python

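from horizon.audit import AuditChain, SQLiteSink

# Point the chain at the audit database written by the engine.
sink = SQLiteSink("audit.db")
chain = AuditChain(sink)

# Check the chain's integrity and fail loudly if it is broken.
result = chain.verify()
assert result.ok, f"chain broken at seq {result.first_gap_seq}"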

Append-only flow store

SQLiteFlowStore installs SQLite triggers that reject any UPDATE or DELETE against the anomalies table. The store supports append and read, never modify. See store for the schema.
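
The trigger mechanism can be tried out with the standard library alone. The schema below is an illustrative assumption (the real table has more columns, see store), and `open_store` and `try_write` are hypothetical helpers; only the `RAISE(ABORT, ...)` trigger pattern is the point.

```python
import sqlite3

# Assumed minimal schema plus two triggers that abort any UPDATE or DELETE.
SCHEMA = """
CREATE TABLE anomalies (
    id INTEGER PRIMARY KEY,
    category TEXT NOT NULL,
    message TEXT NOT NULL
);
CREATE TRIGGER anomalies_no_update BEFORE UPDATE ON anomalies
BEGIN
    SELECT RAISE(ABORT, 'anomalies is append-only');
END;
CREATE TRIGGER anomalies_no_delete BEFORE DELETE ON anomalies
BEGIN
    SELECT RAISE(ABORT, 'anomalies is append-only');
END;
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create a database with the append-only table installed."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def try_write(conn: sqlite3.Connection, sql: str) -> bool:
    """Execute one statement in a transaction; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False


if __name__ == "__main__":
    conn = open_store()
    print("insert:", try_write(conn, "INSERT INTO anomalies (category, message) VALUES ('spoofing', 'demo')"))
    print("update:", try_write(conn, "UPDATE anomalies SET message = 'edited'"))
    print("delete:", try_write(conn, "DELETE FROM anomalies"))
```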

Every finding is traceable

Each finding carries:

A machine-readable category (spoofing, layering, wash_trade, and so on).
A human-readable message for reports.
The underlying MarketEvent IDs that triggered it, so a reviewer can trace a finding back to raw tape.
The config commit or checksum active when the finding was produced, so thresholds in effect at the time are recoverable.

How you verify it

bash

pytest tests/flow/test_institutional_validation.py -v -k "chain or worm or audit"

This runs three checks: hash-chain integrity under a 1000-finding load; the WORM trigger attack surface (direct UPDATE, DELETE by primary key, mass DELETE, and nested transactions are all rejected); and end-to-end audit-log completeness (every store record has a matching hash-chained audit event).

5. Robustness

Malformed input tolerance

The engine tolerates NaN and Inf fields, missing attributes, out-of-range values, and random byte strings in actor IDs without crashing the host process. Each bad event is either ignored or raises a typed exception the caller can catch.
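
A validation step of this kind might look like the sketch below. The field names, the positivity rule, and the `MalformedEventError` class are assumptions made for illustration; the engine's own exception types and rules are documented elsewhere.

```python
import math
from dataclasses import dataclass
from typing import Any, Mapping


class MalformedEventError(ValueError):
    """Hypothetical typed exception for events that fail validation."""


@dataclass(frozen=True)
class CleanEvent:
    actor_id: str
    price: float
    size: float


def validate_event(raw: Mapping[str, Any]) -> CleanEvent:
    """Return a sanitized event or raise MalformedEventError; never raise anything else."""
    try:
        price = float(raw["price"])
        size = float(raw["size"])
        actor = raw["actor_id"]
    except (KeyError, TypeError, ValueError) as exc:
        raise MalformedEventError(f"missing or non-numeric field: {exc!r}") from exc
    if not (math.isfinite(price) and math.isfinite(size)):
        raise MalformedEventError("price and size must be finite")
    if price <= 0 or size <= 0:
        raise MalformedEventError("price and size must be positive")
    if isinstance(actor, bytes):
        actor = actor.hex()  # arbitrary bytes become a stable printable ID
    return CleanEvent(actor_id=str(actor), price=price, size=size)
```

A caller that prefers to ignore bad events can wrap `validate_event` in `try`/`except MalformedEventError` and skip the record, which keeps one malformed message from stopping the feed.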

Feed degradation

Backward-jumping timestamps, gaps longer than the rolling-window size, and duplicate events are all tolerated. The engine records the gap through the existing LiveFeed.on_gap hook but does not abort.
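
As an illustration of tolerating a degraded feed, the sketch below drops duplicates, keeps time monotonic when a timestamp jumps backward, and reports long gaps through a callback without stopping. `FeedGuard` is a hypothetical pre-filter, its `on_gap` argument is only a stand-in for the LiveFeed.on_gap hook, and clamping backward timestamps is one possible policy rather than the engine's documented behavior.

```python
from collections import deque
from typing import Callable, Deque, Optional, Set


class FeedGuard:
    """Hypothetical pre-filter: dedupe, clamp backward timestamps, report gaps."""

    def __init__(
        self,
        window_s: float,
        on_gap: Callable[[float, float], None],
        remember: int = 10_000,
    ) -> None:
        self.window_s = window_s
        self.on_gap = on_gap
        self.last_ts: Optional[float] = None
        self._recent: Deque[str] = deque(maxlen=remember)  # bounded memory of IDs
        self._seen: Set[str] = set()

    def accept(self, event_id: str, ts: float) -> Optional[float]:
        """Return the timestamp to use, or None if the event is a duplicate."""
        if event_id in self._seen:
            return None
        if len(self._recent) == self._recent.maxlen:
            self._seen.discard(self._recent[0])  # oldest ID is about to be evicted
        self._recent.append(event_id)
        self._seen.add(event_id)
        if self.last_ts is not None:
            if ts < self.last_ts:
                ts = self.last_ts  # backward jump: keep time monotonic
            elif ts - self.last_ts > self.window_s:
                self.on_gap(self.last_ts, ts)  # record the gap and keep going
        self.last_ts = ts
        return ts
```

The bounded deque means the duplicate filter forgets very old IDs, trading perfect deduplication for a fixed memory footprint.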

Concurrency

A single threading.Lock serializes the full ingest path. Parallel observers (audit subscriber, live feed handler, replay source) cannot interleave state updates.
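
The single-lock pattern is easy to demonstrate with a toy engine. `ToyEngine` and `hammer` are hypothetical names and the state is deliberately trivial; the sketch only shows that wrapping the whole ingest path in one `threading.Lock` keeps multi-field updates consistent under concurrent callers.

```python
import threading


class ToyEngine:
    """Hypothetical engine: one lock guards every state update made during ingest."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.event_count = 0
        self.total_size = 0

    def ingest(self, size: int) -> None:
        with self._lock:  # the whole read-modify-write sequence is one critical section
            self.event_count += 1
            self.total_size += size


def hammer(engine: ToyEngine, n_threads: int, per_thread: int) -> None:
    """Call ingest from several threads at once and wait for all of them."""

    def worker() -> None:
        for _ in range(per_thread):
            engine.ingest(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    engine = ToyEngine()
    hammer(engine, n_threads=8, per_thread=5000)
    print(engine.event_count, engine.total_size)
```

The trade-off is throughput: every caller waits for the same lock, which is why sharding is listed as a later item for much higher event rates.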

How you verify it

bash

pytest tests/flow/test_institutional_validation.py -v -k "robust or feed"

6. Performance

Typical numbers on a single core of an Apple M-series laptop (run the benchmark on your own hardware for your numbers):

Metric	Value
Ingest throughput	~21,000 events/sec (about 48 μs per event)
Per-detector latency	0.5 to 1.0 μs each
Flow store write rate	~12,000 findings/sec
ML training (500 events)	~140 ms for Isolation Forest or autoencoder
Memory growth	under 10 MB over 30,000 events (bounded deques)

Headroom for v0.1 target rates (low thousands of events per second peak across Polymarket, Kalshi, Hyperliquid) is roughly 10x.
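
The per-event figure and the headroom estimate follow directly from the throughput number. The short calculation below uses 2,000 events per second as the peak, which is an assumed reading of "low thousands"; substitute your own measured throughput and expected peak.

```python
def per_event_us(events_per_sec: float) -> float:
    """Average time budget consumed by one event, in microseconds."""
    return 1_000_000 / events_per_sec


def headroom(measured_events_per_sec: float, peak_events_per_sec: float) -> float:
    """How many times the expected peak fits into the measured throughput."""
    return measured_events_per_sec / peak_events_per_sec


if __name__ == "__main__":
    measured = 21_000      # ingest throughput from the table above
    assumed_peak = 2_000   # assumption: one reading of "low thousands"
    print(f"{per_event_us(measured):.1f} us per event")        # 47.6 us per event
    print(f"{headroom(measured, assumed_peak):.1f}x headroom")  # 10.5x headroom
```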

How you measure it on your hardware

bash

python examples/flow_performance_benchmark.py

The script prints a table of all the above metrics.

7. Interface stability

Pinned public surface:

horizon.flow.api: actor_profile, anomalies, cluster_of, shadow_policy.
horizon.flow.events: MarketEvent, AnomalyFinding, ActorProfile, WalletCluster, PolicyModel, plus the enums.
horizon.flow.config.FlowConfig: pinned; additions are new optional fields.
horizon.flow.store.SQLiteFlowStore: pinned constructor and methods.
horizon.flow.risk_integration.FlowAnomalyCheck: pinned constructor and check method.

Breaking changes to any of the above require a major version bump and a migration note in the roadmap. Additive changes ship in minor versions.

8. Retention

The module does not auto-delete anything. Two artifacts need a retention policy:

SQLiteFlowStore (the flow store). Append-only findings, cluster assignments, policy models. For WORM retention equivalent to SEC Rule 17a-4, back up the SQLite file and its WAL to immutable storage after each run.
AuditLog (the hash chain). Same retention as the rest of the SDK’s audit log; see professionals/retention.
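
For the flow store, a post-run snapshot step could be sketched as follows. Instead of copying the database file and its WAL separately, this variant uses the standard library's online backup API to write a single consistent copy, then records a SHA-256 checksum for later comparison. The function name and paths are hypothetical, and shipping the copy to immutable storage is left to your own tooling.

```python
import hashlib
import os
import sqlite3
import stat


def snapshot(db_path: str, dest_path: str) -> str:
    """Copy a SQLite database to dest_path and return the copy's SHA-256 hex digest."""
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(dest_path)
    try:
        src.backup(dst)  # online backup: a consistent copy, including committed WAL content
    finally:
        dst.close()
        src.close()
    # Mark the local copy read-only; real immutability comes from the storage layer.
    os.chmod(dest_path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    sha = hashlib.sha256()
    with open(dest_path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest()
```

Storing the digest alongside the upload lets a later reviewer confirm that the archived file is the one produced by the run.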

On-chain data is never stored beyond what you ingest. Wallet addresses are pseudonymous. No PII linkage happens inside this module.

9. Known limitations

What this module does not do, so you can plan around it:

Cross-venue wallet attribution is deferred. Identifying the same entity on Polymarket and Hyperliquid is a v1.0, opt-in feature. v0.1 does per-venue attribution only.
On-chain rate limits matter. Public Polygon RPC throttles during busy markets. Use a paid endpoint (Alchemy, QuickNode, or self-hosted) in production. Configure via the standard web3 provider.
GAIL and AIRL are offline only. No market simulator exists, so on-policy variants do not apply. The offline discriminator-as-reward formulation is the scoped design, documented in policy.
IRL is compute-heavy. Fits on large trajectories take multiple minutes, so IRL stays opt-in behind the [flow-irl] extra; the default policy path is the shadow-policy tree plus GBDT.
Single-lock concurrency. Fine at v0.1 target rates. Per-market sharding is a v1.0 item if you need 100k events per second.
Synthetic realism. Synthetic generators may miss patterns real bots use. Use recorded-day fixtures as the realism anchor for your integration tests.

10. Run the full validation locally

The full pre-deployment check:

bash

# 1. Install with all extras.
pip install 'horizon[flow,flow-irl,flow-ml]'

# 2. Run the flow test suite.
pytest tests/flow/ -x -q

# 3. Run both end-to-end behavioral audits.
python examples/flow_behavioral_audit.py
python examples/flow_anon_behavioral_audit.py

# 4. Benchmark performance on your hardware.
python examples/flow_performance_benchmark.py

# 5. Confirm the core SDK test suite is unaffected.
pytest tests/ -x -q --ignore=tests/flow

All five steps must succeed before you deploy.

Next steps

For the operational side (how to run this continuously against live feeds), see deployment.
For the compliance-reviewer summary, see compliance-memo.
For how each detector works and what it catches, see detectors.