Compliance overview
What the flow module records, the regulatory regimes it supports, and the material limitations you need to disclose.
This page is a one-page summary suitable for a compliance reviewer or registered principal. If you are a developer, read institutional readiness or detectors instead. This page skips the how and focuses on the what.
What the module does
horizon.flow observes other participants in a venue to detect bots, flag manipulation patterns, cluster coordinated actors, and reverse-engineer the policies of observable traders. It consumes:
- Your firm’s own audit log (ground truth on trades you originated).
- The venue’s live market data feed (tape, orderbook, placements, cancels).
- On-chain data from Polygon (for Polymarket) and Hyperliquid’s public API (for per-wallet fills).
It produces:
- Anomaly findings. Structured records of detected spoofing, layering, quote-stuffing, wash trading, momentum ignition, and iceberg patterns, each tied to peer-reviewed methodology. See detectors for citations.
- Actor profiles. Per-wallet rolling feature vectors with a soft-labeled taxonomy (HFT, opportunistic, fundamental, and so on).
- Cluster assignments. Coordinated-wallet groupings via HDBSCAN (behavioral), DTW (temporal), and Louvain (network).
- Policy models. Shadow decision-tree or gradient-boosted rules recovered from observed counterparty behavior, with SHAP feature attribution for human review.
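To make the "DTW (temporal)" grouping above concrete, here is a minimal sketch of a dynamic time warping distance between two wallets' activity series. The series, the wallet names, and the absolute-difference cost are illustrative assumptions; this is not the module's implementation and says nothing about how horizon.flow builds its features.

```python
from math import inf


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # repeat b[j-1] against a new element of a
                cost[i][j - 1],      # repeat a[i-1] against a new element of b
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]


# Hypothetical per-minute order counts for three wallets.
wallet_a = [0, 1, 5, 9, 5, 1, 0]
wallet_b = [0, 0, 1, 5, 9, 5, 1, 0]  # same burst as wallet_a, one minute later
wallet_c = [4, 4, 4, 4, 4, 4, 4]     # flat activity

shifted = dtw_distance(wallet_a, wallet_b)
flat = dtw_distance(wallet_a, wallet_c)
```

A time-shifted copy of the same burst aligns at zero cost, while the flat series does not. That is why a warping distance can group wallets that act in the same pattern but not at the same instant.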
What the module does NOT do
- It does not surveil your own traders. That is horizon.compliance, unchanged and separate.
- It does not identify real-world individuals behind pseudonymous wallets. Findings reference on-chain addresses only. No KYC linkage happens inside this module.
- It does not block orders by default. Your firm opts in by wiring FlowAnomalyCheck into a strategy’s RiskConfig.extra_checks.
- It does not modify core Horizon SDK behavior. Five audit category values are appended (additive); the existing audit hash chain is unchanged.
Regulatory framing
The module supports, but does not replace, your existing surveillance and record-keeping obligations. Relevant regimes:
| Regime | How the module supports it |
|---|---|
| SEC Rule 17a-4 (WORM record retention) | Flow store and audit log are both append-only. SQLite WORM triggers reject UPDATE and DELETE on the anomalies table. The deployment guide documents a 7-year Object Lock retention pattern. |
| FINRA Rule 5210 (publication of transactions and quotations) | Layering, spoofing, and wash-trade detectors cite Rule 5210 and CFTC guidance in their docstrings. Findings are structured for inclusion in a surveillance report. |
| Reg BI (best interest, retail) | Shadow-policy output is human-readable rules. A registered principal can review whether a strategy is reacting to counterparty flow in ways consistent with the client’s best interest. |
| MAR Article 12 (market manipulation, EU) | Pattern definitions align with ESMA-published indicators. |
| MiFID II transaction reporting | Not directly produced here; the existing horizon.audit covers transaction reporting. Flow findings are supplementary context. |
| CFTC spoofing guidance | The spoofing detector implements the Lee, Eom, Park (2013) imbalance-triggered plus fast-cancel heuristic, an academic operationalization of spoofing that the module uses to approximate the CFTC’s definition. |
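The append-only guarantee in the 17a-4 row can be illustrated with a small sketch of SQLite triggers that reject UPDATE and DELETE. The column names and trigger names below are assumptions made for illustration; the actual schema is described on the store page and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified anomalies table.
    CREATE TABLE anomalies (
        id INTEGER PRIMARY KEY,
        detector TEXT NOT NULL,
        payload TEXT NOT NULL
    );
    -- Abort any statement that tries to change an existing row.
    CREATE TRIGGER anomalies_no_update BEFORE UPDATE ON anomalies
    BEGIN
        SELECT RAISE(ABORT, 'anomalies is append-only');
    END;
    -- Abort any statement that tries to remove a row.
    CREATE TRIGGER anomalies_no_delete BEFORE DELETE ON anomalies
    BEGIN
        SELECT RAISE(ABORT, 'anomalies is append-only');
    END;
    """
)
conn.execute(
    "INSERT INTO anomalies (detector, payload) VALUES (?, ?)",
    ("spoofing", "{}"),
)


def attempt(sql):
    """Run a statement and report whether the triggers let it through."""
    try:
        conn.execute(sql)
        return "allowed"
    except sqlite3.DatabaseError as exc:
        return f"rejected: {exc}"


update_result = attempt("UPDATE anomalies SET payload = 'edited'")
delete_result = attempt("DELETE FROM anomalies")
row_count = conn.execute("SELECT COUNT(*) FROM anomalies").fetchone()[0]
```

Triggers like these guard against row-level edits through SQL, but anyone with write access to the database file can still drop them. That is one reason to pair them with storage-level retention such as the Object Lock pattern mentioned in the table.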
What your firm commits to by deploying this
- Retention. The flow store and audit log hold findings and their evidence. They must be retained per your record-keeping policy (17a-4 or equivalent).
- Monitoring. Someone must watch for hash-chain verification failures, a drop in the event rate, or a detector flooding the store with findings. See deployment / monitoring.
- Review. A registered principal should periodically review findings above AnomalySeverity.High. Routing via the existing Alerter surfaces them.
- Tuning as a compliance event. Changing a detector threshold is a compliance-relevant change. Thresholds live in config/flow.toml under version control; diffs are signed off before deployment.
- Replay availability. The module supports byte-deterministic replay from recorded feeds. If an auditor asks “why did your system reject this order,” you must be able to reproduce the finding exactly.
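One way to picture the replay commitment is to run a recorded feed through a detector twice and compare a digest of the serialized findings. The sketch below does that. The event fields, the toy cancel-burst rule, and the function names are hypothetical and stand in for the module's real detectors and feed format.

```python
import hashlib
import json


def toy_detector(events, max_cancels=3):
    """Hypothetical rule: flag a wallet when its cancel count first exceeds max_cancels."""
    cancels = {}
    findings = []
    for ev in events:
        if ev["type"] != "cancel":
            continue
        count = cancels.get(ev["wallet"], 0) + 1
        cancels[ev["wallet"]] = count
        if count == max_cancels + 1:
            findings.append(
                {"wallet": ev["wallet"], "ts": ev["ts"], "rule": "cancel_burst"}
            )
    return findings


def findings_digest(jsonl_text):
    """Replay a JSONL feed and hash the findings in a canonical byte form."""
    events = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    findings = toy_detector(events)
    # Sorted keys and fixed separators keep the serialization byte-stable.
    canonical = json.dumps(findings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest(), findings


# Hypothetical recorded feed: five cancels from one wallet.
feed = "\n".join(
    json.dumps({"ts": t, "wallet": "0xabc", "type": "cancel"}) for t in range(5)
)

first_digest, first_findings = findings_digest(feed)
second_digest, _ = findings_digest(feed)
replay_matches = first_digest == second_digest
```

The comparison only holds if the detector has no hidden inputs such as wall-clock time, random seeds, or unordered iteration. This is also why the configuration commit active at the time of a finding needs to be recoverable.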
What your firm is NOT committing to
- Identifying counterparties. Findings name wallets and venue IDs only.
- Acting on every finding. A finding is a recorded observation, not an obligation.
- Public disclosure. Findings are internal records unless you choose otherwise.
Where the evidence lives
| Artifact | Location | Guarantee |
|---|---|---|
| Anomaly findings | SQLiteFlowStore (default path data/flow.db) | Append-only. SQLite triggers reject UPDATE or DELETE on the anomalies table. See store. |
| Hash-chained audit events | AuditLog (default path data/audit.db) | Tamper-evident hash chain. AuditChain.verify() detects any modification. |
| Raw feed stream (for replay) | data/feeds/<date>.jsonl | Rotated daily, archived per retention policy. |
| Configuration at time of finding | config/flow.toml in your config repo, tagged by commit | Every detector threshold is version-controlled. Findings reference the config commit active at the time. |
| Peer-reviewed methodology | Citations in module docstrings, summarized on institutional readiness. | Every detector, classifier, and feature has a paper or regulatory source. |
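For reviewers unfamiliar with the "tamper-evident hash chain" guarantee in the table above, here is a generic sketch of the idea: each entry stores a hash over its payload and the previous entry's hash, so editing any entry breaks verification from that point on. This is not the implementation of AuditLog or AuditChain.verify(); the entry layout and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with a canonical payload encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain, payload):
    """Add an entry whose hash commits to everything recorded before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify(chain):
    """Return None if the chain is intact, else the index of the first bad entry."""
    prev = GENESIS
    for index, entry in enumerate(chain):
        expected = entry_hash(prev, entry["payload"])
        if entry["prev"] != prev or entry["hash"] != expected:
            return index
        prev = entry["hash"]
    return None


chain = []
for severity in ("low", "high", "medium"):
    append(chain, {"category": "flow_anomaly", "severity": severity})

intact = verify(chain)

# Simulate someone quietly downgrading the second event after the fact.
chain[1]["payload"]["severity"] = "low"
first_bad_index = verify(chain)
```

A chain like this detects modification but does not prevent it, and it cannot detect truncation of the most recent entries unless the latest hash is also anchored somewhere outside the log.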
Material limitations to disclose
No compliance summary is complete without the module’s actual limits:
- Coverage. v0.1 covers Polymarket, Kalshi, and Hyperliquid. Equities, options, and non-Hyperliquid perps have market-level signals only (no per-actor attribution) per the roadmap.
- VPIN. The Easley, López de Prado, O’Hara (2012) VPIN metric has been criticized by Andersen, Bondarenko (2014). The module emits VPIN as an indicator, not a classifier, and always cross-references with OFI and the Hawkes branching ratio. A finding never rests on VPIN alone.
- Wallet clustering. Meiklejohn (2013) and Victor (2020) heuristics are approximate. Every WalletCluster record carries a confidence field and the method used, so a reviewer can see which heuristic produced the grouping.
- Shadow policy fidelity. Decision-tree or gradient-boosted models explain observable (state, action) correlations. They do not prove a counterparty’s internal logic. Output is phrased as “the bot appears to act when X,” not “the bot is programmed to do X.”
- IRL, GAIL, AIRL. Opt-in behind [flow-irl]. Offline only: no market simulator exists, so on-policy variants are out of scope. Compute is expensive; the default path is shadow policy.
- Machine-learning anomaly detection. Opt-in behind [flow-ml]. Isolation Forest and LOB autoencoder are complements to the rule-based detectors, not replacements.
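To show what the VPIN indicator in the list above measures, here is a minimal sketch of the volume-imbalance ratio over equal-volume buckets. It assumes buy and sell volume have already been classified per bucket (the original paper uses bulk volume classification for that step). The bucket values are invented, and the function is not the module's implementation.

```python
def vpin(buckets, window):
    """Order-flow imbalance over the last `window` equal-volume buckets.

    buckets: list of (buy_volume, sell_volume) pairs, one per volume bucket.
    Returns sum(|buy - sell|) / sum(buy + sell), a value between 0 and 1.
    """
    recent = buckets[-window:]
    total = sum(buy + sell for buy, sell in recent)
    if total == 0:
        raise ValueError("no volume in the selected window")
    imbalance = sum(abs(buy - sell) for buy, sell in recent)
    return imbalance / total


# Hypothetical buckets of 100 contracts each: (buy_volume, sell_volume).
buckets = [(50, 50), (80, 20), (30, 70), (90, 10)]
value = vpin(buckets, window=4)  # (0 + 60 + 40 + 80) / 400
```

The result is a single number per window with no notion of who traded or why, which is consistent with treating it as an indicator to cross-reference rather than as a classifier.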
Verification you can run yourself
If you are evaluating whether to sign off, the three steps below reproduce the core claims on this page:
# 1. Paper reproductions (VPIN, Kirilenko taxonomy, spoofing, IRL).
pytest tests/flow/test_paper_reproduction.py -v
# 2. End-to-end audits covering all layers.
python examples/flow_behavioral_audit.py
python examples/flow_anon_behavioral_audit.py
# 3. The institutional-grade property suite
# (determinism, WORM, audit chain, schema stability).
pytest tests/flow/test_institutional_validation.py -v
All three steps must pass. The institutional readiness page has the full list of verifications and what each proves.
Further reading
- Institutional readiness. Self-audit checklist with commands to reproduce every claim.
- Deployment. Operational runbook.
- Detectors. What each detector catches and the paper that backs it.
- Store. Schema of the append-only flow store.
- Risk integration. How to gate orders on findings.