Roadmap

What's shipped in v0.1, what's coming through v1.0, and what's deliberately out of scope.

horizon.flow ships in phases. v0.1 is what landed with the first commit; each subsequent version adds capability without breaking the previous release’s API.

v0.1: Foundations + Polymarket / Kalshi / Hyperliquid

Shipped. What you can do today:

  • Ingest public CLOB + on-chain events from Polymarket, Kalshi, Hyperliquid, and from any LiveFeed or AuditLog via observers.
  • Profile actors incrementally with rolling features (order-to-trade ratio, inter-arrival CV, Hawkes branching proxy, maker ratio, gas-price fingerprint on Polygon).
  • Classify each actor via a Kirilenko et al. (2017) 6-category soft-label: HFT, opportunistic, fundamental buyer/seller, small, intermediary.
  • Cluster related wallets via four orthogonal methods: Meiklejohn / Victor heuristics for Polygon, HDBSCAN on features, DTW on inter-event timing, Louvain on the co-trading graph.
  • Detect six manipulation patterns: spoofing (Lee-Eom-Park 2013), layering (FINRA 5210), quote-stuffing (Egginton 2016), wash trading (Cong-Li-Tang-Yang 2023), momentum ignition, iceberg reloads (Hautsch-Huang 2012).
  • Cross-validate with flow-toxicity measures: VPIN, OFI, PIN, Hawkes branching ratio.
  • Reverse-engineer a per-actor shadow policy: a decision tree plus a gradient-boosted classifier with SHAP feature attribution, yielding human-readable rules.
  • Persist every finding in SQLiteFlowStore (WORM trigger on the anomalies table) AND in the hash-chained AuditLog.
  • Query via hz.flow.actor_profile(), hz.flow.anomalies(), hz.flow.cluster_of(), hz.flow.shadow_policy() or the CLI.
  • Gate trades via FlowAnomalyCheck in RiskConfig.extra_checks.
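
As a rough illustration of the rolling features listed above, the sketch below computes an order-to-trade ratio and an inter-arrival coefficient of variation (CV) over a bounded window of events for one actor. The `RollingActorFeatures` class, its method names, and the "order" / "trade" event kinds are hypothetical and are not the horizon.flow API.

```python
from collections import deque
from statistics import mean, pstdev


class RollingActorFeatures:
    """Hypothetical helper: keeps the most recent `maxlen` events for one actor."""

    def __init__(self, maxlen=1000):
        # Each entry is (timestamp_s, kind); old entries fall off automatically.
        self.events = deque(maxlen=maxlen)

    def observe(self, timestamp_s, kind):
        # In this sketch, kind is either "order" or "trade".
        self.events.append((timestamp_s, kind))

    def order_to_trade_ratio(self):
        orders = sum(1 for _, k in self.events if k == "order")
        trades = sum(1 for _, k in self.events if k == "trade")
        # No trades yet: the ratio is unbounded.
        return orders / trades if trades else float("inf")

    def inter_arrival_cv(self):
        times = [t for t, _ in self.events]
        gaps = [b - a for a, b in zip(times, times[1:])]
        if len(gaps) < 2 or mean(gaps) == 0:
            return None  # not enough data for a meaningful CV
        # CV = standard deviation of the gaps divided by their mean.
        return pstdev(gaps) / mean(gaps)
```

A CV near zero suggests clock-like, machine-paced activity, while a CV near one is what a Poisson arrival process would produce.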

Test coverage: 92 flow-specific tests (paper-reproduction for VPIN and Kirilenko taxonomy; threshold-sensitivity per detector; shadow-policy end-to-end; clustering on bimodal and co-trading populations).

v0.2: Equities / options / perps with graceful degradation

Partially shipped. Tape-level ingestion for venues that don’t expose counterparty identities.

Shipped in v0.2.0:

  • ActorFeatureExtractor tolerates actor_id=None by aggregating into anon_{market}_{window} pseudo-actors (default 5-minute windows). The Kirilenko taxonomy then applies to the bucket’s aggregate behavior, so read the label as “this market/window is HFT-dominated” rather than “this wallet is HFT.”
  • AlpacaFlowSource. Wraps the existing AlpacaLiveFeed and emits normalized MarketEvents with actor_id=None. Template for IBKR / CCXT sources to follow.
  • Detectors handle anonymous tape cleanly: actor-scoped ones (spoofing, layering, momentum-ignition, split-order) skip; market-level ones (iceberg, wash-trade, quote-stuffing, toxicity) still fire.
  • New recipe page: Equities & options tape.
  • Tests: 13 new tests covering anonymization, graceful skip, AlpacaFlowSource wiring.
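
The pseudo-actor bucketing described above can be sketched as follows. This is a minimal illustration, not the ActorFeatureExtractor implementation: the function name is hypothetical, and using the integer window index as the `{window}` part of the key is an assumption. Returning `None` for a zero window stands in for the v0.1 skip behavior.

```python
def pseudo_actor_id(market, timestamp_s, actor_id=None, window_s=300.0):
    """Hypothetical helper: map an event to a real or anonymous actor key."""
    if actor_id is not None:
        return actor_id  # the venue exposed a real identity; keep it
    if window_s <= 0:
        return None  # assumed meaning: anonymization off, event is skipped
    # Assumption: the window is identified by its integer index since epoch.
    bucket = int(timestamp_s // window_s)
    return f"anon_{market}_{bucket}"
```

All anonymous events in the same market and the same 5-minute window share one key, so their features aggregate into a single profile.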

Coming in v0.2.1:

  • IBKRFlowSource following the same template.
  • CCXTFlowSource for crypto exchanges via CCXT.
  • Fully anonymized-path behavioral audit alongside the existing wallet-level one.

What stays unchanged from v0.1: all public APIs, the audit category set, the flow-store schema. A v0.1 deployment picks up v0.2 semantics automatically; set FlowConfig.actors.anonymize_window_s = 0.0 to preserve the v0.1 skip behavior.

v0.3: Inverse RL

Partially shipped.

Shipped in v0.3.0:

  • MaxEntIRLFitter fully implemented. Ziebart et al. (2008) with discretized state space, empirical transitions + Laplace smoothing, soft value iteration, and gradient ascent on the demonstration log-likelihood over a per-(state, action) reward basis.
  • Pure numpy. No torch dependency. Runs on the base horizon install; no extras required.
  • 8 tests including a Ziebart-style 5×5 gridworld paper-reproduction: 400 expert trajectories toward the goal corner → recovered reward’s argmax sits at the goal. Deterministic.
  • PolicyModel output: per-action reward weights, top-rewarding (state, action) readouts with human-readable bin centers, log-likelihood, and convergence diagnostics. The model round-trips cleanly through the flow store.
  • Doc update at Policy reverse-engineering.
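
For readers unfamiliar with the soft value iteration step named above, here is a minimal pure-Python sketch on a tabular MDP. It is not the MaxEntIRLFitter code: the function names, the argument layout, and the discount factor are all assumptions chosen for illustration, and the reward is taken as given rather than learned.

```python
import math


def logsumexp(values):
    # Numerically stable log(sum(exp(v))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def soft_value_iteration(reward, transitions, gamma=0.9, iters=200):
    """reward[s][a] is a float; transitions[s][a] is a list of (next_state, prob)."""
    n_states = len(reward)
    v = [0.0] * n_states
    q = [list(row) for row in reward]
    for _ in range(iters):
        # Soft Bellman backup: Q(s, a) = r(s, a) + gamma * E[V(s')].
        q = [
            [
                reward[s][a] + gamma * sum(p * v[s2] for s2, p in transitions[s][a])
                for a in range(len(reward[s]))
            ]
            for s in range(n_states)
        ]
        # The "soft" max replaces the hard max of ordinary value iteration.
        v = [logsumexp(row) for row in q]
    # MaxEnt policy: pi(a | s) = exp(Q(s, a) - V(s)), which sums to 1 per state.
    policy = [[math.exp(x - v[s]) for x in q[s]] for s in range(n_states)]
    return v, policy
```

In a full MaxEnt IRL fit, this routine would sit inside an outer loop that adjusts the reward until the policy's expected behavior matches the demonstrations.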

Shipped in v0.3.1 ([flow-irl] extras, torch):

  • GAILFitter. Ho & Ermon (2016), offline variant. Fits a torch discriminator D(s, a) to distinguish expert demonstrations from random-policy samples, reports the learned reward r = log D - log(1 - D) with per-action preferences and feature-gradient importance. Offline because markets have no rewindable simulator. We drop the on-policy TRPO/PPO loop and keep the discriminator-as-reward formulation.
  • AIRLFitter. Fu, Luo, Levine (2018), offline variant. Extends GAIL with the state-only reward decomposition D(s, a) = exp(f(s)) / (exp(f(s)) + 1/|A|). Output f(s) identifies market states the actor values regardless of action. Useful for reward transfer.
  • 8 tests (refusal, shape, directional recovery, roundtrip, determinism, AIRL top-k ordering).
  • Both require torch via the [flow-irl] extras; graceful ModuleNotFoundError without it.
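
The two formulas quoted above combine into a simple identity, worked through in the sketch below with plain `math` rather than torch. The function names are hypothetical; only the formulas come from the text.

```python
import math


def gail_reward(d):
    """r = log D - log(1 - D) for a discriminator output D in (0, 1)."""
    return math.log(d) - math.log(1.0 - d)


def airl_discriminator(f_s, n_actions):
    """D(s, a) = exp(f(s)) / (exp(f(s)) + 1/|A|), with a uniform reference policy."""
    e = math.exp(f_s)
    return e / (e + 1.0 / n_actions)


# Illustrative values only: f(s) = 0.7 with |A| = 3 actions.
f_s, n_actions = 0.7, 3
r = gail_reward(airl_discriminator(f_s, n_actions))
# Since D / (1 - D) = |A| * exp(f(s)), the reward reduces to f(s) + log|A|.
```

Because the reward differs from f(s) only by the constant log|A|, ranking states by f(s) and ranking them by the recovered reward give the same order.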

Users who want interpretable output today (most compliance cases) stay on the default shadow policy path; MaxEnt / GAIL / AIRL offer progressively richer views of the same demonstration set.

v0.4: ML-augmented anomaly detection

Partially shipped.

Shipped in v0.4.0:

  • IsolationForestDetector. Liu, Ting, Zhou (2008). sklearn-based; no torch dependency, no new extras required.
  • Scores a 5-dimensional market-state vector (spread, depth imbalance, multi-horizon mid returns, realized vol) against a forest fit on a burn-in window of that market’s normal activity. Flags statistical outliers as AnomalyCategory.MarketAnomaly.
  • Complements the rule-based detectors: rules catch known patterns, while Isolation Forest catches the long tail. Both layers run simultaneously in a production engine.
  • Cooldown + periodic refit so long-running deployments adapt to regime drift without emitting hundreds of duplicate findings.
  • Optional prefit() path skips burn-in when historical training data is available.
  • 7 tests including flash-event detection, cooldown suppression, and full FlowEngine integration.
  • New doc page: ML anomaly detection.
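
The cooldown idea mentioned above can be sketched as a small per-market gate. This is an illustrative assumption about how such suppression might work, not the IsolationForestDetector internals; the `CooldownGate` class and its 60-second default are hypothetical.

```python
class CooldownGate:
    """Hypothetical helper: allow at most one finding per market per cooldown."""

    def __init__(self, cooldown_s=60.0):
        self.cooldown_s = cooldown_s
        self._last_emit = {}  # market -> timestamp of the last emitted finding

    def should_emit(self, market, timestamp_s):
        last = self._last_emit.get(market)
        if last is not None and timestamp_s - last < self.cooldown_s:
            return False  # still cooling down: suppress the duplicate
        self._last_emit[market] = timestamp_s
        return True
```

A detector would call `should_emit()` only after a score crosses its threshold, so one flash event that stays anomalous for thirty seconds produces one finding instead of one per tick.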

Shipped in v0.4.1 ([flow-ml] extras, torch):

  • AutoencoderDetector. Dixon, Halperin, Bilokon (2020). Symmetric MLP encoder-decoder on a 9-dim market-state vector; reconstruction MSE as the anomaly score. Catches nonlinear structural anomalies IsolationForest’s univariate-leaning partitions miss.
  • Same lifecycle as IsolationForest: burn-in → fit → score → cooldown → periodic refit. Optional prefit() skips burn-in when historical training data is available.
  • 6 tests including end-to-end FlowEngine integration.
  • Run both ML detectors in parallel; each emits findings under its own detector_name so downstream filters can treat them differently.

Coming later:

  • Transformer-based order-flow models when the research tier is worth the compute. Research-grade; expect longer iteration.

Precision and recall are compared against the classical detectors on recorded data, so users have an honest trade-off report before enabling an ML detector.

v1.0: Cross-venue + hardening

Planned. Shipping criteria:

  • Cross-venue wallet attribution: link a Polymarket wallet to a Hyperliquid address via same-address heuristics and optional third-party enrichment (Nansen, Arkham). Opt-in with a terms-of-service review step.
  • Regulatory-report templates: Reg BI, MAR, MiFID II surveillance report renderers that consume the flow store.
  • Performance hardening: less than 100 μs observer overhead at 1 kHz event rate; flow store supports 10 M events per day without degradation.
  • The full test target of ≥ 200 flow tests.

Out of scope

These requests come up often enough that they are stated here explicitly, so expectations stay calibrated:

  • Attributing wallets to real people. Findings stay pseudonymous. Linking a wallet to a legal entity requires subpoenaed exchange KYC data or compliance-operated KYT tooling, neither of which belongs in an open SDK.
  • Front-running or co-trading the detected bots. Ethically and legally distinct from surveillance. The module does not ship “copy the bot” helpers.
  • Real-time HFT defense at microsecond latency. Detectors are designed for second-to-minute horizons. Use dedicated FPGA / colo infrastructure for microsecond-grade protection.
  • Replacing horizon.compliance. Own-firm policing stays where it is. Flow is market intelligence, not compliance.

Version compatibility

Changes are strictly additive. A v0.2 flow store reads v0.1 records; v0.1 code continues to work on a v0.2 install. Deprecations, if any ever happen, are announced at least one minor version in advance with a working deprecation shim.
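
For context, a deprecation shim of the kind described above is commonly written as a thin wrapper that warns and then forwards the call. The sketch below uses only the standard library; `deprecated_alias`, `new_query`, and `old_query` are hypothetical names, and no such deprecation exists in horizon.flow today.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Return a wrapper that warns under the old name, then calls new_func."""

    @functools.wraps(new_func)
    def shim(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the shim
        )
        return new_func(*args, **kwargs)

    return shim


def new_query(actor_id):
    # Placeholder body for the sketch; not a real horizon.flow function.
    return f"result-for-{actor_id}"


# The old name keeps working while callers migrate.
old_query = deprecated_alias(new_query, "old_query")
```

Existing callers keep getting the same return value and see a DeprecationWarning that names the replacement.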