Data model

MarketEvent, AnomalyFinding, ActorProfile, WalletCluster, PolicyModel. The five frozen dataclasses the flow module reads and writes.

Everything the flow module consumes or produces passes through one of five immutable dataclasses. Frozen-by-design: a finding cannot be silently edited after it fires; an actor profile carries the sequence number that produced it; a cluster records the method and confidence that formed it. All fields serialize cleanly to JSON and round-trip through the flow store.

MarketEvent

The unit of input. Observers (FlowObserver, FlowFeedHandler) and ingestion sources normalize their native shapes into this type before anything downstream sees them.

python

from horizon.flow.events import MarketEvent, MarketEventKind
from datetime import datetime, timezone

me = MarketEvent(
 event_kind=MarketEventKind.OrderFilled,
 market_id="0xTRUMP_2024", # venue-native id
 venue_name="polymarket",
 timestamp=datetime.now(timezone.utc),
 actor_id="0xabc...", # wallet when venue exposes it
 order_id="brokerord_123",
 client_order_id="hzn_abc",
 tx_hash="0x7a6b...", # on-chain enrichment
 side="buy",
 price=0.52,
 quantity=500.0,
 filled_quantity=500.0,
 gas_price=30_000_000_000, # Polygon
 nonce=42,
 block_number=50_123_456,
 source="polymarket_ws", # provenance
 raw={...}, # unparsed original message
)

MarketEventKind values: OrderPlaced, OrderCanceled, OrderAmended, OrderFilled, QuoteUpdate, TradeTape, BookSnapshot.

The actor_id is set only when the venue exposes counterparty identity. On Polymarket (wallets), Hyperliquid (wallets on L3), actor_id is populated. On Kalshi, equities, options, it is None; actor-level detectors skip cleanly; market-level detectors (VPIN, OFI, iceberg) still fire.

AnomalyFinding

The unit of output. Every detector returns findings.

python

from horizon.flow.events import AnomalyCategory, AnomalyFinding, AnomalySeverity

finding = AnomalyFinding.make(
 detector_name="spoofing",
 category=AnomalyCategory.Spoofing,
 severity=AnomalySeverity.High,
 market_id="0xTRUMP_2024",
 venue_name="polymarket",
 actor_id="0xSpoofer",
 confidence=0.87,
 score=5000.0, # detector-native
 message="bait 5000 canceled 400ms after opposite-side aggressor",
 evidence={
 "bait_order_id": "ord_1",
 "bait_size": 5000,
 "cancel_ms": 400,
 "aggressor_order_id": "ord_2",
 "aggressor_size": 100,
 },
 citation="Lee, Eom, Park 2013. Microstructure-based Manipulation",
 related_event_ids=("evt_xxx", "evt_yyy"),
)

AnomalyCategory values (v0.1):

Spoofing, Layering, QuoteStuffing, WashTrade, MomentumIgnition, Iceberg. Manipulation patterns.
SplitOrder, ExecutionAlgoFingerprint. Not manipulation; observed execution style (useful for counterparty mapping).
HftCluster. Composite high-confidence classification.

AnomalySeverity values: Low, Medium, High, Critical. Drives alert thresholds and risk-integration filtering.

The citation field is required by convention: every finding must trace back to the regulatory or academic source that defined the heuristic. Compliance reviewers read this.

ActorProfile

The per-actor fingerprint. Written by ActorFeatureExtractor, persisted on a refresh interval (default every 50 events per actor).

python

from horizon.flow.events import ActorFeatures, ActorProfile

profile = ActorProfile(
 actor_id="0xabc...",
 venue_name="polymarket",
 features=ActorFeatures(
 order_to_trade_ratio=18.3,
 cancel_before_fill_rate=0.92,
 median_time_to_cancel_ms=85.0,
 maker_ratio=0.55,
 mean_order_size=1.8,
 inter_arrival_median_s=0.8,
 hawkes_branching_ratio=0.68, # self-excitation proxy
 market_entropy_bits=2.7, # entropy over markets traded
 session_hour_hist=(0.02, 0.03, ...), # 24-bucket UTC histogram
 gas_price_mode_wei=30_000_000_000, # Polymarket/Polygon fingerprint
 nonce_cadence_cv=0.15,
 event_count=1_420,
 ),
 taxonomy_probs={
 "hft": 0.67, "opportunistic": 0.21, "intermediary": 0.09,
 "small": 0.02, "fundamental_buyer": 0.005, "fundamental_seller": 0.005,
 },
 cluster_id="bcl_abc123...", # assigned by clusterer, if any
 last_updated_seq=4_512,
)

Features that can’t be computed (e.g., gas fingerprint on a non-on-chain venue) stay None rather than being imputed with zeros. The taxonomy classifier treats missing features as non-evidence and does not punish the actor for the venue’s data model.

WalletCluster

A believed-to-be-one-entity grouping of actors. Four ClusterMethod sources produce these; a Composite method reconciles conflicting assignments.

python

from horizon.flow.events import ClusterMethod, WalletCluster
from datetime import datetime, timezone

cluster = WalletCluster(
 cluster_id="wc_a1b2c3...",
 method=ClusterMethod.WalletHeuristic,
 actor_ids=("0xabc...", "0xdef...", "0xghi..."),
 confidence=0.82,
 formed_at=datetime.now(timezone.utc),
 venue_name="polymarket", # "" for cross-venue clusters
 evidence={
 "heuristic": "common_input",
 "tx_hash": "0x...",
 },
)

ClusterMethod values: Behavioral (HDBSCAN on features), Temporal (DTW / k-Shape on timings), Network (Louvain on co-trade graph), WalletHeuristic (Meiklejohn / Victor on-chain), Composite (confidence-weighted union).

Cluster confidence is NOT pairwise probability of same-entity. It is a per-method score. Cross-method aggregation happens in downstream logic, not inside the dataclass.

PolicyModel

A fitted reverse-engineering of an actor’s behavior. Produced by ShadowPolicyFitter (v0.1) or IRL fitters (v0.3+).

python

from horizon.flow.events import PolicyMethod, PolicyModel
from datetime import datetime, timezone

model = PolicyModel(
 policy_id="pol_xyz...",
 actor_id="0xabc...",
 method=PolicyMethod.ShadowGBDT,
 trained_at=datetime.now(timezone.utc),
 trajectories_used=520,
 feature_names=("ofi_5s", "spread_bps", "depth_imbalance", ...),
 summary={
 "method": "shap_tree_explainer",
 "top_features": [
 {"feature": "ofi_5s", "mean_abs_shap": 0.34},
 {"feature": "spread_bps", "mean_abs_shap": 0.12},
 ],
 "rules": [
 {"description": "buy when ofi_5s GT 0.3 AND spread_bps LE 10",
 "support": 210, "positive_fraction": 0.91},
 ],
 "holdout_accuracy": 0.88,
 },
 model_blob=b"pickled sklearn",
 top_rule="buy when ofi_5s GT 0.3 AND spread_bps LE 10",
 holdout_accuracy=0.88,
)

The pickled model_blob is the trained sklearn (or torch for IRL) artifact. summary is the human-readable digest a compliance reviewer can read without unpickling.

PolicyMethod values: ShadowDecisionTree, ShadowGBDT, MaxEntIRL, GAIL, AIRL. v0.1 ships only the two Shadow variants; IRL values are reserved.

Why frozen

Compliance. A finding, once emitted, is an assertion about what the firm observed and when. Mutability would let a bad actor quietly soften a severity score after the fact. The dataclasses are @dataclass(frozen=True); the flow store’s anomalies table has triggers rejecting UPDATE and DELETE; findings also emit into the hash-chained audit log.

Replacement operations (for example, re-fitting a shadow policy) produce a new policy_id and a new row in policy_models; the old one remains queryable for audit replay.