Data model
MarketEvent, AnomalyFinding, ActorProfile, WalletCluster, PolicyModel. The five frozen dataclasses the flow module reads and writes.
Everything the flow module consumes or produces passes through one of five immutable dataclasses. Frozen-by-design: a finding cannot be silently edited after it fires; an actor profile carries the sequence number that produced it; a cluster records the method and confidence that formed it. All fields serialize cleanly to JSON and round-trip through the flow store.
MarketEvent
The unit of input. Observers (FlowObserver, FlowFeedHandler) and ingestion sources normalize their native shapes into this type before anything downstream sees them.
from horizon.flow.events import MarketEvent, MarketEventKind
from datetime import datetime, timezone
me = MarketEvent(
event_kind=MarketEventKind.OrderFilled,
market_id="0xTRUMP_2024", # venue-native id
venue_name="polymarket",
timestamp=datetime.now(timezone.utc),
actor_id="0xabc...", # wallet when venue exposes it
order_id="brokerord_123",
client_order_id="hzn_abc",
tx_hash="0x7a6b...", # on-chain enrichment
side="buy",
price=0.52,
quantity=500.0,
filled_quantity=500.0,
gas_price=30_000_000_000, # Polygon
nonce=42,
block_number=50_123_456,
source="polymarket_ws", # provenance
raw={...}, # unparsed original message
)
MarketEventKind values: OrderPlaced, OrderCanceled, OrderAmended, OrderFilled, QuoteUpdate, TradeTape, BookSnapshot.
The actor_id is set only when the venue exposes counterparty identity. On Polymarket (wallets), Hyperliquid (wallets on L3), actor_id is populated. On Kalshi, equities, options, it is None; actor-level detectors skip cleanly; market-level detectors (VPIN, OFI, iceberg) still fire.
AnomalyFinding
The unit of output. Every detector returns findings.
from horizon.flow.events import AnomalyCategory, AnomalyFinding, AnomalySeverity
finding = AnomalyFinding.make(
detector_name="spoofing",
category=AnomalyCategory.Spoofing,
severity=AnomalySeverity.High,
market_id="0xTRUMP_2024",
venue_name="polymarket",
actor_id="0xSpoofer",
confidence=0.87,
score=5000.0, # detector-native
message="bait 5000 canceled 400ms after opposite-side aggressor",
evidence={
"bait_order_id": "ord_1",
"bait_size": 5000,
"cancel_ms": 400,
"aggressor_order_id": "ord_2",
"aggressor_size": 100,
},
citation="Lee, Eom, Park 2013. Microstructure-based Manipulation",
related_event_ids=("evt_xxx", "evt_yyy"),
)
AnomalyCategory values (v0.1):
Spoofing,Layering,QuoteStuffing,WashTrade,MomentumIgnition,Iceberg. Manipulation patterns.SplitOrder,ExecutionAlgoFingerprint. Not manipulation; observed execution style (useful for counterparty mapping).HftCluster. Composite high-confidence classification.
AnomalySeverity values: Low, Medium, High, Critical. Drives alert thresholds and risk-integration filtering.
The citation field is required by convention: every finding must trace back to the regulatory or academic source that defined the heuristic. Compliance reviewers read this.
ActorProfile
The per-actor fingerprint. Written by ActorFeatureExtractor, persisted on a refresh interval (default every 50 events per actor).
from horizon.flow.events import ActorFeatures, ActorProfile
profile = ActorProfile(
actor_id="0xabc...",
venue_name="polymarket",
features=ActorFeatures(
order_to_trade_ratio=18.3,
cancel_before_fill_rate=0.92,
median_time_to_cancel_ms=85.0,
maker_ratio=0.55,
mean_order_size=1.8,
inter_arrival_median_s=0.8,
hawkes_branching_ratio=0.68, # self-excitation proxy
market_entropy_bits=2.7, # entropy over markets traded
session_hour_hist=(0.02, 0.03, ...), # 24-bucket UTC histogram
gas_price_mode_wei=30_000_000_000, # Polymarket/Polygon fingerprint
nonce_cadence_cv=0.15,
event_count=1_420,
),
taxonomy_probs={
"hft": 0.67, "opportunistic": 0.21, "intermediary": 0.09,
"small": 0.02, "fundamental_buyer": 0.005, "fundamental_seller": 0.005,
},
cluster_id="bcl_abc123...", # assigned by clusterer, if any
last_updated_seq=4_512,
)
Features that can’t be computed (e.g., gas fingerprint on a non-on-chain venue) stay None rather than being imputed with zeros. The taxonomy classifier treats missing features as non-evidence and does not punish the actor for the venue’s data model.
WalletCluster
A believed-to-be-one-entity grouping of actors. Four ClusterMethod sources produce these; a Composite method reconciles conflicting assignments.
from horizon.flow.events import ClusterMethod, WalletCluster
from datetime import datetime, timezone
cluster = WalletCluster(
cluster_id="wc_a1b2c3...",
method=ClusterMethod.WalletHeuristic,
actor_ids=("0xabc...", "0xdef...", "0xghi..."),
confidence=0.82,
formed_at=datetime.now(timezone.utc),
venue_name="polymarket", # "" for cross-venue clusters
evidence={
"heuristic": "common_input",
"tx_hash": "0x...",
},
)
ClusterMethod values: Behavioral (HDBSCAN on features), Temporal (DTW / k-Shape on timings), Network (Louvain on co-trade graph), WalletHeuristic (Meiklejohn / Victor on-chain), Composite (confidence-weighted union).
Cluster confidence is NOT pairwise probability of same-entity. It is a per-method score. Cross-method aggregation happens in downstream logic, not inside the dataclass.
PolicyModel
A fitted reverse-engineering of an actor’s behavior. Produced by ShadowPolicyFitter (v0.1) or IRL fitters (v0.3+).
from horizon.flow.events import PolicyMethod, PolicyModel
from datetime import datetime, timezone
model = PolicyModel(
policy_id="pol_xyz...",
actor_id="0xabc...",
method=PolicyMethod.ShadowGBDT,
trained_at=datetime.now(timezone.utc),
trajectories_used=520,
feature_names=("ofi_5s", "spread_bps", "depth_imbalance", ...),
summary={
"method": "shap_tree_explainer",
"top_features": [
{"feature": "ofi_5s", "mean_abs_shap": 0.34},
{"feature": "spread_bps", "mean_abs_shap": 0.12},
],
"rules": [
{"description": "buy when ofi_5s GT 0.3 AND spread_bps LE 10",
"support": 210, "positive_fraction": 0.91},
],
"holdout_accuracy": 0.88,
},
model_blob=b"pickled sklearn",
top_rule="buy when ofi_5s GT 0.3 AND spread_bps LE 10",
holdout_accuracy=0.88,
)
The pickled model_blob is the trained sklearn (or torch for IRL) artifact. summary is the human-readable digest a compliance reviewer can read without unpickling.
PolicyMethod values: ShadowDecisionTree, ShadowGBDT, MaxEntIRL, GAIL, AIRL. v0.1 ships only the two Shadow variants; IRL values are reserved.
Why frozen
Compliance. A finding, once emitted, is an assertion about what the firm observed and when. Mutability would let a bad actor quietly soften a severity score after the fact. The dataclasses are @dataclass(frozen=True); the flow store’s anomalies table has triggers rejecting UPDATE and DELETE; findings also emit into the hash-chained audit log.
Replacement operations (for example, re-fitting a shadow policy) produce a new policy_id and a new row in policy_models; the old one remains queryable for audit replay.