Actor profiling
Per-actor feature extraction, Kirilenko 6-category soft-labeler, and Polymarket wallet-clustering heuristics.
The actors layer turns a MarketEvent stream into per-actor records the rest of the pipeline can reason about. Three components:
- ActorFeatureExtractor. Incremental rolling features per actor.
- KirilenkoClassifier. 6-category soft-label over the feature vector.
- WalletHeuristicLinker. On-chain pairwise links between addresses (Polymarket / Polygon).
ActorFeatureExtractor
Ingests MarketEvents one at a time, maintains bounded rolling state per actor, produces an ActorFeatures vector on demand.
from horizon.flow.actors.profile import ActorFeatureExtractor
from horizon.flow.config import FlowConfig
ext = ActorFeatureExtractor(FlowConfig())
for seq, ev in enumerate(stream, start=1):  # each ev is a MarketEvent
    refresh_due = ext.ingest(ev)
    if refresh_due:
        profile = ext.snapshot(
            actor_id=ev.actor_id,
            venue_name="polymarket",
            last_updated_seq=seq,  # number of events consumed so far
        )
        store.upsert_profile(profile, at=ev.timestamp)
ingest returns True when a profile refresh is due (every profile_refresh_event_interval events per actor, or on first-seen). The engine consumes this signal to persist without write-amplifying every event.
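The refresh cadence described above can be sketched with a per-actor counter. This is a minimal illustration, not the extractor's code: RefreshGate is a hypothetical stand-in, and the interval values are illustrative rather than the library default.

```python
from collections import defaultdict


class RefreshGate:
    """Hypothetical stand-in for the refresh signal returned by ingest()."""

    def __init__(self, refresh_interval: int = 50) -> None:
        # refresh_interval plays the role of profile_refresh_event_interval;
        # 50 is an illustrative value, not the library default.
        self.refresh_interval = refresh_interval
        self._counts = defaultdict(int)

    def ingest(self, actor_id: str) -> bool:
        """Count one event for actor_id and report whether a refresh is due."""
        self._counts[actor_id] += 1
        n = self._counts[actor_id]
        # First-seen actors refresh immediately; afterwards every Nth event.
        return n == 1 or n % self.refresh_interval == 0


gate = RefreshGate(refresh_interval=3)
signals = [gate.ingest("0xA") for _ in range(7)]
print(signals)  # [True, False, True, False, False, True, False]
```

With an interval of 3, seven events trigger three writes instead of seven, which is the write-amplification saving the signal exists for.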
Features
The feature vector matches the Kirilenko-Kyle-Samadi-Tuzun (2017) taxonomy inputs plus on-chain extras:
| Feature | Description |
|---|---|
| order_to_trade_ratio | placements / fills |
| cancel_before_fill_rate | cancels / (cancels + fills) |
| median_time_to_cancel_ms | over resolved cancels |
| maker_ratio | maker fills / total fills |
| mean_order_size, median_order_size, std_order_size | size distribution moments |
| size_bins | log-scale histogram (8 bins default) |
| inter_arrival_median_s | per-actor event cadence |
| hawkes_branching_ratio | cheap self-excitation proxy from CV of inter-arrivals. NOT an MLE Hawkes fit (see toxicity) |
| market_entropy_bits | Shannon entropy over markets traded |
| session_hour_hist | 24-bucket UTC hour histogram (normalized) |
| gas_price_mode_wei, gas_price_std_wei | Polymarket/Polygon gas fingerprint |
| nonce_cadence_cv | CV of nonce increments (regular vs bursty) |
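A few of the definitions in the table can be written out directly. The sketch below is a batch version for illustration only; the extractor itself is incremental. The function signatures, the infinity convention for zero fills, and the use of the population standard deviation are assumptions. The mapping from CV to hawkes_branching_ratio is not specified here, so only the CV itself is shown.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def order_to_trade_ratio(placements: int, fills: int) -> float:
    # placements / fills; infinity when nothing has filled yet (assumed convention).
    return placements / fills if fills else math.inf


def cancel_before_fill_rate(cancels: int, fills: int) -> float:
    # cancels / (cancels + fills); 0.0 when there are no resolved orders.
    total = cancels + fills
    return cancels / total if total else 0.0


def market_entropy_bits(market_ids: list) -> float:
    # Shannon entropy (base 2) of the empirical distribution over markets traded.
    n = len(market_ids)
    if n == 0:
        return 0.0
    counts = Counter(market_ids)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def inter_arrival_cv(timestamps_s: list) -> float:
    # Coefficient of variation of the gaps between consecutive events.
    gaps = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    if not gaps:
        return 0.0
    mu = mean(gaps)
    return pstdev(gaps) / mu if mu else 0.0


print(order_to_trade_ratio(120, 8))                   # 15.0
print(cancel_before_fill_rate(90, 10))                # 0.9
print(market_entropy_bits(["m1", "m1", "m2", "m2"]))  # 1.0
print(inter_arrival_cv([0.0, 1.0, 2.0, 3.0]))         # 0.0
```

A perfectly regular actor has a CV of 0, while bursty activity pushes the CV above 1, which is why the CV works as a cheap proxy for self-excitation.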
Below FlowConfig.actors.profile_min_events (default 30) the snapshot returns a partial profile with only event_count and first_seen / last_seen. Small-sample artifacts don’t belong in a regulated record.
KirilenkoClassifier
Implements the Kirilenko-Kyle-Samadi-Tuzun (2017) taxonomy: HFT, opportunistic, fundamental buyer, fundamental seller, small, intermediary. Rule-based Gaussian-kernel scoring over per-feature templates calibrated for prediction-market cadence; produces a probability distribution over the six categories.
from horizon.flow.actors.taxonomy import KirilenkoClassifier, TraderCategory
clf = KirilenkoClassifier()
probs = clf.classify(profile.features)
# {'hft': 0.67, 'opportunistic': 0.21, ...}: sums to 1.0
top = clf.argmax(profile.features)
# TraderCategory.HFT
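To make the scoring idea concrete, here is a minimal sketch of Gaussian-kernel scoring against per-feature templates. It is not the shipped classifier: the TEMPLATES dict, its (center, width) layout, the two categories, and every number in it are hypothetical, chosen only to show the mechanics.

```python
import math

# Hypothetical templates: category -> feature -> (center, width).
# The numbers are illustrative, not the shipped calibration.
TEMPLATES = {
    "hft": {"order_to_trade_ratio": (20.0, 10.0), "inter_arrival_median_s": (1.0, 2.0)},
    "small": {"order_to_trade_ratio": (1.5, 1.0), "inter_arrival_median_s": (600.0, 300.0)},
}


def soft_label(features: dict) -> dict:
    """Return a probability per category from Gaussian-kernel template distances."""
    log_scores = {}
    for category, template in TEMPLATES.items():
        # Sum of squared, width-scaled distances from the template centers.
        z2 = sum(
            ((features[name] - center) / width) ** 2
            for name, (center, width) in template.items()
        )
        log_scores[category] = -0.5 * z2  # log of the Gaussian kernel value
    # Normalize in log space so very small kernel values do not underflow.
    peak = max(log_scores.values())
    weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


probs = soft_label({"order_to_trade_ratio": 18.0, "inter_arrival_median_s": 2.0})
print(max(probs, key=probs.get))  # hft
```

Each probability is a normalized function of per-feature distances, so any single number can be traced back to the template entries that produced it.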
Why rule-based rather than a trained model
A trained model would need ground-truth labels we do not have for prediction markets (Kirilenko’s original labels came from manual CME participant tagging). A rule-based scorer gives each category probability a traceable origin: every number maps back to a per-feature distance from a published template, so compliance can read the derivation.
A v0.3 follow-up may train a model on labeled Polymarket whales as ground truth; until then, the rules are the published features applied to prediction-market cadence.
Fundamental direction split
FundamentalBuyer and FundamentalSeller share a feature profile (low OTR, slow, large sizes). Pass direction_signal ∈ [-1, 1] to resolve them:
# Negative = net selling, positive = net buying
probs = clf.classify(features, direction_signal=+0.8)
# FundamentalBuyer gets the majority of fundamental mass
Without a direction signal, the fundamental mass splits 50/50.
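One way to picture the split is a linear allocation of the shared fundamental mass. The linear form and the split_fundamental helper are assumptions for illustration; the classifier may weight the signal differently.

```python
def split_fundamental(fundamental_mass: float, direction_signal: float = 0.0):
    """Return (buyer_mass, seller_mass). The linear split is an assumption."""
    d = max(-1.0, min(1.0, direction_signal))  # clamp to [-1, 1]
    buyer_share = (1.0 + d) / 2.0              # 0.0 -> 0.5, +1.0 -> 1.0, -1.0 -> 0.0
    return fundamental_mass * buyer_share, fundamental_mass * (1.0 - buyer_share)


print(split_fundamental(0.6))  # (0.3, 0.3)
buyer, seller = split_fundamental(0.6, direction_signal=0.8)
print(buyer > seller)          # True
```

Under this assumption a signal of +0.8 sends 90% of the fundamental mass to FundamentalBuyer, and the two masses always sum to the original total.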
Template calibration
Templates are tuned for prediction-market cadence (inter-arrival in seconds, not microseconds). Equity-tape overrides are reserved for v0.2. When venue cadence changes materially, subclass KirilenkoClassifier and override _TEMPLATES rather than mutating the shipped dict in place.
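The override pattern can be sketched without the SDK installed. BaseClassifier below is a hypothetical stand-in for KirilenkoClassifier, and the template layout and numbers are assumptions; only the subclass-instead-of-mutate pattern is the point.

```python
class BaseClassifier:
    """Hypothetical stand-in for KirilenkoClassifier; only template lookup is modeled."""

    # Illustrative layout: category -> feature -> (center, width).
    _TEMPLATES = {"hft": {"inter_arrival_median_s": (1.0, 2.0)}}

    def template_for(self, category: str) -> dict:
        return self._TEMPLATES[category]


class EquityTapeClassifier(BaseClassifier):
    # Build a new dict rather than editing BaseClassifier._TEMPLATES in place,
    # so instances of the base class keep the shipped calibration.
    _TEMPLATES = {
        **BaseClassifier._TEMPLATES,
        "hft": {"inter_arrival_median_s": (0.001, 0.002)},
    }


print(BaseClassifier().template_for("hft"))        # {'inter_arrival_median_s': (1.0, 2.0)}
print(EquityTapeClassifier().template_for("hft"))  # {'inter_arrival_median_s': (0.001, 0.002)}
```

Mutating the shared dict would silently change every classifier in the process, including ones already serving another venue; the subclass keeps the two calibrations separate.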
WalletHeuristicLinker
Produces pairwise same-entity links between on-chain addresses using four heuristics from the wallet-clustering literature. The linker does NOT fetch from chain; ingestion (PolymarketFlowSource with an enriched Polygon RPC) feeds hints in.
from horizon.flow.actors.wallet_heuristics import WalletHeuristicLinker
from horizon.flow.config import FlowConfig
from datetime import datetime, timezone
linker = WalletHeuristicLinker(FlowConfig())
now = datetime.now(timezone.utc)
# Meiklejohn multi-input heuristic.
linker.observe_common_input(addresses=["0xA", "0xB"], tx_hash="0xTX", at=now)
# Victor gas-payer proxy.
linker.observe_gas_payer(payer="0xA", sender="0xC", at=now, tx_hash="0xTX2")
# Approval-contract reuse (weaker).
linker.observe_approval_reuse(addr_a="0xA", addr_b="0xD", contract="0xCT", at=now)
# Deposit-address reuse (via CEX deposit addr).
linker.observe_deposit_reuse(addr_a="0xA", addr_b="0xE", deposit_addr="0xDP", at=now)
# Retrieve asserted links above a confidence floor.
links = linker.pairs_over(min_confidence=0.7)
Confidence defaults, from FlowConfig.wallet_heuristics:
| Heuristic | Default confidence | Source |
|---|---|---|
| common_input | 0.95 | Meiklejohn et al. 2013 |
| gas_payer | 0.75 | Victor 2020 |
| approval_reuse | 0.55 | weaker; contracts get reused |
| deposit_reuse | 0.80 | Victor 2020 |
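The following sketch shows how a confidence floor interacts with these defaults. MiniLinker and Link are hypothetical stand-ins, not the WalletHeuristicLinker implementation, and keeping only the strongest heuristic per pair is an assumption about how repeated observations combine.

```python
from dataclasses import dataclass

# Defaults copied from the table above.
DEFAULT_CONFIDENCE = {
    "common_input": 0.95,
    "gas_payer": 0.75,
    "approval_reuse": 0.55,
    "deposit_reuse": 0.80,
}


@dataclass(frozen=True)
class Link:
    addr_a: str
    addr_b: str
    heuristic: str
    confidence: float


class MiniLinker:
    """Hypothetical stand-in that records one best link per address pair."""

    def __init__(self) -> None:
        self._best = {}

    def observe(self, addr_a: str, addr_b: str, heuristic: str) -> None:
        a, b = sorted((addr_a, addr_b))  # order-independent pair key
        link = Link(a, b, heuristic, DEFAULT_CONFIDENCE[heuristic])
        current = self._best.get((a, b))
        # Assumption: keep the strongest heuristic seen for each pair.
        if current is None or link.confidence > current.confidence:
            self._best[(a, b)] = link

    def pairs_over(self, min_confidence: float) -> list:
        kept = [ln for ln in self._best.values() if ln.confidence >= min_confidence]
        return sorted(kept, key=lambda ln: (ln.addr_a, ln.addr_b))


mini = MiniLinker()
mini.observe("0xA", "0xB", "common_input")
mini.observe("0xC", "0xA", "gas_payer")
mini.observe("0xA", "0xD", "approval_reuse")
mini.observe("0xA", "0xE", "deposit_reuse")
print([ln.heuristic for ln in mini.pairs_over(0.7)])
# ['common_input', 'gas_payer', 'deposit_reuse']
```

At a floor of 0.7, the approval_reuse link (0.55) drops out while the other three survive.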
Caveats
Heuristics are probabilistic. On EVM chains such as Polygon, a multi-input pattern is often a contract batch, which weakens the signal compared with Bitcoin. Confidence values are calibrated conservatively, and the module records the heuristic used for each link so downstream callers can apply their own threshold per use case.
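Because each link carries the heuristic that produced it, a caller can hold different heuristics to different floors. The record layout, the floor tables, and the accept helper below are all hypothetical; they only illustrate the per-use-case thresholding described above.

```python
# Hypothetical link records; the field names are assumptions for illustration.
links = [
    {"pair": ("0xA", "0xB"), "heuristic": "common_input", "confidence": 0.95},
    {"pair": ("0xA", "0xC"), "heuristic": "gas_payer", "confidence": 0.75},
    {"pair": ("0xA", "0xD"), "heuristic": "approval_reuse", "confidence": 0.55},
    {"pair": ("0xA", "0xE"), "heuristic": "deposit_reuse", "confidence": 0.80},
]

# Per-use-case floors keyed by heuristic. A missing key means "never accept".
# The alerting profile omits common_input because of the contract-batch caveat.
ALERTING_FLOORS = {"deposit_reuse": 0.8, "gas_payer": 0.7}
EXPLORATION_FLOORS = {
    "common_input": 0.5,
    "gas_payer": 0.5,
    "approval_reuse": 0.5,
    "deposit_reuse": 0.5,
}


def accept(candidates: list, floors: dict) -> list:
    """Keep links whose confidence meets the floor for their own heuristic."""
    return [
        link for link in candidates
        if link["confidence"] >= floors.get(link["heuristic"], float("inf"))
    ]


print([link["heuristic"] for link in accept(links, ALERTING_FLOORS)])
# ['gas_payer', 'deposit_reuse']
print(len(accept(links, EXPLORATION_FLOORS)))  # 4
```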
The linker does not assert personally-identifying information. All addresses remain pseudonymous. Linking a wallet to a legal entity requires KYC data or third-party enrichment that is out of scope for the SDK.
Citations
- Kirilenko, A., Kyle, A. S., Samadi, M., Tuzun, T. (2017). “The Flash Crash: High-Frequency Trading in an Electronic Market.” Journal of Finance, 72(3), 967–998.
- Meiklejohn, S., Pomarole, M., Jordan, G., Levchenko, K., McCoy, D., Voelker, G. M., Savage, S. (2013). “A Fistful of Bitcoins.” IMC 2013.
- Victor, F. (2020). “Address clustering heuristics for Ethereum.” Financial Cryptography.
- Harrigan, M., Fretter, C. “The Unreasonable Effectiveness of Address Clustering.”
- Welford, B. P. (1962). “Note on a method for calculating corrected sums of squares and products.”