Actor profiling

Per-actor feature extraction, Kirilenko 6-category soft-labeler, and Polymarket wallet-clustering heuristics.

The actors layer turns a MarketEvent stream into per-actor records the rest of the pipeline can reason about. Three components:

ActorFeatureExtractor. Incremental rolling features per actor.
KirilenkoClassifier. 6-category soft-label over the feature vector.
WalletHeuristicLinker. On-chain pairwise links between addresses (Polymarket / Polygon).

ActorFeatureExtractor

Ingests MarketEvents one at a time, maintains bounded rolling state per actor, and produces an ActorFeatures vector on demand.
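Size moments such as mean_order_size and std_order_size can be updated per event in constant memory. The citation list includes Welford (1962). Below is a minimal sketch of Welford's online mean/variance update, assuming the extractor uses something equivalent. RunningMoments is a hypothetical name, and population variance (dividing by n) is an assumption. Welford's update is cumulative, so a strictly windowed ("rolling") variant would also have to retire old samples.

```python
import math


class RunningMoments:
    """Hypothetical helper: Welford's online mean/variance for one actor's order sizes."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Welford's recurrence: multiply the pre-update delta by the post-update deviation.
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance (assumption); 0.0 until a sample has been seen.
        return self._m2 / self.n if self.n else 0.0

    @property
    def std(self) -> float:
        return math.sqrt(self.variance)


sizes = RunningMoments()
for size in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    sizes.update(size)
print(sizes.n, sizes.mean, sizes.std)
```

Each update touches three numbers regardless of history length. This property is what keeps per-actor state bounded.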

```python
from horizon.flow.actors.profile import ActorFeatureExtractor
from horizon.flow.config import FlowConfig

ext = ActorFeatureExtractor(FlowConfig())

# `stream` (an iterable of MarketEvent) and `store` (your persistence layer)
# are supplied by the caller.
for seq, ev in enumerate(stream, start=1):
    refresh_due = ext.ingest(ev)
    if refresh_due:
        profile = ext.snapshot(
            actor_id=ev.actor_id,
            venue_name="polymarket",
            last_updated_seq=seq,  # number of events ingested so far
        )
        store.upsert_profile(profile, at=ev.timestamp)
```

ingest returns True when a profile refresh is due: on the first event seen for an actor, then every profile_refresh_event_interval events for that actor. The engine uses this signal to persist profiles without incurring a write for every event.
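The refresh signal can be modelled as a per-actor counter. Below is a minimal sketch, assuming the cadence is "first event, then every interval-th event for the same actor". Whether the shipped counter is offset from the first event is not specified here. RefreshScheduler and tick are hypothetical names.

```python
from collections import defaultdict


class RefreshScheduler:
    """Hypothetical stand-in for the refresh-due decision made inside ingest().

    Assumed cadence: the first event for an actor, then every `interval`-th
    event for that same actor.
    """

    def __init__(self, interval: int) -> None:
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self.interval = interval
        self._counts = defaultdict(int)  # actor_id -> events seen so far

    def tick(self, actor_id: str) -> bool:
        self._counts[actor_id] += 1
        n = self._counts[actor_id]
        return n == 1 or n % self.interval == 0


sched = RefreshScheduler(interval=3)
due = [sched.tick("0xA") for _ in range(7)]
# Events 1, 3 and 6 for actor 0xA trigger a refresh.
```

Counters are per actor, so a busy actor never delays refreshes for a quiet one.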

Features

The feature vector matches the Kirilenko-Kyle-Samadi-Tuzun (2017) taxonomy inputs plus on-chain extras:

| Feature | Definition |
|---|---|
| `order_to_trade_ratio` | Order placements / fills |
| `cancel_before_fill_rate` | Cancels / (cancels + fills) |
| `median_time_to_cancel_ms` | Median placement-to-cancel time in milliseconds, over resolved cancels |
| `maker_ratio` | Maker fills / total fills |
| `mean_order_size`, `median_order_size`, `std_order_size` | Summary statistics of the order-size distribution |
| `size_bins` | Log-scale order-size histogram (8 bins by default) |
| `inter_arrival_median_s` | Median time between the actor's events, in seconds (event cadence) |
| `hawkes_branching_ratio` | Cheap self-excitation proxy derived from the coefficient of variation (CV) of inter-arrival times; not an MLE Hawkes fit (see toxicity) |
| `market_entropy_bits` | Shannon entropy, in bits, over the markets the actor trades |
| `session_hour_hist` | 24-bucket UTC hour-of-day histogram, normalized |
| `gas_price_mode_wei`, `gas_price_std_wei` | Polymarket/Polygon gas-price fingerprint (mode and standard deviation, in wei) |
| `nonce_cadence_cv` | CV of nonce increments (low = regular, high = bursty) |
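Several of these features reduce to simple ratios and histograms over per-actor counters. Below is a minimal sketch under stated assumptions. The function names are hypothetical. The zero-denominator handling is a choice made here, not documented behaviour. coefficient_of_variation is shown generically because the exact input series behind nonce_cadence_cv is not specified. The Hawkes proxy and size_bins are omitted because their exact definitions are not given on this page.

```python
from __future__ import annotations

import math
from collections import Counter
from datetime import datetime, timezone
from statistics import mean, pstdev


def order_to_trade_ratio(placements: int, fills: int) -> float:
    # placements / fills; infinite if the actor places orders but never fills (assumption).
    if fills == 0:
        return math.inf if placements else 0.0
    return placements / fills


def cancel_before_fill_rate(cancels: int, fills: int) -> float:
    total = cancels + fills
    return cancels / total if total else 0.0


def maker_ratio(maker_fills: int, total_fills: int) -> float:
    return maker_fills / total_fills if total_fills else 0.0


def market_entropy_bits(markets: list[str]) -> float:
    # Shannon entropy (base 2) of the empirical distribution of markets traded.
    counts = Counter(markets)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values()) if n else 0.0


def coefficient_of_variation(values: list[float]) -> float:
    # CV = population std / mean: near 0 for regular series, large for bursty ones.
    if len(values) < 2:
        return 0.0
    mu = mean(values)
    return pstdev(values) / mu if mu else 0.0


def session_hour_hist(timestamps: list[datetime]) -> list[float]:
    # 24-bucket histogram of UTC hours, normalized to sum to 1.
    # Expects timezone-aware datetimes; naive ones would be read as local time.
    hist = [0.0] * 24
    for ts in timestamps:
        hist[ts.astimezone(timezone.utc).hour] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

An actor that trades a single market has zero market entropy. An actor spread evenly over four markets scores 2 bits.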

If an actor has fewer than FlowConfig.actors.profile_min_events events (default 30), snapshot returns a partial profile containing only event_count, first_seen, and last_seen. Features computed from so few events are small-sample artifacts and don’t belong in a regulated record.
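The sample floor amounts to a gate in front of the feature vector. Below is a minimal sketch, assuming a partial profile is represented by a missing feature vector. ProfileSnapshot and gate_snapshot are hypothetical names. min_events=30 mirrors the documented default of FlowConfig.actors.profile_min_events.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class ProfileSnapshot:
    """Hypothetical stand-in for what snapshot() returns."""
    event_count: int
    first_seen: datetime
    last_seen: datetime
    features: dict[str, float] | None = None  # None marks a partial profile


def gate_snapshot(event_count: int, first_seen: datetime, last_seen: datetime,
                  features: dict[str, float], min_events: int = 30) -> ProfileSnapshot:
    # Below the floor, keep only the counters and drop the noisy feature vector.
    if event_count < min_events:
        return ProfileSnapshot(event_count, first_seen, last_seen)
    return ProfileSnapshot(event_count, first_seen, last_seen, features)
```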

KirilenkoClassifier

Implements the Kirilenko-Kyle-Samadi-Tuzun (2017) taxonomy: HFT, opportunistic, fundamental buyer, fundamental seller, small, and intermediary. Scoring is rule-based. The feature vector is scored with Gaussian kernels against per-feature templates calibrated for prediction-market cadence, producing a probability distribution over the six categories.

```python
from horizon.flow.actors.taxonomy import KirilenkoClassifier, TraderCategory

clf = KirilenkoClassifier()
probs = clf.classify(profile.features)
# e.g. {'hft': 0.67, 'opportunistic': 0.21, ...}; values sum to 1.0
top = clf.argmax(profile.features)
# e.g. TraderCategory.HFT
```
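Gaussian-kernel template scoring can be sketched as follows. The sketch assumes each template stores a (mean, width) pair per feature and that per-category kernels are normalized directly into probabilities, which is a softmax over summed log-kernels. Feature names, category keys, template numbers, and function names are all hypothetical and illustrative. They are not the shipped _TEMPLATES.

```python
from __future__ import annotations

import math

# Hypothetical (mean, width) templates on log10-scaled features.
# Illustrative numbers only; NOT the shipped calibration.
_FUNDAMENTAL = {"log10_otr": (0.1, 0.3), "log10_inter_arrival_s": (3.0, 0.7)}
TEMPLATES: dict[str, dict[str, tuple[float, float]]] = {
    "hft": {"log10_otr": (1.5, 0.5), "log10_inter_arrival_s": (0.0, 0.5)},
    "intermediary": {"log10_otr": (1.0, 0.5), "log10_inter_arrival_s": (1.0, 0.5)},
    "opportunistic": {"log10_otr": (0.5, 0.5), "log10_inter_arrival_s": (2.0, 0.7)},
    # Buyer and seller share one profile; direction is resolved separately.
    "fundamental_buyer": _FUNDAMENTAL,
    "fundamental_seller": _FUNDAMENTAL,
    "small": {"log10_otr": (0.2, 0.5), "log10_inter_arrival_s": (4.0, 1.0)},
}


def log_kernel(features: dict[str, float], template: dict[str, tuple[float, float]]) -> float:
    # Sum of per-feature Gaussian log-kernels: -0.5 * ((x - mu) / sigma) ** 2.
    return sum(-0.5 * ((features[f] - mu) / sigma) ** 2 for f, (mu, sigma) in template.items())


def classify(features: dict[str, float]) -> dict[str, float]:
    logs = {cat: log_kernel(features, tpl) for cat, tpl in TEMPLATES.items()}
    peak = max(logs.values())  # subtract the max before exp() for numerical stability
    weights = {cat: math.exp(v - peak) for cat, v in logs.items()}
    total = sum(weights.values())
    return {cat: w / total for cat, w in weights.items()}


probs = classify({"log10_otr": 1.4, "log10_inter_arrival_s": 0.1})
top = max(probs, key=probs.get)
```

Identical buyer and seller templates receive identical scores. This is consistent with the 50/50 split of fundamental mass when no direction signal is given.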

Why rule-based rather than a trained model

A trained model would need ground-truth labels we do not have for prediction markets (Kirilenko’s original labels came from manual CME participant tagging). A rule-based scorer gives every category probability a traceable origin. Each number maps back to a per-feature distance from a published template, so compliance can read the derivation.
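The traceability claim can be made concrete. Because a Gaussian-kernel category score is a sum of per-feature terms, each term can be reported alongside the feature value and template mean it came from. Below is a minimal sketch, assuming templates hold a (mean, width) pair per feature. explain, hft_template, and the numbers are hypothetical.

```python
def explain(features, template):
    """Per-feature breakdown of one category's Gaussian log-kernel score.

    Returns rows of (feature, value, template_mean, z_distance, log_kernel_term).
    """
    rows = []
    for name, (mu, sigma) in template.items():
        z = (features[name] - mu) / sigma  # signed distance in template widths
        rows.append((name, features[name], mu, z, -0.5 * z * z))
    return rows


# Hypothetical template values, for illustration only.
hft_template = {"log10_otr": (1.5, 0.5), "log10_inter_arrival_s": (0.0, 0.5)}
for row in explain({"log10_otr": 1.0, "log10_inter_arrival_s": 0.5}, hft_template):
    print("%-24s x=%.2f mu=%.2f z=%+.2f term=%+.3f" % row)
```

Summing the last column reproduces the category's log-score, so a reviewer can see which feature pulled an actor toward or away from a category.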

A v0.3 follow-up may train a model on labeled Polymarket whales as ground truth; until then, the rules are the published features applied to prediction-market cadence.

Fundamental direction split

FundamentalBuyer and FundamentalSeller share a feature profile (low order-to-trade ratio, slow cadence, large sizes). Pass direction_signal ∈ [-1, 1] to separate them:

```python
# Negative = net selling, positive = net buying
probs = clf.classify(features, direction_signal=+0.8)
# FundamentalBuyer gets the majority of the fundamental mass
```

Without a direction signal, the fundamental mass splits 50/50.
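A linear split is one rule consistent with the 50/50 default. Below is a minimal sketch, assuming the signal is clamped to [-1, 1] and the buyer receives (1 + d) / 2 of the combined fundamental mass. The shipped classifier's exact rule is not documented here, and split_fundamental is a hypothetical name.

```python
from __future__ import annotations


def split_fundamental(fundamental_mass: float,
                      direction_signal: float | None = None) -> tuple[float, float]:
    """Split combined fundamental mass into (buyer, seller) shares.

    Hypothetical linear rule: buyer gets (1 + d) / 2 of the mass and seller the
    rest, where d is direction_signal clamped to [-1, 1]. No signal means d = 0.
    """
    d = 0.0 if direction_signal is None else max(-1.0, min(1.0, direction_signal))
    buyer = fundamental_mass * (1.0 + d) / 2.0
    return buyer, fundamental_mass - buyer


buyer, seller = split_fundamental(0.40, direction_signal=+0.8)
```

Under this rule, +0.8 gives the buyer 90% of the fundamental mass. The total assigned to the two fundamental categories is unchanged.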

Template calibration

Templates are tuned for prediction-market cadence (inter-arrival in seconds, not microseconds). Equity-tape overrides are reserved for v0.2. When venue cadence changes materially, subclass KirilenkoClassifier and override _TEMPLATES rather than mutating the shipped dict in place.
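A class-level dict is shared by every instance and every caller in the process. Editing it in place therefore silently recalibrates all of them, which is why subclassing is preferred. The sketch below uses a hypothetical stand-in base class, since the real _TEMPLATES layout is not documented on this page. With the SDK you would subclass KirilenkoClassifier instead.

```python
import copy


class ShippedClassifier:
    """Hypothetical stand-in for a classifier with a class-level template table."""

    _TEMPLATES = {
        "hft": {"log10_inter_arrival_s": (0.0, 0.5)},
        "small": {"log10_inter_arrival_s": (4.0, 1.0)},
    }

    def template(self, category):
        return self._TEMPLATES[category]


class FastVenueClassifier(ShippedClassifier):
    # Copy the parent's table, then override only what differs for this venue.
    _TEMPLATES = copy.deepcopy(ShippedClassifier._TEMPLATES)
    _TEMPLATES["hft"] = {"log10_inter_arrival_s": (-1.0, 0.5)}
```

The deep copy means later edits to the subclass table cannot leak back into the shipped defaults.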

WalletHeuristicLinker

Produces pairwise same-entity links between on-chain addresses using four heuristics from the wallet-clustering literature. It does NOT fetch from chain: the ingestion layer (PolymarketFlowSource with an enriched Polygon RPC) feeds observations in.

```python
from horizon.flow.actors.wallet_heuristics import WalletHeuristicLinker
from horizon.flow.config import FlowConfig
from datetime import datetime, timezone

linker = WalletHeuristicLinker(FlowConfig())
now = datetime.now(timezone.utc)

# Meiklejohn multi-input heuristic.
linker.observe_common_input(addresses=["0xA", "0xB"], tx_hash="0xTX", at=now)

# Victor gas-payer proxy.
linker.observe_gas_payer(payer="0xA", sender="0xC", at=now, tx_hash="0xTX2")

# Approval-contract reuse (weaker).
linker.observe_approval_reuse(addr_a="0xA", addr_b="0xD", contract="0xCT", at=now)

# Deposit-address reuse (via CEX deposit addr).
linker.observe_deposit_reuse(addr_a="0xA", addr_b="0xE", deposit_addr="0xDP", at=now)

# Retrieve asserted links above a confidence floor.
links = linker.pairs_over(min_confidence=0.7)
```

Confidence defaults, from FlowConfig.wallet_heuristics:

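| Heuristic | Default confidence | Source |
|---|---|---|
| `common_input` | 0.95 | Meiklejohn et al. 2013 |
| `gas_payer` | 0.75 | Victor 2020 |
| `approval_reuse` | 0.55 | No citation; weaker signal because the same contracts are approved by many unrelated wallets |
| `deposit_reuse` | 0.80 | Victor 2020 |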
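Below is a minimal sketch of a pairwise store that applies these default confidences and remembers which heuristics fired for each pair, mirroring the linker usage shown earlier. PairLinks and its methods are hypothetical. Taking the maximum confidence when several heuristics fire for one pair is an assumption, and the real linker may combine evidence differently. Treating the floor as inclusive (>=) is also an assumption.

```python
from __future__ import annotations

from collections import defaultdict

# Defaults from the confidence table (FlowConfig.wallet_heuristics).
DEFAULT_CONFIDENCE = {
    "common_input": 0.95,
    "gas_payer": 0.75,
    "approval_reuse": 0.55,
    "deposit_reuse": 0.80,
}


class PairLinks:
    """Hypothetical pairwise link store; records which heuristics fired per pair."""

    def __init__(self, confidence: dict[str, float] | None = None) -> None:
        self.confidence = dict(DEFAULT_CONFIDENCE if confidence is None else confidence)
        self._evidence: dict[tuple[str, str], set[str]] = defaultdict(set)

    @staticmethod
    def _key(a: str, b: str) -> tuple[str, str]:
        return (a, b) if a <= b else (b, a)  # links are unordered pairs

    def observe(self, heuristic: str, a: str, b: str) -> None:
        if a != b:
            self._evidence[self._key(a, b)].add(heuristic)

    def pair_confidence(self, a: str, b: str) -> float:
        # Assumption: strongest single heuristic wins; evidence is not combined.
        fired = self._evidence.get(self._key(a, b), set())
        return max((self.confidence[h] for h in fired), default=0.0)

    def pairs_over(self, min_confidence: float) -> list[tuple[str, str, float]]:
        out = []
        for a, b in self._evidence:
            c = self.pair_confidence(a, b)
            if c >= min_confidence:  # inclusive floor (assumption)
                out.append((a, b, c))
        return sorted(out)


links = PairLinks()
links.observe("common_input", "0xA", "0xB")
links.observe("gas_payer", "0xC", "0xA")
links.observe("approval_reuse", "0xA", "0xD")
links.observe("deposit_reuse", "0xA", "0xE")
strong = links.pairs_over(min_confidence=0.7)
```

With the default floor of 0.7 used in the example, the approval-reuse link to 0xD is dropped. The other three pairs survive.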

Caveats

Heuristics are probabilistic. On account-based EVM chains such as Polygon, a multi-input pattern is often a contract batch, so it is weaker evidence of common ownership than the same pattern on Bitcoin. Confidence values are calibrated conservatively. The module also records which heuristic produced each link, so downstream callers can apply their own threshold per use case.
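Because each link carries its heuristic, a caller can set a separate floor per heuristic before merging pairwise links into clusters. The sketch below is minimal and uses union-find, a standard technique chosen here for illustration, not necessarily what any downstream consumer uses. The (addr_a, addr_b, heuristic, confidence) tuple shape and the thresholds are hypothetical.

```python
from __future__ import annotations


def cluster(links, thresholds):
    """Group addresses using only links that pass per-heuristic floors.

    `links` is an iterable of (addr_a, addr_b, heuristic, confidence) tuples
    (hypothetical shape). Heuristics absent from `thresholds` are ignored.
    """
    parent: dict[str, str] = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, heuristic, conf in links:
        floor = thresholds.get(heuristic)
        if floor is None or conf < floor:
            continue
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # union the two clusters

    groups: dict[str, set[str]] = {}
    for addr in parent:
        groups.setdefault(find(addr), set()).add(addr)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


links = [
    ("0xA", "0xB", "common_input", 0.95),
    ("0xB", "0xC", "gas_payer", 0.75),
    ("0xD", "0xE", "approval_reuse", 0.55),
]
strict = cluster(links, {"common_input": 0.9, "gas_payer": 0.9})
loose = cluster(links, {"common_input": 0.9, "gas_payer": 0.7, "approval_reuse": 0.5})
```

Union-find makes links transitive, so a single weak link can merge two large clusters. Strict floors on the weaker heuristics therefore matter more than their individual confidences suggest.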

The linker does not assert personally-identifying information. All addresses remain pseudonymous. Linking a wallet to a legal entity requires KYC data or third-party enrichment that is out of scope for the SDK.

Citations

Kirilenko, A., Kyle, A. S., Samadi, M., Tuzun, T. (2017). “The Flash Crash: High-Frequency Trading in an Electronic Market.” Journal of Finance, 72(3), 967–998.
Meiklejohn, S., Pomarole, M., Jordan, G., Levchenko, K., McCoy, D., Voelker, G. M., Savage, S. (2013). “A Fistful of Bitcoins: Characterizing Payments Among Men with No Names.” IMC 2013.
Victor, F. (2020). “Address Clustering Heuristics for Ethereum.” Financial Cryptography and Data Security (FC 2020).
Harrigan, M., Fretter, C. (2016). “The Unreasonable Effectiveness of Address Clustering.”
Welford, B. P. (1962). “Note on a Method for Calculating Corrected Sums of Squares and Products.” Technometrics, 4(3), 419–420.