Actor profiling

Per-actor feature extraction, Kirilenko 6-category soft-labeler, and Polymarket wallet-clustering heuristics.

The actors layer turns a MarketEvent stream into per-actor records the rest of the pipeline can reason about. Three components:

  • ActorFeatureExtractor. Incremental rolling features per actor.
  • KirilenkoClassifier. 6-category soft-label over the feature vector.
  • WalletHeuristicLinker. On-chain pairwise links between addresses (Polymarket / Polygon).

ActorFeatureExtractor

Ingests MarketEvents one at a time, maintains bounded rolling state per actor, and produces an ActorFeatures vector on demand.

python
from horizon.flow.actors.profile import ActorFeatureExtractor
from horizon.flow.config import FlowConfig

ext = ActorFeatureExtractor(FlowConfig())
# `stream` is any iterable of MarketEvent; `store` is your persistence layer.
for seq, ev in enumerate(stream, start=1):
    refresh_due = ext.ingest(ev)
    if refresh_due:
        profile = ext.snapshot(
            actor_id=ev.actor_id,
            venue_name="polymarket",
            last_updated_seq=seq,  # running count of events seen so far
        )
        store.upsert_profile(profile, at=ev.timestamp)

ingest returns True when a profile refresh is due (every profile_refresh_event_interval events per actor, or on first-seen). The engine consumes this signal to persist without write-amplifying every event.
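The refresh cadence described above can be sketched as a per-actor counter. This is a minimal illustration, not the extractor's actual internals: `RefreshGate` and its `interval` argument are hypothetical names standing in for whatever `profile_refresh_event_interval` drives inside `ingest`.

```python
from collections import defaultdict


class RefreshGate:
    """Hypothetical helper: signals when an actor's profile is due for a refresh."""

    def __init__(self, interval: int) -> None:
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self.interval = interval
        self._counts = defaultdict(int)  # actor_id -> events seen so far

    def observe(self, actor_id: str) -> bool:
        """Count one event; return True on first sight and every `interval` events."""
        self._counts[actor_id] += 1
        n = self._counts[actor_id]
        return n == 1 or n % self.interval == 0


gate = RefreshGate(interval=3)
signals = [gate.observe("0xA") for _ in range(7)]
# signals == [True, False, True, False, False, True, False]
```

With an interval of 3, seven events from one actor trigger three writes instead of seven, which is the write-amplification saving the signal is meant to provide.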

Features

The feature vector matches the Kirilenko-Kyle-Samadi-Tuzun (2017) taxonomy inputs plus on-chain extras:

| Feature | Description |
| --- | --- |
| order_to_trade_ratio | placements / fills |
| cancel_before_fill_rate | cancels / (cancels + fills) |
| median_time_to_cancel_ms | median over resolved cancels |
| maker_ratio | maker fills / total fills |
| mean_order_size, median_order_size, std_order_size | size distribution moments |
| size_bins | log-scale histogram (8 bins default) |
| inter_arrival_median_s | per-actor event cadence |
| hawkes_branching_ratio | cheap self-excitation proxy from the coefficient of variation (CV) of inter-arrivals. NOT an MLE Hawkes fit (see toxicity) |
| market_entropy_bits | Shannon entropy over markets traded |
| session_hour_hist | 24-bucket UTC hour histogram (normalized) |
| gas_price_mode_wei, gas_price_std_wei | Polymarket/Polygon gas fingerprint |
| nonce_cadence_cv | CV of nonce increments (regular vs bursty) |
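A few of these features can be computed from constant-size state, which is what makes bounded per-actor tracking feasible. The sketch below is illustrative only: the class and function names are hypothetical, the choice of sample (n - 1) variance is an assumption, and the zero-denominator conventions may differ from the shipped extractor. The running moments use Welford's single-pass update, listed in the citations.

```python
import math
from collections import Counter


class RunningMoments:
    """Welford (1962) single-pass mean and variance; O(1) state per actor."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation (assumption); 0.0 until two observations exist.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def cancel_before_fill_rate(cancels: int, fills: int) -> float:
    """cancels / (cancels + fills), taken as 0.0 when nothing has resolved."""
    resolved = cancels + fills
    return cancels / resolved if resolved else 0.0


def market_entropy_bits(market_counts: Counter) -> float:
    """Shannon entropy (base 2) of the distribution of events over markets."""
    total = sum(market_counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in market_counts.values() if c)


sizes = RunningMoments()
for s in [10.0, 20.0, 30.0]:
    sizes.update(s)

rate = cancel_before_fill_rate(cancels=3, fills=1)
entropy = market_entropy_bits(Counter({"mkt-a": 2, "mkt-b": 2}))
```

An actor split evenly across two markets has exactly one bit of entropy; an actor concentrated in a single market has zero.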

Below FlowConfig.actors.profile_min_events (default 30) the snapshot returns a partial profile with only event_count and first_seen / last_seen. Small-sample artifacts don’t belong in a regulated record.

KirilenkoClassifier

Implements the Kirilenko-Kyle-Samadi-Tuzun (2017) taxonomy: HFT, opportunistic, fundamental buyer, fundamental seller, small, intermediary. Scoring is rule-based: Gaussian kernels over per-feature templates, calibrated for prediction-market cadence, produce a probability distribution over the six categories.

python
from horizon.flow.actors.taxonomy import KirilenkoClassifier, TraderCategory

clf = KirilenkoClassifier()
probs = clf.classify(profile.features)
# {'hft': 0.67, 'opportunistic': 0.21, ...}: sums to 1.0
top = clf.argmax(profile.features)
# TraderCategory.HFT

Why rule-based rather than a trained model

A trained model would need ground-truth labels that we do not have for prediction markets (Kirilenko’s original labels came from manual CME participant tagging). A rule-based approach gives each category probability a traceable origin: every number maps back to a per-feature distance from a published template, so compliance can read the derivation.
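To make the traceability argument concrete, here is a minimal sketch of Gaussian-kernel template scoring. Everything in it is an assumption for illustration: the `TEMPLATES` layout, the two categories shown, and the center and width values are invented and are not the shipped calibration or the classifier's real internals.

```python
import math

# Hypothetical templates: per category, feature -> (center, width). Values are
# illustrative only and are NOT the shipped calibration.
TEMPLATES = {
    "hft": {"order_to_trade_ratio": (20.0, 8.0), "inter_arrival_median_s": (1.0, 2.0)},
    "fundamental": {"order_to_trade_ratio": (1.5, 1.0), "inter_arrival_median_s": (600.0, 300.0)},
}


def score(features):
    """Return (probabilities, per-feature standardized distances) for audit."""
    log_scores = {}
    trace = {}
    for cat, tmpl in TEMPLATES.items():
        # Standardized distance of each feature from the category template.
        trace[cat] = {name: (features[name] - mu) / sigma for name, (mu, sigma) in tmpl.items()}
        # A product of Gaussian kernels is the exp of the summed -z^2 / 2 terms.
        log_scores[cat] = sum(-0.5 * z * z for z in trace[cat].values())
    # Normalize in log space for numerical stability.
    peak = max(log_scores.values())
    weights = {cat: math.exp(s - peak) for cat, s in log_scores.items()}
    total = sum(weights.values())
    return {cat: w / total for cat, w in weights.items()}, trace


probs, trace = score({"order_to_trade_ratio": 18.0, "inter_arrival_median_s": 2.0})
```

The `trace` dict is the audit trail: each probability can be explained by listing how many template widths each feature sits from each category's center.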

A v0.3 follow-up may train a model on labeled Polymarket whales as ground truth; until then, the rules are the published features applied to prediction-market cadence.

Fundamental direction split

FundamentalBuyer and FundamentalSeller share a feature profile (low OTR, slow, large sizes). Pass direction_signal ∈ [-1, 1] to resolve them:

python
# Negative = net selling, positive = net buying
probs = clf.classify(features, direction_signal=+0.8)
# FundamentalBuyer gets the majority of fundamental mass

Without a direction signal, the fundamental mass splits 50/50.
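One way such a split could work is a linear mapping from the signal to the buyer's share. This is a sketch under a stated assumption: the formula buyer share = (1 + d) / 2 is hypothetical and chosen only because it reproduces the documented 50/50 behavior at d = 0; the classifier's actual mapping may differ.

```python
from typing import Optional, Tuple


def split_fundamental(mass: float, direction_signal: Optional[float] = None) -> Tuple[float, float]:
    """Split fundamental probability mass into (buyer, seller) shares.

    Assumed mapping: buyer share = (1 + d) / 2, with d clamped to [-1, 1].
    """
    if direction_signal is None:
        return mass / 2, mass / 2  # no signal: even split
    d = max(-1.0, min(1.0, direction_signal))
    buyer = mass * (1.0 + d) / 2.0
    return buyer, mass - buyer


buyer, seller = split_fundamental(0.4, direction_signal=0.8)  # roughly 0.36 and 0.04
even = split_fundamental(0.4)                                  # (0.2, 0.2)
```

Under this mapping the total fundamental mass is preserved, so the six-category distribution still sums to 1.0 after the split.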

Template calibration

Templates are tuned for prediction-market cadence (inter-arrival in seconds, not microseconds). Equity-tape overrides are reserved for v0.2. When venue cadence changes materially, subclass KirilenkoClassifier and override _TEMPLATES rather than mutating the shipped dict in place.
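The subclass-and-override pattern looks roughly like the following. `BaseClassifier` is a stand-in so the example runs without the SDK, and the template schema shown is hypothetical; with the real library you would subclass KirilenkoClassifier and follow whatever structure its _TEMPLATES actually uses.

```python
import copy


class BaseClassifier:
    """Stand-in for KirilenkoClassifier; the real template schema may differ."""

    _TEMPLATES = {"hft": {"inter_arrival_median_s": (1.0, 2.0)}}


class FastVenueClassifier(BaseClassifier):
    # Deep-copy first so edits never reach the parent's shared class-level dict.
    _TEMPLATES = copy.deepcopy(BaseClassifier._TEMPLATES)
    _TEMPLATES["hft"]["inter_arrival_median_s"] = (0.05, 0.1)
```

The deep copy matters because a class-level dict is shared by every instance and subclass; mutating it in place would silently recalibrate every classifier in the process.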

WalletHeuristicLinker

Produces pairwise same-entity links between on-chain addresses using four heuristics from the wallet-clustering literature. The linker does NOT fetch from chain; ingestion (PolymarketFlowSource with an enriched Polygon RPC) feeds hints in.

python
from datetime import datetime, timezone

from horizon.flow.actors.wallet_heuristics import WalletHeuristicLinker
from horizon.flow.config import FlowConfig

linker = WalletHeuristicLinker(FlowConfig())
now = datetime.now(timezone.utc)

# Meiklejohn multi-input heuristic.
linker.observe_common_input(addresses=["0xA", "0xB"], tx_hash="0xTX", at=now)

# Victor gas-payer proxy.
linker.observe_gas_payer(payer="0xA", sender="0xC", at=now, tx_hash="0xTX2")

# Approval-contract reuse (weaker).
linker.observe_approval_reuse(addr_a="0xA", addr_b="0xD", contract="0xCT", at=now)

# Deposit-address reuse (via CEX deposit addr).
linker.observe_deposit_reuse(addr_a="0xA", addr_b="0xE", deposit_addr="0xDP", at=now)

# Retrieve asserted links above a confidence floor.
links = linker.pairs_over(min_confidence=0.7)

Confidence defaults, from FlowConfig.wallet_heuristics:

| Heuristic | Default confidence | Source |
| --- | --- | --- |
| common_input | 0.95 | Meiklejohn et al. 2013 |
| gas_payer | 0.75 | Victor 2020 |
| approval_reuse | 0.55 | Weaker signal: contracts get reused |
| deposit_reuse | 0.80 | Victor 2020 |

Caveats

Heuristics are probabilistic. On EVM chains such as Polygon, a multi-input pattern is often a contract batch, which weakens the signal compared with Bitcoin. Confidence values are calibrated conservatively; the module records the heuristic used per link so downstream callers can apply their own threshold per use case.
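A downstream caller might apply per-heuristic thresholds and then group the surviving pairs into entities. The sketch below assumes a hypothetical `Link` record with `heuristic` and `confidence` fields; the actual objects returned by `pairs_over` may be shaped differently. Grouping uses a standard union-find.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical link record; the real return type of pairs_over may differ."""
    addr_a: str
    addr_b: str
    heuristic: str
    confidence: float


def cluster(links, floors):
    """Union-find over links that clear the floor set for their heuristic."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for link in links:
        # Heuristics without an explicit floor are rejected (floor above 1.0).
        if link.confidence >= floors.get(link.heuristic, 1.1):
            parent[find(link.addr_a)] = find(link.addr_b)

    groups = {}
    for addr in list(parent):
        groups.setdefault(find(addr), set()).add(addr)
    return list(groups.values())


links = [
    Link("0xA", "0xB", "common_input", 0.95),
    Link("0xA", "0xC", "gas_payer", 0.75),
    Link("0xA", "0xD", "approval_reuse", 0.55),
]
# A stricter use case: accept common_input and gas_payer, drop approval_reuse.
entities = cluster(links, floors={"common_input": 0.9, "gas_payer": 0.7, "approval_reuse": 0.8})
```

Clustering is transitive, so one weak link can merge two otherwise unrelated groups; that is a reason to keep floors strict for high-stakes uses and to retain the per-link heuristic for review.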

The linker does not assert personally-identifying information. All addresses remain pseudonymous. Linking a wallet to a legal entity requires KYC data or third-party enrichment that is out of scope for the SDK.

Citations

  • Kirilenko, A., Kyle, A. S., Samadi, M., Tuzun, T. (2017). “The Flash Crash: High-Frequency Trading in an Electronic Market.” Journal of Finance, 72(3), 967–998.
  • Meiklejohn, S., Pomarole, M., Jordan, G., Levchenko, K., McCoy, D., Voelker, G. M., Savage, S. (2013). “A Fistful of Bitcoins.” IMC 2013.
  • Victor, F. (2020). “Address clustering heuristics for Ethereum.” Financial Cryptography.
  • Harrigan, M., Fretter, C. (2016). “The Unreasonable Effectiveness of Address Clustering.”
  • Welford, B. P. (1962). “Note on a method for calculating corrected sums of squares and products.” Technometrics, 4(3), 419–420.