Clustering
Three orthogonal lenses for coordinated-actor detection: behavioral (HDBSCAN), temporal (DTW / k-Shape), and network (Louvain).
Wallet clusters are actors believed to represent the same entity (or a coordinated group). Four methods contribute: three statistical / graph-based ones covered here, plus the on-chain wallet heuristics. Each produces a WalletCluster with a method-specific confidence. Cross-method aggregation happens in downstream logic, not inside the clusterer.
BehavioralClusterer
Reference. Campello, R. J., Moulavi, D., Sander, J. (2013). “Density-based clustering based on hierarchical density estimates.” PAKDD. Trader-clustering template: Tumminello, M., Lillo, F., Piilo, J., Mantegna, R. N. (2012). New Journal of Physics, 14(1).
Signal. Actors with similar fingerprints. Same order-to-trade ratio, same cancel behavior, same Hawkes branching, same hour-of-day activity. Are more likely to be the same party or the same algorithmic implementation.
Method. HDBSCAN on the standardized actor feature matrix. HDBSCAN is preferred over DBSCAN because:
- No $\epsilon$ parameter (hard to pick in mixed-unit feature space).
- Variable density (some clusters tight, others loose).
- Native “noise” label for actors that don’t cluster. Honest about non-clusterable outliers.
Features used. order_to_trade_ratio, cancel_before_fill_rate, median_time_to_cancel_ms, maker_ratio, mean_order_size, inter_arrival_median_s, hawkes_branching_ratio, market_entropy_bits. Actors with fewer than 5 populated features are dropped. HDBSCAN imputation on sparse rows produces arbitrary clusters.
from horizon.flow.clustering import BehavioralClusterer
clusterer = BehavioralClusterer(config)
clusters = clusterer.cluster(profiles, venue_name="polymarket")
# List[WalletCluster] with method=ClusterMethod.Behavioral
Dependency. hdbscan from the horizon[flow] extras. Without it the clusterer returns []. A missing cluster is better than a wrong cluster for compliance.
Config. ClusteringConfig.hdbscan_min_cluster_size (default 5), hdbscan_min_samples (3), min_actors_for_clustering (10).
TemporalClusterer
Reference. Paparrizos, J., Gravano, L. (2015). “k-Shape.” SIGMOD 2015. Sakoe, H., Chiba, S. (1978). Dynamic Time Warping.
Signal. Many bots share upstream triggers (same oracle, same news wire, same arbitrage opportunity). Their event-timing series are highly correlated even when per-event sizes differ.
Method. For each actor, build a binned rate series (events per $\Delta$ over a rolling window). Pairwise DTW distance between normalized rate series; pairs below dtw_max_distance are linked; connected components → clusters.
from horizon.flow.clustering import TemporalClusterer
clusterer = TemporalClusterer(config)
clusters = clusterer.cluster(
actor_timestamps={
"0xabc...": [ts1, ts2, ...],
"0xdef...": [ts1, ts2, ...],
},
venue_name="polymarket",
bin_s=5.0,
window_s=3600.0,
)
Dependency. dtaidistance from horizon[flow] (fast C-accelerated DTW). A pure-Python fallback is built in for environments without the extras. Slower but keeps tests green.
Config. ClusteringConfig.dtw_max_distance (default 1.0 after L2 normalization). Lower = tighter clusters.
NetworkCommunityClusterer
Reference. Blondel, V. D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E. (2008). “Fast unfolding of communities in large networks.” Journal of Statistical Mechanics. Trader-network template: Cohen-Cole, E., Kirilenko, A., Patacchini, E.
Signal. Actors who trade the same markets at the same time share upstream information or coordination. Building an actor graph (edge weight = co-trade frequency within time windows) and running community detection identifies groups.
Method. Louvain algorithm on the actor × actor graph; edges weighted by co-occurrence count within cooccurrence_window_s-long windows. Falls back to networkx.community.greedy_modularity_communities when python-louvain is not installed.
from horizon.flow.clustering import NetworkCommunityClusterer
clusterer = NetworkCommunityClusterer(config)
clusters = clusterer.cluster(
events=all_market_events,
venue_name="polymarket",
cooccurrence_window_s=60.0,
)
Dependencies. networkx always (lightweight); python-louvain optional for the tunable-resolution Louvain. Both live in horizon[flow].
Config. ClusteringConfig.louvain_resolution (default 1.0; standard modularity). Higher = more, smaller communities.
Composite clusters
V0.1 does not ship a cross-method composite clusterer. Each lens emits WalletCluster records with its method set, and downstream application code can reconcile them. V1.0 adds a CompositeClusterer that runs all four lenses and merges overlapping assignments via confidence-weighted union.
Why three lenses
No single method suffices:
- Behavioral sees “same algorithm implementation” but not “same operator running two strategies.” Two bots from different firms with similar structure end up in the same cluster; the same firm running a market-maker and a directional strategy ends up in different clusters.
- Temporal sees “triggered by the same external signal” but not “different triggers with identical feature profiles.” Two bots listening to the same oracle cluster; two bots with the same behavior but different reaction times do not.
- Network sees “co-presence in the same markets” but not “causally related.” Index-rebalancers all touch the same names; so do manipulators coordinating a pump.
- On-chain heuristics (
WalletHeuristicLinker) see “same controller’s wallets” but only on venues that expose addresses.
A claim that survives two independent lenses is worth acting on. Single-lens claims are leads, not conclusions.
Determinism
All three clusterers honor FlowConfig.seed. HDBSCAN is deterministic by construction; Louvain via python-louvain accepts a random_state (seeded); the DTW fallback is deterministic. The same input + the same seed produces the same clusters across runs. A compliance property as much as a testing one.
Citations
- Campello, R. J., Moulavi, D., Sander, J. (2013). “Density-based clustering based on hierarchical density estimates.” PAKDD.
- Paparrizos, J., Gravano, L. (2015). “k-Shape: Efficient and Accurate Clustering of Time Series.” SIGMOD 2015.
- Blondel, V. D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E. (2008). “Fast unfolding of communities in large networks.” Journal of Statistical Mechanics.
- Tumminello, M., Lillo, F., Piilo, J., Mantegna, R. N. (2012). “Identification of clusters of investors from their real trading activity in a financial market.” New Journal of Physics, 14(1).
- Sakoe, H., Chiba, S. (1978). “Dynamic programming algorithm optimization for spoken word recognition.” IEEE TASSP.