Alpha: trade with flow as an input

Use actor profiles, toxicity measures, and reverse-engineered bot policies as alpha signals, not just as defensive filters.

The defensive recipes (Defend) treat flow findings as filters: when the signal is bad, don’t trade. The alpha recipes treat the same data as inputs: the signal tells you something the market hasn’t priced yet.

Five recipes, roughly ordered from least to most speculative. The first two rely on well-established microstructure relationships; the later ones are research-grade and require care before production use.

Recipe 1: fade a wash-trade pump

This recipe has the highest expected value of the five. When the wash-trade detector fires on a market with a sudden price move, the move is at least partially fake: it is inflated by self-trading rather than real demand, and it tends to revert.

python
import horizon as hz
from datetime import datetime, timedelta, timezone
from horizon.flow.events import AnomalyCategory


class WashFadeStrategy(hz.Strategy):
    """When wash trading is detected and price has moved, fade the move."""

    def evaluate(self, f, universe):
        signals = []
        # Wall-clock lookback; in a backtest, derive this from the replay
        # clock (e.g. f.now) so findings from the future are never read.
        since = datetime.now(timezone.utc) - timedelta(minutes=5)

        for market in universe:
            recent_wash = hz.flow.anomalies(
                market_id=market.id,
                category=AnomalyCategory.WashTrade,
                since=since,
            )
            if not recent_wash:
                continue
            # Highest-confidence finding drives conviction
            top = max(recent_wash, key=lambda w: w.confidence)
            if top.confidence < 0.75:
                continue

            # Direction: which way did the price move?
            recent_return_5m = f.mid_return_5m[market.id]  # your feature
            if recent_return_5m > 0.02:  # up move → fade short
                signals.append(hz.Signal.decrease(
                    market, edge_bps=int(50 * top.confidence), horizon="1h",
                ))
            elif recent_return_5m < -0.02:  # down move → fade long
                signals.append(hz.Signal.increase(
                    market, edge_bps=int(50 * top.confidence), horizon="1h",
                ))
        return signals

Why it works. Wash trading inflates reported volume and nudges price via trade-driven order flow impact (Cont-Kukanov-Stoikov 2014), but without underlying directional demand, the move decays as real counterparties fade it. Cong-Li-Tang-Yang (2023) estimated that wash-inflated price moves in crypto revert within hours.
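
The decision rule in the strategy above can be pulled out into a pure function, which makes the thresholds unit-testable without a live flow store. This is a minimal sketch: the helper name wash_fade_signal and its return convention are illustrative assumptions, not part of horizon. The thresholds (0.75 confidence, 2% five-minute return, 50 bps maximum edge) are the ones used in the strategy.

```python
from typing import Optional, Tuple

MIN_CONFIDENCE = 0.75  # same confidence gate as the strategy
MIN_MOVE = 0.02        # 2% five-minute return
MAX_EDGE_BPS = 50      # edge at confidence 1.0


def wash_fade_signal(confidence: float, return_5m: float) -> Optional[Tuple[str, int]]:
    """Return (direction, edge_bps) for a fade, or None when there is no trade.

    Hypothetical helper for illustration; not a horizon API.
    """
    if confidence < MIN_CONFIDENCE:
        return None
    edge_bps = int(MAX_EDGE_BPS * confidence)  # int() truncates toward zero
    if return_5m > MIN_MOVE:
        return ("decrease", edge_bps)  # price pushed up: fade by selling
    if return_5m < -MIN_MOVE:
        return ("increase", edge_bps)  # price pushed down: fade by buying
    return None
```

Keeping the rule in one place also makes it easy to backtest alternative thresholds without touching the strategy class.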

Risks.

  • False positive on the wash detector → you fade a genuine move.
  • Wash continues indefinitely → your fade is steamrolled.

Mitigate with recipe 5 from Defend: hard stop-loss at the wash-finding confidence level.

Recipe 2: reduce adverse selection by reading counterparty taxonomy

Adverse selection is the cost of trading against someone who knows more than you. HFTs and informed traders are the primary sources. The Kirilenko taxonomy gives you a probability of each, which you can translate into an expected adverse-selection haircut on your quoted price.

python
import horizon as hz
from horizon.flow.actors.taxonomy import TraderCategory


class AdverseSelectionAwareMaker(hz.Strategy):
    """Market-making strategy that widens spreads when the top-of-book
    counterparty is likely HFT or informed."""

    BASE_SPREAD_BPS = 10

    def evaluate(self, f, universe):
        signals = []
        for market in universe:
            # Get top 2 wallets on each side of book
            # (_top_wallets and _maker_quotes are helpers you supply)
            bid_wallets = self._top_wallets(market.id, "bid", k=2)
            ask_wallets = self._top_wallets(market.id, "ask", k=2)

            adverse_hft_prob = self._max_hft_prob(bid_wallets + ask_wallets)

            # Widen by up to 2x when adverse HFT probability is high
            widening = 1.0 + adverse_hft_prob
            effective_spread = self.BASE_SPREAD_BPS * widening

            # Quote at that spread
            # (details of how you emit maker quotes depend on your pipeline)
            signals.extend(self._maker_quotes(market, effective_spread))
        return signals

    def _max_hft_prob(self, wallets: list[str]) -> float:
        # Only the HFT class is scored here; unknown wallets count as 0.0
        best = 0.0
        for w in wallets:
            profile = hz.flow.actor_profile(w, venue="polymarket")
            if profile is None:
                continue
            hft = profile.taxonomy_probs.get(TraderCategory.HFT.value, 0.0)
            best = max(best, hft)
        return best

Why it works. HFT market makers earn the spread by being fast enough to adverse-select slower counterparties. If you quote at the same spread they do, they pick you off on the half of the order flow where they have the better signal. The Kirilenko-weighted widening is a first-order compensation.

Calibration. Backtest widening factors in [1.0, 1.5, 2.0] against your historical P&L. The widening = 1 + hft_prob heuristic over-widens when the HFT probability is inflated by small samples; cap it with min(adverse_hft_prob, 0.6) until each actor has enough events to meet the Kirilenko classifier’s event_count >= 100 threshold.
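
The cap described above can be sketched as a small pure function. The name effective_spread_bps and the prob_cap parameter are illustrative assumptions; the 0.6 default is the cap suggested in the calibration note.

```python
def effective_spread_bps(base_spread_bps: float, hft_prob: float,
                         prob_cap: float = 0.6) -> float:
    """Widen the base spread by (1 + capped HFT probability).

    Hypothetical helper for illustration; not a horizon API.
    """
    # Clamp into [0, prob_cap] so a noisy small-sample probability
    # cannot push the quote all the way to 2x.
    capped = min(max(hft_prob, 0.0), prob_cap)
    return base_spread_bps * (1.0 + capped)
```

With the default cap, the widening factor never exceeds 1.6 regardless of the classifier output; raising prob_cap to 1.0 recovers the uncapped heuristic.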

Recipe 3: Hawkes branching as a regime indicator

The Hawkes branching ratio distinguishes Poisson-like flow from self-exciting flow. Transitions between the two regimes predict volatility expansions, which makes the ratio useful as a vol-scaling signal inside a broader strategy.

python
import horizon as hz
from horizon.flow.config import HawkesConfig
from horizon.flow.toxicity import HawkesFingerprint


class HawkesRegimeStrategy(hz.Strategy):
    """Increase size when regime is stable (low branching); decrease when
    regime tips toward self-excitation (high branching). Flat volatility
    produces the highest risk-adjusted return for slow mean-reversion."""

    def __init__(self):
        super().__init__()
        self._hawkes: dict[str, HawkesFingerprint] = dict()

    def on_trade(self, market_id: str, timestamp):
        # One fingerprint per market, created lazily on the first trade
        h = self._hawkes.setdefault(
            market_id,
            HawkesFingerprint(HawkesConfig(window_s=300.0, kernel_decay_s=1.0)),
        )
        h.observe(timestamp)

    def evaluate(self, f, universe):
        signals = []
        for market in universe:
            h = self._hawkes.get(market.id)
            if h is None:
                continue
            est = h.estimate()
            # Fall back to a neutral 0.3 when no estimate is available yet
            branching = est.branching_ratio if est else 0.3

            # Step-function multiplier: larger when branching is low
            size_mult = 1.5 if branching < 0.2 else 1.0 if branching < 0.5 else 0.5

            if f.z[market.id] < -2:
                signals.append(hz.Signal.increase(
                    market,
                    edge_bps=int(30 * size_mult),
                    horizon="1d",
                ))
        return signals

Why it works. Filimonov-Sornette (2012) identified near-critical Hawkes regimes (branching → 1) as precursors to flash-crash-like moves. In milder form, elevated branching correlates with realized-vol expansion in the next hour. Trading a slow mean-reversion strategy into such a regime has worse P&L than trading it into a stable one.
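
A coarse way to sanity-check a branching estimate is the count-dispersion relation for a stationary Hawkes process: over windows much longer than the kernel decay, Var(N)/E[N] approaches 1/(1-n)^2, so n ≈ 1 - sqrt(E[N]/Var(N)). The sketch below applies that relation to per-window trade counts. It is an illustrative cross-check, not how HawkesFingerprint estimates the ratio (its internals are not described here), and the helper name branching_from_counts is hypothetical.

```python
from statistics import fmean, pvariance
from typing import Sequence


def branching_from_counts(counts: Sequence[int]) -> float:
    """Coarse branching-ratio estimate from per-window event counts.

    Assumes a stationary Hawkes process and windows that are long
    relative to the kernel decay. Hypothetical helper for illustration.
    """
    if len(counts) < 2:
        return 0.0
    mean = fmean(counts)
    var = pvariance(counts)
    if mean <= 0.0 or var <= mean:
        # No over-dispersion relative to Poisson: treat as non-self-exciting
        return 0.0
    return 1.0 - (mean / var) ** 0.5
```

The estimate is noisy on short samples, so it is better suited to confirming a regime label over many windows than to driving sizing directly.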

This is a regime filter, not a directional signal. It doesn’t tell you which way to trade. It tells you how much to trade.

Recipe 4: consume a reverse-engineered shadow policy

Once you’ve fit a shadow policy for a recurring counterparty, the resulting decision-tree or GBDT can be queried at runtime to predict that counterparty’s next action. This is alpha if the counterparty acts predictably on observable features.

python
import pickle

import horizon as hz
from horizon.flow.policy.features import PolicyFeatureExtractor, FEATURE_NAMES


class ShadowAwareStrategy(hz.Strategy):
    """Anticipate a known counterparty by querying their shadow policy."""

    TARGET_ACTORS = ["0xKnownBot1", "0xKnownBot2"]

    def __init__(self):
        super().__init__()
        self._models = dict()  # actor_id -> sklearn GBDT
        self._labels = dict()  # actor_id -> labels, assumed in predict_proba column order
        self._feat = PolicyFeatureExtractor()
        for actor in self.TARGET_ACTORS:
            policy = hz.flow.shadow_policy(actor)
            if policy and policy.method.value == "shadow_gbdt":
                # pickle.loads can execute arbitrary code: only load blobs you trust
                blob = pickle.loads(policy.model_blob)
                self._models[actor] = blob["gbdt"]
                self._labels[actor] = blob["labels"]

    def observe_market_event(self, ev):
        # Called by your ingestion wiring; forwards into the feature extractor
        self._feat.observe(ev)

    def evaluate(self, f, universe):
        signals = []
        for market in universe:
            if not self._models:
                continue
            # For each target actor, predict their next action
            for actor in self.TARGET_ACTORS:
                gbdt = self._models.get(actor)
                if gbdt is None:
                    continue
                state = self._feat.featurize(
                    actor_id=actor,
                    market_id=market.id,
                    now=f.now,
                )
                # Single-row feature matrix in FEATURE_NAMES order
                X = [[state.get(n, 0.0) for n in FEATURE_NAMES]]
                probs = gbdt.predict_proba(X)[0]
                labels = self._labels[actor]
                # If actor is predicted likely to buy in the next window,
                # get in front of them
                p_buy = probs[labels.index("buy")] if "buy" in labels else 0.0
                if p_buy > 0.7:
                    signals.append(hz.Signal.increase(
                        market, edge_bps=50, horizon="1h",
                    ))
        return signals

Why it works. If the counterparty’s behavior is well-explained by observable state (OFI, spread, time-of-day), a fitted shadow policy extrapolates. You get in front of their order, capture some of the impact, then exit as they complete.

Risks. This is the most speculative recipe on the page.

  • Overfit policies will predict confidently even when the counterparty’s actual trigger isn’t in your feature set.
  • Once you’re trading against a reverse-engineered policy, the counterparty’s actions become partly reactive to yours. The policy stops describing them faithfully.
  • Regulatory context matters: trading directly on a reverse-engineered counterparty strategy could be interpreted as layering-adjacent by a zealous regulator. Be able to document independent rationale.

Mitigation.

  • Validate the shadow policy’s holdout_accuracy is above 0.8 before trading on it. Below that, the rules are noise.
  • Refit weekly. Behavioral drift is the norm, not the exception.
  • Keep the explicit rules (policy.summary["rules"]) rather than just the pickled model. If the rule is “buys when OFI_5s above 0.3 AND spread below 10 bps”, trade on that condition rather than on the black-box model prediction. An explicit rule is much more defensible and robust.
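
The example rule and the holdout gate from the list above can be combined into one explicit predicate. This is a sketch under stated assumptions: the feature keys ofi_5s and spread_bps and the function name rule_predicts_buy are hypothetical, and the thresholds are the ones quoted in the example rule, not values read from a real policy.

```python
from typing import Mapping

MIN_HOLDOUT_ACCURACY = 0.8  # gate from the mitigation list


def rule_predicts_buy(features: Mapping[str, float], holdout_accuracy: float) -> bool:
    """Explicit form of 'buys when OFI_5s above 0.3 AND spread below 10 bps'.

    Hypothetical helper for illustration; feature keys are assumed names.
    """
    # Refuse to act on a policy that has not cleared the holdout gate
    if holdout_accuracy <= MIN_HOLDOUT_ACCURACY:
        return False
    ofi = features.get("ofi_5s", 0.0)
    # A missing spread defaults to +inf so the rule fails closed
    spread = features.get("spread_bps", float("inf"))
    return ofi > 0.3 and spread < 10.0
```

Because the condition is spelled out, it can be logged alongside each order, which is what makes the independent rationale documentable.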

Recipe 5: flow-weighted cross-market spread

The most general recipe, and an elegant one. Across a basket of related markets (the candidates of a single prediction event, or the legs of a perp-spot arb), weight your exposure by the relative cleanliness of each market’s flow.

python
import horizon as hz
from horizon.flow.toxicity import HawkesFingerprint
from horizon.flow.config import HawkesConfig


class FlowWeightedBasket(hz.Strategy):
    """Allocate across related markets inversely to flow toxicity."""

    BASKET = ["0xMarketA", "0xMarketB", "0xMarketC"]  # candidates in one event

    def evaluate(self, f, universe):
        # Pull current toxicity per market
        toxicity = dict()
        for mid in self.BASKET:
            h = self._hawkes_for(mid)
            est = h.estimate() if h else None
            toxicity[mid] = est.branching_ratio if est else 0.3

        # Weight: cleaner markets get bigger share
        # weight ∝ (1 - toxicity)
        weights = {mid: max(0.0, 1.0 - t) for mid, t in toxicity.items()}
        total = sum(weights.values()) or 1.0
        shares = {mid: w / total for mid, w in weights.items()}

        signals = []
        for market in universe:
            if market.id not in self.BASKET:
                continue
            base_edge = 30
            signals.append(hz.Signal.increase(
                market,
                edge_bps=int(base_edge * shares[market.id] * len(self.BASKET)),
                horizon="1h",
            ))
        return signals

    def _hawkes_for(self, mid): ...  # your per-market Hawkes wiring

Why it works. The relative allocation tilts toward markets where your fills are cleaner (less likely to be adverse-selected), improving post-transaction-cost return even if raw alpha per market is identical.
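
The weighting arithmetic from the strategy above, isolated as pure functions so it can be checked by hand. The function names flow_shares and scaled_edge_bps are illustrative assumptions; the formulas are the ones in the strategy (weight ∝ 1 - toxicity, edge scaled by share times basket size).

```python
from typing import Dict


def flow_shares(toxicity: Dict[str, float]) -> Dict[str, float]:
    """Normalize (1 - toxicity) weights into shares that sum to 1.

    Hypothetical helper for illustration; not a horizon API.
    """
    weights = {mid: max(0.0, 1.0 - t) for mid, t in toxicity.items()}
    # If every market is fully toxic the total is 0; dividing by 1.0
    # instead yields all-zero shares rather than a ZeroDivisionError.
    total = sum(weights.values()) or 1.0
    return {mid: w / total for mid, w in weights.items()}


def scaled_edge_bps(base_edge_bps: float, share: float, n_markets: int) -> int:
    """Scale the base edge so an average share (1/n) maps back to the base."""
    return int(base_edge_bps * share * n_markets)  # int() truncates


shares = flow_shares({"0xMarketA": 0.2, "0xMarketB": 0.5, "0xMarketC": 0.8})
```

In this example the cleanest market (toxicity 0.2) receives the largest share and the most toxic one the smallest. Note that int() truncates, so edges are rounded down to whole basis points.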

The generic pattern

All five recipes share a structure:

  1. Read the flow store or run the toxicity estimator in background.
  2. Translate a flow observation into a size / direction / spread / allocation modifier.
  3. Apply that modifier to the signal you would have generated anyway.

Flow is a MODIFIER, not a REPLACEMENT for your base strategy. The module doesn’t generate alpha on its own. It reshapes how aggressively you express alpha you already have.

Backtesting flow-aware strategies

Two constraints:

  • The flow store has to be populated for the backtest period. Either run the flow engine over recorded feed data first and then run the strategy against the populated store, or run the flow engine inline with the backtest so events flow through both.
  • The same data goes into detection AND strategy. Avoid look-ahead: the MarketEvent at time t can update the flow store, but the strategy at time t should only read findings with detected_at <= t. hz.flow.anomalies(until=t) gets you that slice.

python
# Sketch: run detection and strategy on the same event stream,
# in lockstep, without look-ahead.
# Assumes hz, MarketEventKind, engine, strategy, and recorded_feed
# are already in scope.
for ev in recorded_feed:
    engine.ingest(ev)  # findings emitted with ev.timestamp
    if ev.event_kind == MarketEventKind.TradeTape:
        # Strategy only sees findings up to ev.timestamp
        recent = hz.flow.anomalies(
            market_id=ev.market_id,
            until=ev.timestamp,
        )
        strategy.on_flow(recent)

This is a strict requirement for a compliance-grade backtest. Documentation of the lockstep replay in your validation report is what makes the backtest’s P&L defensible.
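
The no-look-ahead condition (only findings with detected_at <= t are visible at time t) is easy to assert in a unit test. The sketch below uses a stand-in Finding dataclass and a hypothetical visible_findings helper; neither is a horizon type, and a real finding carries more fields.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class Finding:
    """Stand-in for a flow finding; illustrative only."""
    market_id: str
    detected_at: datetime


def visible_findings(findings: List[Finding], market_id: str,
                     as_of: datetime) -> List[Finding]:
    """Findings for one market that a strategy may read at time as_of."""
    return [
        x for x in findings
        if x.market_id == market_id and x.detected_at <= as_of
    ]


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
store = [
    Finding("0xMarketA", t0),
    Finding("0xMarketA", t0 + timedelta(minutes=1)),
    Finding("0xMarketA", t0 + timedelta(minutes=2)),  # future at t0 + 1m
    Finding("0xMarketB", t0),
]
seen = visible_findings(store, "0xMarketA", t0 + timedelta(minutes=1))
```

Running the same check against the slice returned during replay is one way to evidence the lockstep property in a validation report.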
