Equities & options tape (v0.2)

Running flow against anonymous tape. Alpaca, IBKR, CCXT. What works, what doesn't, and how to interpret market-level pseudo-profiles.

v0.2 adds tape-level ingestion for venues that don’t expose counterparty identities. The SDK treats those venues as “anonymous flow” and degrades gracefully: everything that works at the market level still works; actor-scoped detectors are skipped; and per-actor profiling and taxonomy classification run on synthetic per-(market, time-window) pseudo-actors.

This is the recipe for running flow against Alpaca, IBKR, or CCXT: anywhere that what you see is the consolidated tape, not a wallet.

What you get and don’t get on anonymous tape

Side-by-side with the v0.1 Polymarket / Hyperliquid path:

Capability	Polymarket / HL (v0.1)	Alpaca / IBKR / CCXT (v0.2)
Per-wallet profiling	Yes	No (bucketed per-market-per-window)
Kirilenko taxonomy on a specific actor	Yes	No (applies to the bucket’s aggregate behavior)
Wallet clustering	Yes	N/A
Spoofing / layering / momentum-ignition / split-order	Yes (actor-scoped)	Skipped (no actor to attribute to)
Quote-stuffing (market-level)	Yes	Yes
Wash-trade (round-number + Benford)	Yes	Yes
Iceberg reload (book-level)	Yes	Yes
VPIN / OFI / Hawkes	Yes	Yes
Shadow-policy per actor	Yes	No (nothing to fit against)
`FlowAnomalyCheck` market-gate	Yes	Yes (fires on market-level findings)

In short: market-level signals are first-class; actor-level ones degrade to skips. The defense + alpha recipes in Defend / Alpha that rely only on market-level findings (recipes 1, 3, 4 on both pages) work unchanged.

The synthetic pseudo-actor model

When a MarketEvent arrives with actor_id=None, the ActorFeatureExtractor derives a synthetic key:

anon_{market_id}_{window_index}

where window_index = floor(event_timestamp / FlowConfig.actors.anonymize_window_s) (default 300 seconds = 5-minute windows).
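
As a rough illustration of the bucketing rule (not the SDK's ActorFeatureExtractor itself), the sketch below groups a few made-up anonymous events by their synthetic key. The `pseudo_actor_key` helper and the sample events are assumptions for this example, and it assumes event_timestamp is expressed in epoch seconds.

```python
import math
from collections import defaultdict


def pseudo_actor_key(market_id: str, event_ts: float, window_s: float = 300.0) -> str:
    """Hypothetical helper mirroring anon_{market_id}_{window_index}."""
    window_index = math.floor(event_ts / window_s)
    return f"anon_{market_id}_{window_index}"


# Illustrative anonymous events on 2026-05-01: (market_id, epoch seconds).
events = [
    ("AAPL", 1_777_645_800.0),  # 14:30:00 UTC, first second of a window
    ("AAPL", 1_777_645_920.0),  # 14:32:00 UTC, same window
    ("AAPL", 1_777_646_100.0),  # 14:35:00 UTC, next window
    ("MSFT", 1_777_645_920.0),  # same time, different market
]

# Accumulate event timestamps per pseudo-actor key.
buckets: dict[str, list[float]] = defaultdict(list)
for market_id, ts in events:
    buckets[pseudo_actor_key(market_id, ts)].append(ts)

for key, stamps in buckets.items():
    print(key, len(stamps))
```

The two AAPL events inside 14:30 to 14:35 land in one bucket, the 14:35:00 event opens the next one, and MSFT gets its own key even though the timestamp matches.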

Every anonymous event on the same market in the same 5-minute window accumulates into the same pseudo-actor. After enough events, that pseudo-actor gets:

A full ActorFeatures vector summarizing market activity in the window.
A Kirilenko taxonomy probability vector: “this market/window was 60% HFT-dominated, 25% opportunistic, 10% fundamental, 5% other.”
An entry in the flow store like any other profile.

Read these as characterizations of the market itself during that window, not of a specific counterparty. The language is different but the downstream machinery (store, audit, CLI) is identical.

Turn it off by setting anonymize_window_s = 0.0 in FlowConfig.actors. None-actor events are then silently skipped (v0.1 behavior).

Recipe: live Alpaca surveillance

python

from horizon.audit import AuditLog, SQLiteSink
from horizon.data.live.alpaca_ws import AlpacaLiveFeed
from horizon.data.live.base import SubscriptionKind
from horizon.flow import SQLiteFlowStore, make_default_engine
from horizon.flow.ingestion.alpaca import AlpacaFlowSource

# 1. Standard stores
flow_store = SQLiteFlowStore("/var/horizon/alpaca_flow.db")
audit_log = AuditLog(sink=SQLiteSink("/var/horizon/alpaca_audit.db"))

# 2. Engine: note venue_name="alpaca" so findings group correctly
engine = make_default_engine(
    venue_name="alpaca",
    store_path="/var/horizon/alpaca_flow.db",
    audit_log=audit_log,
)

# 3. The Alpaca feed: trades + quotes on your watchlist
feed = AlpacaLiveFeed(api_key="...", api_secret="...")
feed.subscribe(["AAPL", "MSFT", "NVDA", "TSLA"], SubscriptionKind.Trades)
feed.subscribe(["AAPL", "MSFT", "NVDA", "TSLA"], SubscriptionKind.Quotes)

# 4. Ingestion source: wraps the feed, emits actor_id=None events
source = AlpacaFlowSource(feed=feed, connect_feed=True)
source.on_event(engine.ingest)
source.connect()

# Done: findings flow into the store and audit log as trades + quotes arrive.

Note connect_feed=True: when the flow source is the only consumer of the feed, let it manage the feed’s lifecycle. If a trading pipeline already owns the feed, use connect_feed=False (the default); the source then just adds itself as an extra on_tick handler.

Recipe: pseudo-actor reports

The CLI works against anon keys the same as named ones:

bash

horizon flow anomalies --db=/var/horizon/alpaca_flow.db \
 --market=AAPL --since-hours=24 --limit=100 | jq

# Dump the 5-min pseudo-profile for a specific window
horizon flow profile --db=/var/horizon/alpaca_flow.db \
 --venue=alpaca \
 --actor=anon_AAPL_5925487 | jq

To map a timestamp to the right window index:

python

import datetime

def anon_key(market_id: str, at: datetime.datetime, window_s: float = 300.0) -> str:
    return f"anon_{market_id}_{int(at.timestamp() // window_s)}"

# e.g., pseudo-actor covering AAPL at 14:32 UTC on 2026-05-01 with 5-min windows
key = anon_key("AAPL", datetime.datetime(2026, 5, 1, 14, 32, tzinfo=datetime.timezone.utc))
print(key) # anon_AAPL_5925486
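
Going the other way can be handy when reading stored anon_* keys. The following is a minimal sketch, assuming keys follow the anon_{market_id}_{window_index} layout described earlier; `window_bounds` is a hypothetical helper, not part of the SDK.

```python
from datetime import datetime, timedelta, timezone


def window_bounds(key: str, window_s: float = 300.0) -> tuple[str, datetime, datetime]:
    """Split a pseudo-actor key into (market_id, window start, window end) in UTC."""
    prefix, _, rest = key.partition("_")
    if prefix != "anon" or not rest:
        raise ValueError(f"not a pseudo-actor key: {key!r}")
    # Split from the right so market ids that contain "_" stay intact.
    market_id, index_str = rest.rsplit("_", 1)
    start = datetime.fromtimestamp(int(index_str) * window_s, tz=timezone.utc)
    return market_id, start, start + timedelta(seconds=window_s)


market_id, start, end = window_bounds("anon_AAPL_5925486")
print(market_id, start.isoformat(), end.isoformat())
# AAPL 2026-05-01T14:30:00+00:00 2026-05-01T14:35:00+00:00
```

The window start is inclusive and the end exclusive, which matches the floor-division rule used to build the key.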

Recipe: what DOES fire on anonymous tape

The detectors that fire without actor attribution:

Iceberg. Watches visible depth reload across the book. Works on Alpaca’s L2 snapshots as well as Polymarket’s CLOB. See detectors/iceberg.
Wash trade. Flags round-number clustering and Benford-anomalous size distributions on the market’s tape. Equity tape is cleaner than crypto by default, so thresholds may need adjustment.
Quote stuffing. Counts placements + cancels per market per time window; fires on message-rate spikes with low fill ratio.
Market-level toxicity. VPIN / OFI / Hawkes on the tape + quote stream.
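
To make the quote-stuffing description above concrete, here is a toy version of the idea: count placements plus cancels in a window, compare against a baseline message rate, and require a low fill ratio. This is a sketch under stated assumptions, not the SDK's detector; `WindowCounts`, `looks_like_quote_stuffing`, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Per-market message counts for one time window (illustrative)."""
    placements: int
    cancels: int
    fills: int


def looks_like_quote_stuffing(
    current: WindowCounts,
    baseline_msgs: float,
    rate_multiple: float = 5.0,    # assumed spike threshold vs. baseline
    max_fill_ratio: float = 0.01,  # assumed cutoff for "low fill ratio"
) -> bool:
    """Flag a window whose message rate spikes while almost nothing fills."""
    messages = current.placements + current.cancels
    if messages == 0 or baseline_msgs <= 0:
        return False
    fill_ratio = current.fills / messages
    return messages >= rate_multiple * baseline_msgs and fill_ratio <= max_fill_ratio


quiet = WindowCounts(placements=400, cancels=350, fills=60)
burst = WindowCounts(placements=6000, cancels=5900, fills=12)
print(looks_like_quote_stuffing(quiet, baseline_msgs=800.0))  # False
print(looks_like_quote_stuffing(burst, baseline_msgs=800.0))  # True
```

Nothing here needs an actor identity, which is why this class of detector keeps working on anonymous tape.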

And the ones that silently skip:

Spoofing / Layering. Require same-actor placement+cancel pattern. No actor, no fire.
Momentum ignition. Requires same-actor aggressor+reversal. Skips.
Split order. Requires same-actor child-chain linking. Skips.

Recipe: market-regime filter from a pseudo-actor

The most useful v0.2 output is the Kirilenko distribution per market-window. Use it as a regime filter in a strategy:

python

import horizon as hz
from datetime import datetime, timezone


def market_regime(market_id: str, *, venue: str = "alpaca",
                  window_s: float = 300.0) -> dict[str, float]:
    """Return the taxonomy probs for the current 5-min window, or {} if
    the pseudo-actor hasn't accumulated enough events yet."""
    now = datetime.now(timezone.utc)
    window_idx = int(now.timestamp() // window_s)
    key = f"anon_{market_id}_{window_idx}"
    profile = hz.flow.actor_profile(key, venue=venue)
    return profile.taxonomy_probs if profile else {}


class RegimeAwareStrategy(hz.Strategy):
    def evaluate(self, f, universe):
        signals = []
        for market in universe:
            probs = market_regime(market.id)
            hft_pct = probs.get("hft", 0.0)
            if hft_pct > 0.5:
                # Majority-HFT regime → widen spreads, reduce size
                size_mult = 0.5
            else:
                size_mult = 1.0
            if f.z[market.id] < -2.0:
                signals.append(hz.Signal.increase(
                    market, edge_bps=int(40 * size_mult), horizon="1h",
                ))
        return signals

The pseudo-actor smooths over 5 minutes of activity, so the regime reading is stable enough to act on (not flipping every tick) while still reflecting current conditions (not yesterday’s).
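
In the first seconds of a window the current pseudo-actor may not have accumulated enough events yet, in which case market_regime returns {}. One possible way to handle that is to fall back to the previous window's key. The sketch below only builds the candidate keys and picks the first one with data; `candidate_keys` and `first_available` are hypothetical helpers, and the plain dict stands in for whatever profile lookup you use.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


def candidate_keys(market_id: str, at: datetime, window_s: float = 300.0,
                   lookback: int = 1) -> list[str]:
    """Keys for the window containing `at`, then `lookback` earlier windows."""
    current = int(at.timestamp() // window_s)
    return [f"anon_{market_id}_{current - i}" for i in range(lookback + 1)]


def first_available(
    keys: list[str],
    lookup: Callable[[str], Optional[dict[str, float]]],
) -> dict[str, float]:
    """Return the first non-empty taxonomy dict, or {} if none is ready."""
    for key in keys:
        probs = lookup(key)
        if probs:
            return probs
    return {}


# Stand-in store: only the previous window has a profile so far.
fake_store = {"anon_AAPL_5925485": {"hft": 0.6, "opportunistic": 0.25}}
at = datetime(2026, 5, 1, 14, 30, 5, tzinfo=timezone.utc)
keys = candidate_keys("AAPL", at)
print(keys)
print(first_available(keys, fake_store.get))
```

The trade-off is freshness: a fallback reading can be up to one window old, so keep `lookback` small if the regime filter is meant to reflect current conditions.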

Recipe: compare toxicity between venues

Anonymous tape lets you compare markets ACROSS venues. Since the pseudo-actor model is symmetric, a VPIN / OFI / Hawkes reading on Alpaca AAPL is directly comparable to the same reading on IBKR AAPL or Hyperliquid BTC-PERP.

python

import horizon as hz
from datetime import datetime, timedelta, timezone

markets = [
 ("alpaca", "AAPL"),
 ("alpaca", "TSLA"),
 ("hyperliquid", "BTC"),
]
since = datetime.now(timezone.utc) - timedelta(hours=1)

for venue, market_id in markets:
    # Note: the query filters on market_id only; venue is used for the printed label.
    anomalies = hz.flow.anomalies(
        market_id=market_id, since=since, limit=100,
    )
    wash = sum(1 for a in anomalies if a.category.value == "wash_trade")
    stuff = sum(1 for a in anomalies if a.category.value == "quote_stuffing")
    print(f"{venue:12s} {market_id:6s} wash={wash:3d} stuff={stuff:3d}")

If you’re allocating across venues, lower toxicity generally means better execution for the same alpha.

Limitations

Some things v0.2 does NOT give you:

Broker-specific fill attribution. Alpaca reports “this trade happened on this exchange” in the tick, but not “my fills came from wallet X.” The venue layer knows that for your own orders; the flow layer can’t attribute other participants’ fills.
Dark-pool visibility. Tape doesn’t include dark-pool prints until they report (often delayed). Flow sees only what the SIP shows.
Options chain context. This release ships the ingestion for equity underlyings only. An options-aware extension (Greek context, IV regime) is v0.3+.
IBKR / CCXT sources. The pattern is the same as AlpacaFlowSource; those adapters follow in upcoming iterations.

Migrating from v0.1

No breaking changes. An existing v0.1 deployment picks up v0.2 behavior automatically. None-actor events that were previously skipped are now bucketed. If you prefer v0.1 semantics (skip anonymous events), set:

python

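from dataclasses import replace

import horizon as hz

# Setting the window to 0.0 restores v0.1 behavior: anonymous events are skipped.
cfg = hz.flow.FlowConfig()
cfg.actors = replace(cfg.actors, anonymize_window_s=0.0)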

Otherwise, expect your flow store to start accumulating anon_* profiles at a rate proportional to the volume of anonymous tape you’re consuming. Storage is bounded by n_markets × duration / window_s: 100 markets × 288 windows/day (at the default 300-second window) = 28,800, or about 29k profiles/day. That stays manageable for a year’s worth of data in a single SQLite file.
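
The bound is easy to recompute for your own market count and window length. A small sketch follows; `profiles_per_day` is a hypothetical helper, and it assumes every market produces at least one anonymous event in every window, so real counts should be lower for markets that are closed or quiet for part of the day.

```python
def profiles_per_day(n_markets: int, window_s: float = 300.0) -> int:
    """Upper bound on anon_* profiles created per day."""
    windows_per_day = 86_400 / window_s  # 288 at the default 300-second window
    return int(n_markets * windows_per_day)


per_day = profiles_per_day(100)
print(per_day)        # 28800
print(per_day * 365)  # 10512000
```

Doubling the window to 600 seconds halves the profile count, at the cost of a coarser regime reading.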

Related

Defend recipes. Recipes 1, 3, 4 work on anonymous tape.
Alpha recipes. Recipes 1, 3, 5 work on anonymous tape.
Roadmap. What v0.2 ships vs v0.3 plans.
Actor profiling. Reference for the profile fields and taxonomy.