Bars & Labeling

Information-driven bars, triple barrier labeling, and the CUSUM filter from AFML

Standard time bars (1-min, 5-min) sample uniformly regardless of market activity. During quiet periods they oversample, producing bars that carry mostly noise; during volatile periods they undersample, compressing many informative trades into a single bar. Information-driven bars instead sample on market activity (ticks, volume, or dollar flow), so each bar carries a more uniform amount of information. Combined with triple barrier labeling and CUSUM event filtering, they form the data-preparation layer described in Marcos López de Prado's Advances in Financial Machine Learning (AFML).

API

Bar construction

```python
# Fixed-threshold bars
tick_bars = hz.tick_bars(trades, threshold=100)        # new bar every 100 ticks
volume_bars = hz.volume_bars(trades, threshold=50000)  # new bar every 50,000 units of volume
dollar_bars = hz.dollar_bars(trades, threshold=1e6)    # new bar every $1M of notional traded

# Imbalance bars (adaptive threshold)
tib = hz.tick_imbalance_bars(trades, expected_imbalance=50)
vib = hz.volume_imbalance_bars(trades, expected_imbalance=25000)
```

Each returns a list of Bar objects:

```python
bar.open       # float
bar.high       # float
bar.low        # float
bar.close      # float
bar.volume     # float
bar.n_ticks    # int
bar.timestamp  # float (epoch seconds, bar close time)
```
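All fixed-threshold bars follow the same accumulate-and-reset rule. The sketch below implements dollar bars in pure Python to show that rule; it is an illustration, not `hz`'s implementation. It assumes trades arrive as time-sorted `(timestamp, price, size)` tuples and defines its own hypothetical `Bar` dataclass mirroring the fields above. A trailing partial bar is discarded.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Assumed trade format for this sketch: (timestamp_seconds, price, size)
Trade = Tuple[float, float, float]


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float
    n_ticks: int
    timestamp: float  # time of the trade that closed the bar


def dollar_bars_sketch(trades: Iterable[Trade], threshold: float) -> List[Bar]:
    """Emit a bar each time cumulative notional (price * size) reaches threshold."""
    bars: List[Bar] = []
    op = hi = lo = cl = 0.0
    volume = notional = 0.0
    n_ticks = 0
    for ts, price, size in trades:
        if n_ticks == 0:
            op = hi = lo = price  # first trade of a new bar sets the open
        hi = max(hi, price)
        lo = min(lo, price)
        cl = price
        volume += size
        notional += price * size
        n_ticks += 1
        if notional >= threshold:
            bars.append(Bar(op, hi, lo, cl, volume, n_ticks, ts))
            volume = notional = 0.0  # reset accumulators for the next bar
            n_ticks = 0
    return bars  # a trailing partial bar is discarded
```

Accumulating `size` instead of `price * size` gives volume bars, and accumulating `1` per trade gives tick bars.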

Triple barrier labeling

```python
labels = hz.triple_barrier_label(
    prices=[100.0, 101.2, 99.5, 102.1, ...],
    pt=0.02,         # profit-taking barrier: +2%
    sl=0.01,         # stop-loss barrier: -1%
    max_holding=10,  # maximum holding period (bars)
)
# labels: list of int -- 1 (upper barrier hit first), -1 (lower barrier hit first),
#         0 (max_holding elapsed before either horizontal barrier was hit)
```
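To make the labeling rule concrete, here is a minimal pure-Python sketch, not `hz`'s implementation. It assumes, as the comments above suggest, that `pt` and `sl` are fractions of the entry price and that every index is a candidate entry. Because it compares closing prices only, a single bar cannot touch both horizontal barriers. Windows cut short by the end of the data are labeled 0 here, which is a choice `hz` may not share.

```python
from typing import List, Sequence


def triple_barrier_sketch(
    prices: Sequence[float], pt: float, sl: float, max_holding: int
) -> List[int]:
    """Label each index: 1 = upper hit first, -1 = lower hit first, 0 = neither within max_holding bars."""
    labels: List[int] = []
    for i, entry in enumerate(prices):
        upper = entry * (1.0 + pt)  # profit-taking barrier
        lower = entry * (1.0 - sl)  # stop-loss barrier
        label = 0  # default: vertical barrier reached (or data ran out)
        for p in prices[i + 1 : i + 1 + max_holding]:
            if p >= upper:
                label = 1
                break
            if p <= lower:
                label = -1
                break
        labels.append(label)
    return labels
```

AFML usually scales the horizontal barriers by a rolling volatility estimate rather than using fixed percentages; the fixed version here matches the `pt`/`sl` parameters shown above.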

CUSUM event filter

```python
events = hz.cusum_filter(prices, threshold=0.03)
# events: list of int -- indices where cumulative deviation exceeds threshold
```
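The filter described in AFML is a symmetric CUSUM: it keeps separate running sums of upward and downward moves, records an event when either sum crosses the threshold, and resets that sum to zero. The sketch below applies it to log returns between consecutive prices. That input choice is an assumption (it makes `threshold=0.03` read as roughly a 3% cumulative move); `hz.cusum_filter` may measure deviations differently.

```python
import math
from typing import List, Sequence


def cusum_filter_sketch(prices: Sequence[float], threshold: float) -> List[int]:
    """Symmetric CUSUM on log returns; returns the indices of bars that trigger an event."""
    events: List[int] = []
    s_pos = s_neg = 0.0
    for i in range(1, len(prices)):
        r = math.log(prices[i] / prices[i - 1])  # log return into bar i
        s_pos = max(0.0, s_pos + r)  # accumulated upward drift, floored at 0
        s_neg = min(0.0, s_neg + r)  # accumulated downward drift, capped at 0
        if s_neg < -threshold:
            s_neg = 0.0
            events.append(i)
        elif s_pos > threshold:
            s_pos = 0.0
            events.append(i)
    return events
```

Because only a sustained run of same-direction moves pushes a sum past the threshold, small oscillations around a level never trigger an event, and no lookback window has to be chosen.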

From raw trades to labeled samples

The typical AFML pipeline: raw trades -> information bars -> CUSUM events -> triple barrier labels.

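```python
# 1. Build dollar bars from raw trade data
bars = hz.dollar_bars(trades, threshold=500_000)
prices = [b.close for b in bars]

# 2. Filter for meaningful events using CUSUM
events = hz.cusum_filter(prices, threshold=0.02)

# 3. Label each event with the triple barrier method
MAX_HOLDING = 20
labeled_events, labels = [], []
for i in events:
    # Window = entry bar plus MAX_HOLDING future bars, enough to reach every barrier
    segment = prices[i : i + MAX_HOLDING + 1]
    if len(segment) < MAX_HOLDING + 1:
        continue  # too close to the end of the data to evaluate the vertical barrier
    seg_labels = hz.triple_barrier_label(segment, pt=0.03, sl=0.015, max_holding=MAX_HOLDING)
    labeled_events.append(i)
    labels.append(seg_labels[0])  # label for an entry at bar i

# 4. Use (labeled_events, labels) for supervised learning or meta-labeling
```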

Choosing bar types

| Bar type | Samples on | Best for |
|---|---|---|
| Tick bars | Trade count | HFT, microstructure analysis |
| Volume bars | Shares/contracts | Equity strategies, position-aware signals |
| Dollar bars | Notional value | Cross-asset comparison (normalizes for price level) |
| Tick imbalance | Buy/sell imbalance | Detecting informed flow |
| Volume imbalance | Directional volume | Order flow toxicity signals |

When to use

  • ML pipelines: replace time bars with information-driven bars, whose returns tend to be closer to IID normal than time-bar returns, giving better-behaved features.
  • Event detection: CUSUM filter identifies structural breaks without a fixed lookback window.
  • Labeling: triple barrier labels encode profit-taking, stop-loss, and a maximum holding period into a single target variable for supervised learning.
