Bars & Labeling
Information-driven bars, triple barrier labeling, and CUSUM filter from AFML
Standard time bars (1-min, 5-min) sample at a fixed clock interval regardless of market activity, so quiet periods are oversampled (mostly noise) and volatile periods are undersampled. Information-driven bars instead sample on market activity (ticks, volume, or dollar flow), producing observations with more uniform information content. Combined with triple barrier labeling and CUSUM event filtering, they form the data preparation layer from Marcos López de Prado’s Advances in Financial Machine Learning (AFML).
API
Bar construction
python
# Fixed-threshold bars
tick_bars = hz.tick_bars(trades, threshold=100) # new bar every 100 ticks
volume_bars = hz.volume_bars(trades, threshold=50000) # new bar every 50k volume
dollar_bars = hz.dollar_bars(trades, threshold=1e6) # new bar every $1M traded
# Imbalance bars (adaptive threshold)
tib = hz.tick_imbalance_bars(trades, expected_imbalance=50)
vib = hz.volume_imbalance_bars(trades, expected_imbalance=25000)
Each returns a list of Bar objects:
python
bar.open # float
bar.high # float
bar.low # float
bar.close # float
bar.volume # float
bar.n_ticks # int
bar.timestamp # float (epoch seconds, bar close time)
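To make the sampling rule concrete, here is a minimal pure-Python sketch of dollar-bar construction. It is not the library's implementation: the `(timestamp, price, size)` trade tuples, the `SketchBar` class, and `sketch_dollar_bars` are hypothetical names chosen for illustration, and the trailing partial bar is simply dropped.

```python
from dataclasses import dataclass


@dataclass
class SketchBar:
    """Illustrative stand-in mirroring the Bar fields listed above."""
    open: float
    high: float
    low: float
    close: float
    volume: float
    n_ticks: int
    timestamp: float


def sketch_dollar_bars(trades, threshold):
    """Group (timestamp, price, size) trades into bars of roughly `threshold` notional."""
    bars = []
    current = []    # trades in the bar being built
    notional = 0.0  # running sum of price * size for the current bar
    for ts, price, size in trades:
        current.append((ts, price, size))
        notional += price * size
        if notional >= threshold:
            prices = [p for _, p, _ in current]
            bars.append(SketchBar(
                open=prices[0],
                high=max(prices),
                low=min(prices),
                close=prices[-1],
                volume=sum(s for _, _, s in current),
                n_ticks=len(current),
                timestamp=ts,  # close time of the bar
            ))
            current, notional = [], 0.0
    return bars  # assumption: an unfinished trailing bar is discarded


trades = [(1.0, 100.0, 5), (2.0, 101.0, 5), (3.0, 99.0, 10), (4.0, 100.0, 1)]
bars = sketch_dollar_bars(trades, threshold=1000)
```

With these four trades the sketch closes two bars, each after the second trade that pushes the running notional past 1000. Tick and volume bars follow the same loop with the accumulator changed to a trade count or a sum of sizes.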
Triple barrier labeling
python
labels = hz.triple_barrier_label(
    prices=[100.0, 101.2, 99.5, 102.1, ...],
    pt=0.02,         # profit-taking barrier: +2%
    sl=0.01,         # stop-loss barrier: -1%
    max_holding=10,  # maximum holding period (bars)
)
# labels: list of int -- 1 (hit upper), -1 (hit lower), 0 (expired)
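The labeling rule can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's code: `sketch_triple_barrier` is a hypothetical name, barriers are taken as fractional returns relative to the entry price, every index is labeled, and positions near the end of the series that touch neither barrier are labeled 0.

```python
def sketch_triple_barrier(prices, pt, sl, max_holding):
    """Label each index by the barrier its forward price path touches first."""
    labels = []
    for i, entry in enumerate(prices):
        label = 0  # default: vertical barrier reached (expired)
        # Look ahead at most `max_holding` bars after the entry bar.
        for future in prices[i + 1 : i + 1 + max_holding]:
            ret = future / entry - 1.0
            if ret >= pt:    # upper (profit-taking) barrier touched first
                label = 1
                break
            if ret <= -sl:   # lower (stop-loss) barrier touched first
                label = -1
                break
        labels.append(label)
    return labels


prices = [100.0, 101.0, 103.0, 99.0, 98.0, 98.5]
labels = sketch_triple_barrier(prices, pt=0.02, sl=0.01, max_holding=3)
# labels == [1, -1, -1, -1, 0, 0]
```

The first entry at 100.0 reaches +3% two bars later, so it is labeled 1. The entry at 101.0 sees only +1.98% at 103.0 (short of the 2% barrier) and then falls through the stop, so it is labeled -1.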
CUSUM event filter
python
events = hz.cusum_filter(prices, threshold=0.03)
# events: list of int -- indices where cumulative deviation exceeds threshold
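For intuition, the symmetric CUSUM filter described in AFML can be sketched as follows. The function name `sketch_cusum_filter` is hypothetical, and the use of log returns as the deviation measure is an assumption; the library may measure deviations differently.

```python
import math


def sketch_cusum_filter(prices, threshold):
    """Return indices where cumulative up or down drift exceeds `threshold`."""
    events = []
    s_pos = 0.0  # cumulative upward drift, floored at zero
    s_neg = 0.0  # cumulative downward drift, capped at zero
    for i in range(1, len(prices)):
        r = math.log(prices[i] / prices[i - 1])  # assumption: log returns
        s_pos = max(0.0, s_pos + r)
        s_neg = min(0.0, s_neg + r)
        if s_neg < -threshold:
            s_neg = 0.0        # reset after a downside event
            events.append(i)
        elif s_pos > threshold:
            s_pos = 0.0        # reset after an upside event
            events.append(i)
    return events


prices = [100.0, 101.0, 102.0, 104.0, 103.0, 100.0, 99.0]
events = sketch_cusum_filter(prices, threshold=0.03)
# events == [3, 5]
```

No single bar-to-bar move in this series exceeds 3%, but the run-up to 104.0 and the slide to 100.0 each accumulate past the threshold, which is what distinguishes the filter from a simple per-bar return cutoff.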
From raw trades to labeled samples
The typical AFML pipeline: raw trades -> information bars -> CUSUM events -> triple barrier labels.
python
# 1. Build dollar bars from raw trade data
bars = hz.dollar_bars(trades, threshold=500_000)
prices = [b.close for b in bars]
# 2. Filter for meaningful events using CUSUM
events = hz.cusum_filter(prices, threshold=0.02)
# 3. Label each event with triple barrier
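event_prices = [prices[i:i+50] for i in events]  # forward price window starting at each event
labels = []
for segment in event_prices:
    segment_labels = hz.triple_barrier_label(segment, pt=0.03, sl=0.015, max_holding=20)
    labels.append(segment_labels[0])  # keep the label of the window's first price (the event bar)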
# 4. Use labels for supervised learning or meta-labeling
Choosing bar types
| Bar type | Samples on | Best for |
|---|---|---|
| Tick bars | Trade count | HFT, microstructure analysis |
| Volume bars | Shares/contracts | Equity strategies, position-aware signals |
| Dollar bars | Notional value | Cross-asset comparison (normalizes for price level) |
| Tick imbalance | Buy/sell imbalance | Detecting informed flow |
| Volume imbalance | Directional volume | Order flow toxicity signals |
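The imbalance rows in the table rest on signing each trade as a buy or a sell. A minimal sketch of the tick rule and a tick imbalance bar trigger is shown below. The names `sketch_tick_rule` and `sketch_imbalance_bar_ends` are hypothetical, and the threshold is held fixed for simplicity, whereas AFML (and the "adaptive threshold" noted above) updates the expected imbalance as bars are formed.

```python
def sketch_tick_rule(prices):
    """Sign each trade +1 (uptick) or -1 (downtick); unchanged prices keep the previous sign."""
    signs = []
    prev_sign = 1  # assumption: the first trade is treated as a buy
    for i, price in enumerate(prices):
        if i > 0 and price > prices[i - 1]:
            prev_sign = 1
        elif i > 0 and price < prices[i - 1]:
            prev_sign = -1
        signs.append(prev_sign)
    return signs


def sketch_imbalance_bar_ends(prices, expected_imbalance):
    """Return the index of the last trade in each bar (fixed threshold, not adaptive)."""
    ends = []
    theta = 0  # running signed-tick imbalance for the current bar
    for i, sign in enumerate(sketch_tick_rule(prices)):
        theta += sign
        if abs(theta) >= expected_imbalance:
            ends.append(i)
            theta = 0
    return ends


prices = [10.0, 10.1, 10.2, 10.2, 10.1, 10.2, 10.1, 10.0, 9.9, 9.8]
signs = sketch_tick_rule(prices)
ends = sketch_imbalance_bar_ends(prices, expected_imbalance=3)
# signs == [1, 1, 1, 1, -1, 1, -1, -1, -1, -1]
# ends == [2, 9]
```

A one-sided run closes a bar quickly (three trades here), while balanced two-way flow keeps the bar open much longer. A volume imbalance variant would add `sign * size` to `theta` instead of `sign`.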
When to use
- ML pipelines: replace time bars with information-driven bars, whose returns tend to be better behaved statistically (closer to IID), giving more stable features.
- Event detection: the CUSUM filter flags points where the cumulative price deviation exceeds a threshold, without requiring a fixed lookback window.
- Labeling: triple barrier labels encode profit-taking, stop-loss, and a maximum holding period into a single target variable for supervised learning.