de Prado Labeling

Triple-barrier labeling, meta-labeling, sample weights

One of the biggest contributions in Advances in Financial Machine Learning is a set of labeling techniques for supervised learning on financial time series. The naive approach (“label +1 if price went up tomorrow”) ignores the path the price takes and how a position would actually be closed, so its labels often disagree with what a trade would have earned. de Prado’s methods address these structural issues.

The problem with fixed-horizon labels

Consider labeling “will this stock be up in 5 days”:

t:   0   1   2   3   4   5   6   7
p:  100 101  99 103 102 100  98  95
y5:  ?   ?   ?   ?   ?   ?   ?   ?

The label at t=0 compares p(5) = 100 to p(0) = 100 → label = 0 (flat). But during that 5-day window (t=1 to t=5) the price went as high as 103 (profit!) and as low as 99 (a tight stop-loss would have triggered). The drop to 95 only arrives at t=7, after the window has closed. The fixed-horizon label ignores the path and records only the endpoint.
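
To make the gap concrete, here is a minimal sketch that computes the fixed-horizon label for the toy series above alongside the highest and lowest price seen inside each window. The function name `fixed_horizon_view` and its column names are illustrative only, not part of any library.

```python
import numpy as np
import pandas as pd

def fixed_horizon_view(prices: pd.Series, horizon: int = 5) -> pd.DataFrame:
    """Fixed-horizon label plus the path extremes that the label ignores."""
    rows = []
    for t in range(len(prices) - horizon):
        entry = prices.iloc[t]
        window = prices.iloc[t + 1 : t + horizon + 1]  # bars t+1 .. t+horizon
        end_return = window.iloc[-1] / entry - 1
        rows.append({
            "t": t,
            "label": int(np.sign(end_return)),  # +1 up, -1 down, 0 flat
            "window_high": window.max(),
            "window_low": window.min(),
        })
    return pd.DataFrame(rows).set_index("t")

prices = pd.Series([100, 101, 99, 103, 102, 100, 98, 95])
view = fixed_horizon_view(prices)
print(view)
```

The row for t=0 carries label 0 even though the window ranged from 99 to 103, which is exactly the information a path-aware label keeps.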

The fix: the triple-barrier method.

Triple-barrier labeling

Define three barriers

- Upper barrier: profit-take (e.g., +2% from entry)
- Lower barrier: stop-loss (e.g., −1% from entry)
- Time barrier: vertical (e.g., 5 days)

Assign label by which barrier is hit first

- Upper first → +1 (profit)
- Lower first → −1 (loss)
- Time expires → 0 (flat)

Compute meta information

Record which barrier was hit, the price path's touched extremes, and the realized return.
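
The three steps above can be sketched for a single entry as follows. This is a minimal illustration, not a reference implementation: `BarrierOutcome` and `barrier_outcome` are hypothetical names, the path is assumed to hold one price per bar after entry (its length acts as the time barrier), and the 2.5% / 2% barrier widths in the usage line are arbitrary example values.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class BarrierOutcome:
    label: int              # +1 upper, -1 lower, 0 time
    barrier_hit: str        # "upper", "lower" or "time"
    bars_to_hit: int
    path_high: float        # highest price seen up to and including the exit bar
    path_low: float         # lowest price seen up to and including the exit bar
    realized_return: float

def barrier_outcome(entry: float, path: Sequence[float],
                    upper_pct: float, lower_pct: float) -> BarrierOutcome:
    """Walk the bars after entry and stop at the first barrier touched."""
    if not path:
        raise ValueError("path must contain at least one bar")
    upper = entry * (1 + upper_pct)
    lower = entry * (1 - lower_pct)
    label, barrier = 0, "time"
    high = low = path[0]
    for bars, price in enumerate(path, start=1):
        high, low = max(high, price), min(low, price)
        if price >= upper:
            label, barrier = 1, "upper"
            break
        if price <= lower:
            label, barrier = -1, "lower"
            break
    # If no barrier was touched, `bars` and `price` refer to the last bar (time exit)
    return BarrierOutcome(label, barrier, bars, high, low, price / entry - 1)

outcome = barrier_outcome(100, [101, 99, 103, 102, 100], upper_pct=0.025, lower_pct=0.02)
print(outcome)
```

Keeping the extremes alongside the label makes it easy to audit afterwards how close a trade came to the opposite barrier before it exited.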

Why it’s better

The triple-barrier label respects the trading setup: you’d close the position at the first barrier hit, not ride it to the arbitrary horizon. This makes the label economically meaningful and well-aligned with the actual strategy logic.

Triple-barrier recipe

python
import pandas as pd
import numpy as np

def triple_barrier_labels(
    prices: pd.Series,
    upper_pct: float = 0.02,
    lower_pct: float = 0.01,
    horizon_bars: int = 5,
) -> pd.DataFrame:
    """Triple-barrier labels for every entry point.

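    Returns a DataFrame with columns:
        entry_idx (integer position of the entry bar),
        label (+1 / -1 / 0),
        barrier_hit ('upper' / 'lower' / 'time'),
        bars_to_hit,
        realized_return

    Barriers are checked only against the values in `prices` (e.g. closes),
    so touches inside a bar are not seen.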
    """
    out = []
    for i in range(len(prices) - horizon_bars):
        entry = prices.iloc[i]
        upper = entry * (1 + upper_pct)
        lower = entry * (1 - lower_pct)

        label = 0
        barrier = "time"
        hit_bar = horizon_bars

        for j in range(1, horizon_bars + 1):
            p = prices.iloc[i + j]
            if p >= upper:
                label = 1
                barrier = "upper"
                hit_bar = j
                break
            if p <= lower:
                label = -1
                barrier = "lower"
                hit_bar = j
                break

        realized = (prices.iloc[i + hit_bar] / entry - 1)
        out.append({
            "entry_idx": i,
            "label": label,
            "barrier_hit": barrier,
            "bars_to_hit": hit_bar,
            "realized_return": realized,
        })

    return pd.DataFrame(out)

Use it to build a supervised dataset:

python
prices = pd.Series(my_price_data)
labels = triple_barrier_labels(prices, upper_pct=0.02, lower_pct=0.01, horizon_bars=5)
features = compute_features_at_entry_points(labels["entry_idx"])

# Now train a classifier
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
clf.fit(features, labels["label"])

Meta-labeling

The idea: instead of asking one model to predict both direction and whether a trade is worth taking, train a primary model to predict direction (possibly naive), then train a meta model to decide when to trust the primary model.

Train a primary model

Simple directional model (e.g., momentum, MA cross). It produces a directional signal on every bar.

Apply triple-barrier labeling to the primary's signals

For each signal the primary emits, compute whether the triple-barrier outcome was a win (1) or not (0). A time-barrier exit counts as 0.

Train a secondary (meta) model

The meta model takes features + the primary's signal direction and predicts whether to trust the primary on this specific trade.

Deploy both together

Primary predicts direction. Meta predicts confidence. Only trade when the meta says "trust this one".
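
The labeling step above can be sketched without any framework. The snippet below is a minimal illustration, assuming signals arrive as (bar position, side) pairs with side +1 for long and −1 for short; `meta_labels`, `profit_pct`, and `stop_pct` are hypothetical names. It measures returns in the direction of the trade, so a short signal wins when the price falls to its profit target.

```python
import pandas as pd

def meta_labels(prices: pd.Series, signals: list[tuple[int, int]],
                profit_pct: float = 0.02, stop_pct: float = 0.01,
                horizon_bars: int = 5) -> pd.DataFrame:
    """Meta label per primary signal: 1 if the profit target is hit first, else 0."""
    rows = []
    for pos, side in signals:
        if pos + horizon_bars >= len(prices):
            continue  # not enough future bars to resolve the label
        entry = prices.iloc[pos]
        meta = 0  # stop-loss and time expiry both count as "do not trust"
        for j in range(1, horizon_bars + 1):
            # Return measured in the trade's direction (positive = trade is winning)
            side_return = side * (prices.iloc[pos + j] / entry - 1)
            if side_return >= profit_pct:
                meta = 1
                break
            if side_return <= -stop_pct:
                break
        rows.append({"entry_idx": pos, "side": side, "meta_label": meta})
    return pd.DataFrame(rows)

prices = pd.Series([100, 101, 98, 104, 102, 100, 97, 95, 94, 96, 97])
signals = [(0, 1), (3, -1), (8, 1)]   # long at bar 0, short at bar 3, long at bar 8
result = meta_labels(prices, signals)
print(result)
```

The long at bar 0 is stopped out and the short at bar 3 reaches its target; the signal at bar 8 is skipped because fewer than `horizon_bars` bars follow it.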

Why it works

  • The primary can be simple (e.g., a pure trend follower); it doesn’t need to know when to sit out
  • The meta focuses on a very specific binary task: “given this signal, should I trade?”
  • Splitting reduces the hypothesis space each model has to cover and typically improves out-of-sample behavior

Recipe

python
# Step 1: primary signals from a simple strategy
from horizon.quant import TSMomentum
primary = TSMomentum(lookback=20)
primary_signals = run_primary_on_history(primary, ...)

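# Step 2: triple-barrier labels for each primary signal
# The triple_barrier_labels recipe above (saved here in my_utils) labels every bar,
# so label the full history and keep only the bars where the primary fired.
# Assumes `history` is a pd.Series of prices indexed by timestamp.
import numpy as np
from my_utils import triple_barrier_labels
tb_all = triple_barrier_labels(
    prices=history,
    upper_pct=0.02,
    lower_pct=0.01,
    horizon_bars=5,
).set_index("entry_idx")
signal_pos = history.index.get_indexer(primary_signals.timestamps)
# Signals within the last horizon_bars bars have no label (NaN); drop them beforehand
tb_labels = tb_all.reindex(signal_pos)
# Meta label: 1 if the barrier outcome matched the primary's direction, 0 otherwise
# (a time-barrier exit has label 0, so it counts as "not profitable")
meta_y = (tb_labels["label"].to_numpy() == np.asarray(primary_signals.direction)).astype(int)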

# Step 3: train a meta classifier
from sklearn.ensemble import RandomForestClassifier
meta_features = compute_features_at(primary_signals.timestamps)   # could be different features
meta_clf = RandomForestClassifier(n_estimators=200, max_depth=5)
meta_clf.fit(meta_features, meta_y)

# Step 4: combined strategy
from dataclasses import replace   # assumes signals are dataclass instances

class MetaLabeled(Strategy):
    asset_classes = [Equity]
    features = {...}

    def evaluate(self, f, universe):
        primary_signals = primary.evaluate(f, universe)
        result = []
        for sig in primary_signals:
            meta_features = self._build_meta_features(sig, f)
            meta_confidence = meta_clf.predict_proba(meta_features)[0][1]
            if meta_confidence > 0.6:
                result.append(replace(sig, confidence=meta_confidence))
        return result

Sample weights

de Prado argues that samples with overlapping labels are correlated and should be down-weighted. Consider two entries one day apart in a 5-day horizon: their labels share 4 days of price data.

python
from datetime import datetime, timedelta

import numpy as np

def sample_weights_by_overlap(
    label_times: list[datetime],
    label_durations: list[int],
) -> np.ndarray:
    """Compute weights inversely proportional to label overlap.

    label_durations are in days; label i spans [start_i, start_i + duration_i].
    """
    n = len(label_times)
    # Precompute each label's end time once instead of inside the inner loop
    ends = [t + timedelta(days=d) for t, d in zip(label_times, label_durations)]
    weights = np.zeros(n)

    for i in range(n):
        # Count labels whose span intersects label i's span. Label i always
        # overlaps itself, so overlap_count is at least 1.
        overlap_count = sum(
            1
            for j in range(n)
            if label_times[j] <= ends[i] and ends[j] >= label_times[i]
        )
        weights[i] = 1.0 / overlap_count

    return weights / weights.sum() * n   # normalize so mean weight = 1

Pass these to your classifier’s sample_weight parameter:

python
clf.fit(X, y, sample_weight=sample_weights_by_overlap(...))
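
The overlap count above treats every intersection the same, however brief. A finer-grained variant, following the average-uniqueness idea from the same book, counts how many labels are live at each bar and averages the reciprocal of that count over a label's lifespan. The sketch below assumes labels are described by integer bar positions with half-open spans `[start, end)`; the function name and signature are illustrative only.

```python
import numpy as np

def average_uniqueness(starts, ends) -> np.ndarray:
    """Mean of 1 / (number of concurrent labels) over each label's bars."""
    starts = np.asarray(starts, dtype=int)
    ends = np.asarray(ends, dtype=int)
    if np.any(ends <= starts):
        raise ValueError("every label must span at least one bar")
    n_bars = int(ends.max()) if len(ends) else 0

    # concurrency[t] = number of labels whose span covers bar t
    concurrency = np.zeros(n_bars)
    for s, e in zip(starts, ends):
        concurrency[s:e] += 1

    # Each label covers its own bars, so concurrency >= 1 wherever it is read
    return np.array([np.mean(1.0 / concurrency[s:e]) for s, e in zip(starts, ends)])

# Two 5-bar labels one bar apart, plus one isolated label
uniq = average_uniqueness(starts=[0, 1, 10], ends=[5, 6, 15])
weights = uniq / uniq.mean()   # normalize so mean weight = 1
print(uniq, weights)
```

The two overlapping labels share four of their five bars and each get uniqueness 0.6, while the isolated label keeps 1.0. The normalized `weights` array can be passed as `sample_weight` in the same way as above.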

When to use

  • ML-based strategies: Any strategy that trains a classifier on historical data. The triple-barrier method produces much cleaner labels than naive fixed-horizon.
  • Combining rule-based + ML: Meta-labeling is especially powerful when you have a simple rule-based primary and want to add ML-based conviction filtering on top.
  • Event-driven labels: Any time the outcome depends on "what happened first" rather than "what happened at horizon T".

Pitfalls

Status in Horizon

Source references

  • python/horizon/fund/_backtest_runner.py has rudimentary per-trade outcome tracking you can repurpose
  • horizon/state/ledger.py records realized P&L per trade; feed that into your labeler
  • horizon/data/synthetic.py is useful for testing labelers against known-truth data
