Elastic Net Selection

Lasso, ridge, and elastic net for feature selection and signal construction

When you have many candidate features and need to determine which ones actually predict returns, elastic net regression provides regularized feature selection. Its L1 penalty drives the coefficients of weak or redundant features to exactly zero, so the fitted model is sparse.

API

Elastic net

python
model = hz.elastic_net_fit(X, y, alpha=0.1, l1_ratio=0.5)
# X: list of lists (n_samples x n_features)
# y: list of floats (n_samples)
# alpha: regularization strength (higher = more sparse)
# l1_ratio: 0.0 = pure ridge, 1.0 = pure lasso, 0.5 = balanced

print(model.coefficients)     # list of floats, one per feature
print(model.intercept)        # float
print(model.nonzero_count)    # number of features with nonzero weight
predictions = model.predict(X_new)
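To see where the zero coefficients come from, here is a minimal pure-Python sketch of coordinate descent for the elastic net. It assumes the scikit-learn-style objective given in the docstring. Whether hz scales alpha and l1_ratio the same way, or uses this algorithm at all, is an assumption, and `soft_threshold` and `elastic_net_cd` are illustrative names, not part of hz.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything within [-lam, lam] becomes 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def elastic_net_cd(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for the (assumed) objective

        1/(2n) * ||y - Xw - b||^2
        + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
    """
    n, p = len(X), len(X[0])
    # Center X and y so the intercept can be recovered after the fit.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual for w = 0
    w = [0.0] * p
    l1 = alpha * l1_ratio
    l2 = alpha * (1.0 - l1_ratio)

    for _ in range(n_iter):
        for j in range(p):
            col = [Xc[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # constant column carries no information
            # Correlation of feature j with the residual that excludes j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new_wj = soft_threshold(rho, l1) / (z + l2)
            delta = new_wj - w[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                w[j] = new_wj

    intercept = y_mean - sum(m * wj for m, wj in zip(x_mean, w))
    return w, intercept


# Toy data: y depends only on the first column; the second is orthogonal to it.
X = [[-2.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [2.0, 1.0]]
y = [-3.0, -1.0, 3.0, 5.0]
w, b = elastic_net_cd(X, y, alpha=0.1, l1_ratio=1.0)
print(w, b)  # the second coefficient comes out exactly 0.0
```

With l1_ratio=0.0 the threshold is zero, so coefficients are shrunk by the L2 term but are not set to zero, which is the ridge behaviour.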

Lasso (L1 only)

python
model = hz.lasso_fit(X, y, alpha=0.1)
# Equivalent to elastic_net_fit with l1_ratio=1.0
# Produces the sparsest models -- aggressive feature elimination

Ridge (L2 only)

python
model = hz.ridge_fit(X, y, alpha=0.1)
# Equivalent to elastic_net_fit with l1_ratio=0.0
# Shrinks coefficients but doesn't zero them out
# Use when all features are relevant but you need regularization

Feature selection workflow

Find which features predict next-day returns from a pool of candidates:

python
import numpy as np

feature_names = ["momentum_20", "vol_60", "rsi_14", "spread_z",
                 "vpin", "flow_imbalance", "sentiment", "funding_rate"]

# X_train: historical feature values, one column per name in feature_names (same order)
# y_train: next-day returns, one per row of X_train
model = hz.elastic_net_fit(X_train, y_train, alpha=0.05, l1_ratio=0.7)

# Which features survived?
selected = []
for name, coef in zip(feature_names, model.coefficients):
    if abs(coef) > 1e-8:
        selected.append((name, coef))
        print(f"  {name}: {coef:+.6f}")

print(f"\n{len(selected)} of {len(feature_names)} features selected")
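The penalty acts on coefficient magnitudes, so features measured on very different scales (for example an RSI in the tens next to a funding rate that is a small fraction) are penalized unevenly. This page does not say whether hz standardizes inputs internally, so the sketch below assumes it does not and z-scores each column using training-set statistics only. `fit_scaler` and `apply_scaler` are hypothetical helpers, not hz functions.

```python
from statistics import mean, pstdev


def fit_scaler(X):
    """Per-column mean and standard deviation, computed on training data."""
    cols = list(zip(*X))
    means = [mean(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in cols]
    return means, stds


def apply_scaler(X, means, stds):
    """Z-score every row with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


# Intended use (hz call as shown on this page):
#   means, stds = fit_scaler(X_train)
#   model = hz.elastic_net_fit(apply_scaler(X_train, means, stds), y_train,
#                              alpha=0.05, l1_ratio=0.7)
#   preds = model.predict(apply_scaler(X_new, means, stds))
```

Reusing the training statistics for new data keeps the coefficients on one scale and avoids leaking information from the evaluation period into the fit.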

Constructing a composite signal

Once you know which features matter, the fitted coefficients give you a linear signal:

python
model = hz.elastic_net_fit(X_train, y_train, alpha=0.05, l1_ratio=0.5)

# In your strategy's evaluate():
def evaluate(self, f, universe):
    for m in universe:
        # Feature order must match the column order of X_train
        features = [f.momentum[m.id], f.vol[m.id], f.rsi[m.id],
                    f.spread_z[m.id], f.vpin[m.id], f.flow[m.id],
                    f.sentiment[m.id], f.funding[m.id]]

        score = model.predict([features])[0]
        # score > 0: predicted positive return, score < 0: predicted negative
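If predict is an ordinary linear prediction, which the coefficients and intercept attributes suggest but this page does not state outright, the score is the intercept plus a weighted sum of the features. A small sketch under that assumption, with `linear_score` and `rank_by_score` as illustrative helpers rather than hz functions:

```python
def linear_score(coefficients, intercept, features):
    """intercept + sum(coef_i * feature_i); zero-weight features drop out."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))


def rank_by_score(coefficients, intercept, features_by_id):
    """Return (id, score) pairs sorted from highest to lowest predicted return."""
    scores = {
        key: linear_score(coefficients, intercept, feats)
        for key, feats in features_by_id.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up numbers for illustration: the middle feature was pruned (weight 0.0).
coefs, intercept = [0.5, 0.0, -1.0], 0.1
ranked = rank_by_score(coefs, intercept, {"a": [1.0, 5.0, 0.0],
                                          "b": [0.0, 5.0, 1.0]})
print(ranked)
```

Writing the score out this way makes it easy to inspect each feature's contribution, and features with a zero coefficient need not be computed at all.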

When to use

  • Feature discovery: you have 20+ candidate features and want to know which 5 actually matter.
  • Signal construction: combine multiple weak predictors into a single composite score.
  • Overfitting control: regularization limits fitting to noise when your training sample is short.

Higher alpha means more aggressive pruning (as long as l1_ratio is above 0; pure ridge only shrinks). Try several values across a log-spaced range, for example 0.001 to 1.0, and keep the one with the best out-of-sample performance. The model is linear, so strongly nonlinear relationships are usually better captured by tree-based methods, but for the linear component of return prediction this is a reliable starting point.
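One way to run that comparison is a small sweep that fits on a training window and scores each alpha on a later, held-out window. The sketch below takes the fitting function as an argument and relies only on what this page shows: a call of the form fit(X, y, alpha=..., l1_ratio=...) returning an object with predict and nonzero_count. `sweep_alphas` itself and the choice of mean squared error as the metric are assumptions made for illustration.

```python
def sweep_alphas(fit, X_train, y_train, X_val, y_val,
                 alphas=(0.001, 0.01, 0.1, 1.0), l1_ratio=0.5):
    """Fit once per alpha and report sparsity and out-of-sample error.

    For return prediction, X_val / y_val should come from a period
    after the training data so the comparison is not forward-looking.
    """
    results = []
    for alpha in alphas:
        model = fit(X_train, y_train, alpha=alpha, l1_ratio=l1_ratio)
        preds = model.predict(X_val)
        mse = sum((p - t) ** 2 for p, t in zip(preds, y_val)) / len(y_val)
        results.append((alpha, model.nonzero_count, mse))
    # Lowest validation error wins; ties keep the smaller alpha.
    best = min(results, key=lambda r: r[2])
    return results, best


# Intended use:
#   results, best = sweep_alphas(hz.elastic_net_fit,
#                                X_train, y_train, X_val, y_val)
#   for alpha, n_nonzero, mse in results:
#       print(f"alpha={alpha:<6} nonzero={n_nonzero:<3} val_mse={mse:.6g}")
```

Printing the nonzero count next to the validation error shows the tradeoff directly: if two alphas score about the same, the sparser model is usually the easier one to maintain.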
