Metrics

MetricsCollector Protocol with a Prometheus exporter. Order, fill, risk, feed, and equity counters and gauges.

horizon.observability.metrics is the operational visibility layer. It emits counters, gauges, and histograms that dashboards and alert rules read. Nothing changes for Tiers 1 to 3: the default collector is NullMetrics, a no-op. Pro users opt in by passing telemetry=PrometheusMetrics() to hz.run().

Protocol

python
from typing import Protocol

class MetricsCollector(Protocol):
    # Label values are strings; None means "label not set".
    def inc(self, name: str, value: float = 1.0, **labels: str | None) -> None: ...
    def gauge(self, name: str, value: float, **labels: str | None) -> None: ...
    def observe(self, name: str, value: float, **labels: str | None) -> None: ...

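Two backends ship, and the Protocol leaves room for a third:

  • NullMetrics. Default. No-op.
  • PrometheusMetrics. Lazy-imports prometheus_client, so that dependency is only needed when you use this backend. Exposes /metrics via serve().
  • Custom. Implement the Protocol to route to StatsD, CloudWatch, OTel, or a test stub.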
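A test stub is the smallest useful custom backend. The sketch below records every call in memory so unit tests can assert on what was emitted. RecordingMetrics, its attributes, and the counter() helper are hypothetical names, not part of horizon.observability; the class relies only on the three method signatures from the Protocol above, so it satisfies MetricsCollector structurally without inheriting from it.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Protocol


class MetricsCollector(Protocol):
    # Copied from the Protocol above so this sketch is self-contained.
    def inc(self, name: str, value: float = 1.0, **labels: str | None) -> None: ...
    def gauge(self, name: str, value: float, **labels: str | None) -> None: ...
    def observe(self, name: str, value: float, **labels: str | None) -> None: ...


def _key(labels: dict[str, str | None]) -> tuple[tuple[str, str | None], ...]:
    # Sort label pairs so the same label set always maps to the same series,
    # regardless of keyword-argument order at the call site.
    return tuple(sorted(labels.items()))


class RecordingMetrics:
    """Hypothetical in-memory collector for tests."""

    def __init__(self) -> None:
        self.counters: dict[tuple, float] = defaultdict(float)
        self.gauges: dict[tuple, float] = {}
        self.observations: dict[tuple, list[float]] = defaultdict(list)

    def inc(self, name: str, value: float = 1.0, **labels: str | None) -> None:
        # Counters accumulate.
        self.counters[(name, _key(labels))] += value

    def gauge(self, name: str, value: float, **labels: str | None) -> None:
        # Gauges keep only the last value set.
        self.gauges[(name, _key(labels))] = value

    def observe(self, name: str, value: float, **labels: str | None) -> None:
        # Histograms keep every raw observation for later inspection.
        self.observations[(name, _key(labels))].append(value)

    def counter(self, name: str, **labels: str | None) -> float:
        # Lookup helper for assertions; 0.0 if the series was never incremented.
        return self.counters.get((name, _key(labels)), 0.0)
```

In a test, pass RecordingMetrics() wherever a MetricsCollector is expected, run the code under test, then assert on counter(...), gauges, or observations.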

Standard metrics

All names are stable. Dashboards and alert rules reference them.

| Name | Type | Labels | Meaning |
|---|---|---|---|
| horizon_orders_submitted_total | counter | venue, account, side | Submits that reached the venue. |
| horizon_order_rejects_total | counter | venue, account, layer | Rejections. layer is the risk layer that fired, or venue_exception. |
| horizon_order_latency_seconds | histogram | venue, account | Venue submit-to-return latency. |
| horizon_fills_total | counter | venue, account, side | Fills received. |
| horizon_audit_events_total | counter | category, severity | Every audit event. |
| horizon_feed_heartbeat_age_seconds | gauge | feed | Seconds since last tick. |
| horizon_feed_gaps_total | counter | feed, market | Sequence gaps observed. |
| horizon_watchdog_halts_total | counter | reason | Watchdog halts. |
| horizon_risk_decisions_total | counter | layer, kind | Pass / reject / resize by layer. |
| horizon_positions_open | gauge | venue, account | Open position count. |
| horizon_equity_usd | gauge | account | Account equity. |
| horizon_dlq_depth | gauge | venue | Dead-letter queue depth. |
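Because dashboards key on these exact names and label sets, a custom backend or a test can check each call against the table. The sketch below encodes the table as data. STANDARD_METRICS and check_call are hypothetical helpers, not Horizon APIs, and the sketch assumes the listed labels are the complete, required label set for each metric; it is strict and rejects any metric name outside the table.

```python
from __future__ import annotations

# The standard metric table, encoded as data: name -> (type, label names).
STANDARD_METRICS: dict[str, tuple[str, frozenset[str]]] = {
    "horizon_orders_submitted_total": ("counter", frozenset({"venue", "account", "side"})),
    "horizon_order_rejects_total": ("counter", frozenset({"venue", "account", "layer"})),
    "horizon_order_latency_seconds": ("histogram", frozenset({"venue", "account"})),
    "horizon_fills_total": ("counter", frozenset({"venue", "account", "side"})),
    "horizon_audit_events_total": ("counter", frozenset({"category", "severity"})),
    "horizon_feed_heartbeat_age_seconds": ("gauge", frozenset({"feed"})),
    "horizon_feed_gaps_total": ("counter", frozenset({"feed", "market"})),
    "horizon_watchdog_halts_total": ("counter", frozenset({"reason"})),
    "horizon_risk_decisions_total": ("counter", frozenset({"layer", "kind"})),
    "horizon_positions_open": ("gauge", frozenset({"venue", "account"})),
    "horizon_equity_usd": ("gauge", frozenset({"account"})),
    "horizon_dlq_depth": ("gauge", frozenset({"venue"})),
}

# Which MetricsCollector method matches each metric type.
_METHOD_FOR_TYPE = {"counter": "inc", "gauge": "gauge", "histogram": "observe"}


def check_call(method: str, name: str, labels: dict[str, str | None]) -> None:
    """Raise ValueError if a collector call does not match the standard table."""
    if name not in STANDARD_METRICS:
        raise ValueError(f"unknown metric {name!r}")
    kind, expected = STANDARD_METRICS[name]
    if _METHOD_FOR_TYPE[kind] != method:
        raise ValueError(f"{name} is a {kind}; use {_METHOD_FOR_TYPE[kind]}(), not {method}()")
    given = set(labels)
    if given != expected:
        raise ValueError(f"{name} expects labels {sorted(expected)}, got {sorted(given)}")
```

A custom backend could call check_call at the top of inc(), gauge(), and observe() in development builds to catch typos and stray labels before they reach a dashboard.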

Quickstart

python
from horizon.observability import PrometheusMetrics
import horizon as hz

# my_feed, venue, registry, audit_log, and LiveWatchdogConfig come from
# your existing live setup; they are not defined on this page.
metrics = PrometheusMetrics()
metrics.serve(port=9100)          # /metrics on :9100

hz.run(
    mode="live",
    feed=my_feed,
    venues={"alpaca": venue},
    accounts=registry,
    audit_log=audit_log,
    telemetry=metrics,             # no-op by default; opt in here
    watchdog=LiveWatchdogConfig(...),
    # ... remaining hz.run() arguments
)

When audit_log= and telemetry= are both set, hz.run() automatically:

  • Subscribes a metrics observer to the audit log. Every AuditEvent increments horizon_audit_events_total. WatchdogHalt, RiskDecision, and FeedGap events also increment horizon_watchdog_halts_total, horizon_risk_decisions_total, and horizon_feed_gaps_total respectively.
  • Increments horizon_orders_submitted_total and horizon_order_rejects_total around each venue submit.
  • Observes horizon_order_latency_seconds per submit.
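One way to picture that wiring is an audit-log callback plus a wrapper around each submit. This is a hypothetical sketch, not hz.run()'s code: the event classes are stand-ins for Horizon's AuditEvent types, their fields (category, severity, reason, layer, kind, feed, market) are guesses taken from the label names in the standard table, and the sketch assumes a submit counts as submitted only when the venue call returns, with a raised exception counted as a venue_exception reject. Risk-layer rejects happen before the venue call, so they are outside this wrapper.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Any, Callable


# Hypothetical stand-ins for Horizon's audit event types.
@dataclass
class AuditEvent:
    category: str
    severity: str


@dataclass
class WatchdogHalt(AuditEvent):
    reason: str = ""


@dataclass
class RiskDecision(AuditEvent):
    layer: str = ""
    kind: str = ""  # "pass", "reject", or "resize"


@dataclass
class FeedGap(AuditEvent):
    feed: str = ""
    market: str = ""


def metrics_observer(metrics: Any) -> Callable[[AuditEvent], None]:
    """Return a callback to subscribe to the audit log."""
    def on_event(event: AuditEvent) -> None:
        # Every event counts once, keyed by category and severity.
        metrics.inc("horizon_audit_events_total",
                    category=event.category, severity=event.severity)
        # Specific event types also bump their dedicated counters.
        if isinstance(event, WatchdogHalt):
            metrics.inc("horizon_watchdog_halts_total", reason=event.reason)
        elif isinstance(event, RiskDecision):
            metrics.inc("horizon_risk_decisions_total", layer=event.layer, kind=event.kind)
        elif isinstance(event, FeedGap):
            metrics.inc("horizon_feed_gaps_total", feed=event.feed, market=event.market)
    return on_event


def instrumented_submit(metrics: Any, submit: Callable[[Any], Any], action: Any,
                        *, venue: str, account: str, side: str) -> Any:
    """Wrap one venue submit with the submit/reject counters and the latency histogram."""
    start = time.perf_counter()
    try:
        result = submit(action)
    except Exception:
        # A venue exception counts as a reject with layer="venue_exception".
        metrics.inc("horizon_order_rejects_total",
                    venue=venue, account=account, layer="venue_exception")
        raise
    else:
        metrics.inc("horizon_orders_submitted_total", venue=venue, account=account, side=side)
        return result
    finally:
        # Latency is observed on success and failure alike.
        metrics.observe("horizon_order_latency_seconds",
                        time.perf_counter() - start, venue=venue, account=account)
```

Subscribing metrics_observer(metrics) to the audit log keeps metric emission out of the risk and watchdog code paths: those components only write audit events, and the observer translates them.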

HTTP exporter

PrometheusMetrics.serve(host="0.0.0.0", port=9100) starts an embedded http.server on a daemon thread. The endpoint is /metrics. No web framework is required.

$ curl http://localhost:9100/metrics
# HELP horizon_orders_submitted_total Orders submitted to a venue.
# TYPE horizon_orders_submitted_total counter
horizon_orders_submitted_total{account="acc_1",side="buy",venue="alpaca"} 42.0
# HELP horizon_equity_usd Total account equity in USD.
# TYPE horizon_equity_usd gauge
horizon_equity_usd{account="acc_1"} 102345.5
...

Call metrics.close() at shutdown. close() is idempotent, so calling it more than once is safe.
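The mechanics of an embedded exporter fit in the standard library. The sketch below is hypothetical (TinyExporter is not PrometheusMetrics, and it handles gauges only): it runs ThreadingHTTPServer.serve_forever() on a daemon thread, answers GET /metrics with Prometheus text exposition format (TYPE lines plus escaped label values), and makes close() idempotent by clearing its server reference before shutting down. It defaults to host 127.0.0.1 rather than 0.0.0.0 so the example stays local.

```python
from __future__ import annotations

import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def _escape(value: str) -> str:
    # Exposition format: escape backslash, double quote, and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class TinyExporter:
    """Hypothetical stdlib exporter: gauges only, for illustration."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._gauges: dict[tuple[str, tuple[tuple[str, str], ...]], float] = {}
        self._server: ThreadingHTTPServer | None = None
        self._thread: threading.Thread | None = None

    def gauge(self, name: str, value: float, **labels: str) -> None:
        with self._lock:
            self._gauges[(name, tuple(sorted(labels.items())))] = value

    def render(self) -> str:
        with self._lock:
            items = sorted(self._gauges.items())
        lines: list[str] = []
        typed: set[str] = set()
        for (name, labels), value in items:
            if name not in typed:  # one TYPE line per metric family
                lines.append(f"# TYPE {name} gauge")
                typed.add(name)
            body = ",".join(f'{k}="{_escape(v)}"' for k, v in labels)
            lines.append(f"{name}{{{body}}} {value}" if body else f"{name} {value}")
        return "\n".join(lines) + "\n"

    def serve(self, host: str = "127.0.0.1", port: int = 9100) -> int:
        exporter = self

        class Handler(BaseHTTPRequestHandler):
            def do_GET(self) -> None:
                if self.path != "/metrics":
                    self.send_error(404)
                    return
                payload = exporter.render().encode("utf-8")
                self.send_response(200)
                self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
                self.send_header("Content-Length", str(len(payload)))
                self.end_headers()
                self.wfile.write(payload)

            def log_message(self, format: str, *args: object) -> None:
                pass  # keep scrape requests out of stderr

        self._server = ThreadingHTTPServer((host, port), Handler)
        # daemon=True: the server thread never blocks interpreter exit.
        self._thread = threading.Thread(target=self._server.serve_forever, daemon=True)
        self._thread.start()
        return self._server.server_address[1]  # actual port, useful with port=0

    def close(self) -> None:
        # Idempotent: a second call finds _server already None and returns.
        server, self._server = self._server, None
        if server is None:
            return
        server.shutdown()      # stop the serve_forever() loop
        server.server_close()  # release the listening socket
        if self._thread is not None:
            self._thread.join(timeout=5)
```

The daemon thread is why serve() needs no framework and no event loop, and also why an explicit close() matters: daemon threads are killed abruptly at exit rather than shut down cleanly.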

Wiring into an existing exporter

If a process already exposes /metrics through FastAPI, Flask, or prometheus_client's default registry, skip serve() and render Horizon's registry from your own handler. With FastAPI:

python
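from fastapi import FastAPI, Response
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
from horizon.observability import PrometheusMetrics

app = FastAPI()
metrics = PrometheusMetrics()  # no metrics.serve(): FastAPI serves /metrics

@app.get("/metrics")
def _metrics() -> Response:
    # Render only Horizon's own CollectorRegistry.
    return Response(generate_latest(metrics.registry),
                    media_type=CONTENT_TYPE_LATEST)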

PrometheusMetrics uses its own CollectorRegistry, so the Horizon metric set is isolated from everything else.

Manual instrumentation

For custom strategy code:

python
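from horizon.observability import MetricName, OrderLatencyTimer

# metrics: the PrometheusMetrics instance from the quickstart.
# ledger, venue, action: objects from your own strategy code.
metrics.gauge(MetricName.EquityUsd, ledger.equity(), account="acc_1")

# Times the submit and records it when the block exits.
with OrderLatencyTimer(metrics, venue="alpaca", account="acc_1"):
    venue.submit(action)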

OrderLatencyTimer is a context manager that records to horizon_order_latency_seconds on exit.
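A timer of this kind takes only a few lines. The sketch below is a hypothetical reimplementation, named LatencyTimer so it is not mistaken for Horizon's OrderLatencyTimer: it reads time.perf_counter() on entry and, on exit, observes the elapsed seconds through any collector with an observe() method. Whether the real timer also records when the block raises is not stated on this page; this sketch assumes it does, and it never suppresses the exception.

```python
from __future__ import annotations

import time
from types import TracebackType
from typing import Any


class LatencyTimer:
    """Hypothetical stand-in for OrderLatencyTimer: times a block, observes on exit."""

    def __init__(self, metrics: Any, name: str = "horizon_order_latency_seconds",
                 **labels: str) -> None:
        self.metrics = metrics
        self.name = name
        self.labels = labels
        self.elapsed: float | None = None
        self._start = 0.0

    def __enter__(self) -> LatencyTimer:
        # perf_counter is a high-resolution clock intended for measuring durations.
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type: type[BaseException] | None,
                 exc: BaseException | None, tb: TracebackType | None) -> bool:
        # Runs on normal exit and on exceptions alike.
        self.elapsed = time.perf_counter() - self._start
        self.metrics.observe(self.name, self.elapsed, **self.labels)
        return False  # never swallow an exception from the timed block
```

Recording in __exit__ rather than after the with block is what makes failed submits show up in the latency histogram too; slow failures are often the ones worth seeing.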

Alerting on metrics

Metrics are numbers. Alerts are decisions. Two paths:

  1. Prometheus alerting rules, evaluated by Prometheus and routed through Alertmanager. Same patterns any Prometheus shop uses.
  2. Audit-log alerter. Event-driven, faster, no scrape latency. Recommended for kill-switch and watchdog-halt events.

Use both. Metrics for trend and SLO-style alerts (for example, error rate above 5% over 10 minutes). Audit alerts for things that must page immediately (kill switch fired).
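The 5%-over-10-minutes example reduces to counter arithmetic: sample each counter at the start and end of the window, take the increase, and divide. The sketch below is a simplified version of what PromQL's increase() does (no extrapolation to window edges); a drop in a counter is treated as a process restart that reset it to zero. Defining error rate as rejects / (submits + rejects) from horizon_order_rejects_total and horizon_orders_submitted_total is an assumption of this sketch, and the function names are hypothetical.

```python
from __future__ import annotations


def counter_increase(earlier: float, later: float) -> float:
    """Increase of a monotonic counter between two samples.

    A counter only goes down when the process restarts and the counter
    resets to zero, so after a drop the later value is all new increase.
    """
    return later - earlier if later >= earlier else later


def reject_ratio(submitted: tuple[float, float], rejected: tuple[float, float]) -> float:
    """Rejects as a fraction of attempts over one window.

    Each argument is an (earlier, later) pair of counter samples, e.g. readings
    of horizon_orders_submitted_total and horizon_order_rejects_total taken
    10 minutes apart.
    """
    ok = counter_increase(*submitted)
    bad = counter_increase(*rejected)
    attempts = ok + bad
    return bad / attempts if attempts else 0.0  # no traffic means no error rate


def should_alert(ratio: float, threshold: float = 0.05) -> bool:
    # Strictly above the threshold, matching "error rate above 5%".
    return ratio > threshold
```

For instance, 190 submits and 10 rejects in the window give 10 / 200 = 0.05, which sits exactly at the threshold and does not fire.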

PII

The default labels carry account ids. Account ids are opaque, non-PII handles (see Accounts); no standard metric carries client names or client numbers. Do not route client PII through labels.

Out of scope

  • Tracing. OpenTelemetry spans across feed, strategy, risk, venue. L2.
  • StatsD / CloudWatch backends. Implement MetricsCollector and route.
  • Cardinality limits. Every distinct label combination becomes its own Prometheus time series, so unbounded label values inflate memory and query cost. Account, venue, and side are finite. Do not add freeform labels.
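Routing to StatsD means translating each call into the StatsD line format, name:value|type, with c for counters and g for gauges. Plain StatsD has no labels, so this sketch appends them as DogStatsD-style |#key:value tags and sends observations as DogStatsD histograms (|h); both require a tag-aware server such as DogStatsD, and with plain StatsD you would fold labels into the metric name instead. StatsdMetrics is hypothetical: it sends one UDP datagram per call by default and accepts an injectable send function for tests.

```python
from __future__ import annotations

import socket
from typing import Callable


class StatsdMetrics:
    """Hypothetical MetricsCollector backend emitting DogStatsD-style datagrams."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125,
                 send: Callable[[bytes], None] | None = None) -> None:
        self._sock: socket.socket | None = None
        if send is None:
            # Default transport: one fire-and-forget UDP datagram per metric.
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            self._sock = sock

            def udp_send(data: bytes) -> None:
                sock.sendto(data, (host, port))

            send = udp_send
        self._send = send

    def _emit(self, name: str, value: float, kind: str,
              labels: dict[str, str | None]) -> None:
        # Labels become sorted "|#k:v,k:v" tags; None-valued labels are dropped
        # rather than sent as the text "None".
        tags = ",".join(f"{k}:{v}" for k, v in sorted(labels.items()) if v is not None)
        line = f"{name}:{float(value)}|{kind}"
        if tags:
            line += f"|#{tags}"
        self._send(line.encode("utf-8"))

    def inc(self, name: str, value: float = 1.0, **labels: str | None) -> None:
        self._emit(name, value, "c", labels)

    def gauge(self, name: str, value: float, **labels: str | None) -> None:
        self._emit(name, value, "g", labels)

    def observe(self, name: str, value: float, **labels: str | None) -> None:
        self._emit(name, value, "h", labels)

    def close(self) -> None:
        # Idempotent, like PrometheusMetrics.close().
        if self._sock is not None:
            self._sock.close()
            self._sock = None
```

UDP keeps the hot path non-blocking: a lost datagram costs one sample, never a delayed order. The same cardinality rule applies here, since each distinct tag set is a separate series in the backend.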