Metrics
MetricsCollector Protocol with a Prometheus exporter. Order, fill, risk, feed, and equity counters and gauges.
horizon.observability.metrics is the operational visibility layer. It emits counters, gauges, and histograms that dashboards and alert rules read. Nothing in Tier 1 to 3 changes: the default collector is NullMetrics (no-op). Pro users opt in by passing telemetry=PrometheusMetrics() to hz.run().
Protocol
from typing import Protocol

class MetricsCollector(Protocol):
    def inc(self, name: str, value: float = 1.0, **labels: str | None) -> None: ...
    def gauge(self, name: str, value: float, **labels: str | None) -> None: ...
    def observe(self, name: str, value: float, **labels: str | None) -> None: ...
Three backends ship:
- NullMetrics. Default. No-op.
- PrometheusMetrics. Lazy-imports prometheus_client. Exposes /metrics.
- Custom. Implement the Protocol to route to StatsD, CloudWatch, OTel, or a test stub.
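As an illustration of the "Custom" option, here is a minimal sketch of a test stub that satisfies the Protocol structurally. InMemoryMetrics, record_fill, and the _key helper are hypothetical names, not part of Horizon. The Protocol is restated locally so the snippet runs on its own, with Optional[str] standing in for str | None.

```python
from collections import defaultdict
from typing import Optional, Protocol


class MetricsCollector(Protocol):
    """Restated from the Protocol section so this sketch is self-contained."""

    def inc(self, name: str, value: float = 1.0, **labels: Optional[str]) -> None: ...
    def gauge(self, name: str, value: float, **labels: Optional[str]) -> None: ...
    def observe(self, name: str, value: float, **labels: Optional[str]) -> None: ...


def _key(name, labels):
    # Sort labels so the same label set always maps to the same key.
    return (name, tuple(sorted(labels.items())))


class InMemoryMetrics:
    """Hypothetical test stub: keeps every sample in plain dicts."""

    def __init__(self) -> None:
        self.counters = defaultdict(float)
        self.gauges = {}
        self.observations = defaultdict(list)

    def inc(self, name, value=1.0, **labels):
        self.counters[_key(name, labels)] += value

    def gauge(self, name, value, **labels):
        self.gauges[_key(name, labels)] = value  # last write wins

    def observe(self, name, value, **labels):
        self.observations[_key(name, labels)].append(value)


def record_fill(metrics: MetricsCollector, venue: str, account: str, side: str) -> None:
    # Hypothetical caller: depends only on the Protocol, not on a backend.
    metrics.inc("horizon_fills_total", venue=venue, account=account, side=side)


stub = InMemoryMetrics()
record_fill(stub, "alpaca", "acc_1", "buy")
record_fill(stub, "alpaca", "acc_1", "buy")
```

Because Protocol matching is structural, InMemoryMetrics needs no base class. A test can assert directly on stub.counters after running strategy code.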
Standard metrics
All names are stable. Dashboards and alert rules reference them.
| Name | Type | Labels | Meaning |
|---|---|---|---|
| horizon_orders_submitted_total | counter | venue, account, side | Submits that reached the venue. |
| horizon_order_rejects_total | counter | venue, account, layer | Rejections. layer is the risk layer that fired, or venue_exception. |
| horizon_order_latency_seconds | histogram | venue, account | Venue submit-to-return latency. |
| horizon_fills_total | counter | venue, account, side | Fills received. |
| horizon_audit_events_total | counter | category, severity | Every audit event. |
| horizon_feed_heartbeat_age_seconds | gauge | feed | Seconds since last tick. |
| horizon_feed_gaps_total | counter | feed, market | Sequence gaps observed. |
| horizon_watchdog_halts_total | counter | reason | Watchdog halts. |
| horizon_risk_decisions_total | counter | layer, kind | Pass / reject / resize by layer. |
| horizon_positions_open | gauge | venue, account | Open position count. |
| horizon_equity_usd | gauge | account | Account equity. |
| horizon_dlq_depth | gauge | venue | Dead-letter queue depth. |
Quickstart
from horizon.observability import PrometheusMetrics
import horizon as hz
metrics = PrometheusMetrics()
metrics.serve(port=9100) # /metrics on :9100
hz.run(
    mode="live",
    feed=my_feed,
    venues={"alpaca": venue},
    accounts=registry,
    audit_log=audit_log,
    telemetry=metrics,  # no-op by default; opt in here
    watchdog=LiveWatchdogConfig(...),
    # ... remaining arguments elided
)
When audit_log= and telemetry= are both set, hz.run() automatically:
- Subscribes a metrics observer to the audit log. Every AuditEvent increments horizon_audit_events_total. WatchdogHalt, RiskDecision, and FeedGap increment their specific counters.
- Increments horizon_orders_submitted_total and horizon_order_rejects_total around each venue submit.
- Observes horizon_order_latency_seconds per submit.
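The audit-log observer described above can be pictured roughly as follows. This is a sketch under stated assumptions, not Horizon's implementation. The AuditEvent shape (type_name, category, severity, fields), the "OrderSubmitted" event, make_metrics_observer, and CountingMetrics are all hypothetical. Only the metric names and label names come from the table in Standard metrics.

```python
from dataclasses import dataclass, field


@dataclass
class AuditEvent:
    """Hypothetical stand-in; the real event type lives in Horizon."""
    type_name: str
    category: str
    severity: str
    fields: dict = field(default_factory=dict)


# Event type -> (counter name, label names copied from the event's fields).
SPECIFIC_COUNTERS = {
    "WatchdogHalt": ("horizon_watchdog_halts_total", ("reason",)),
    "RiskDecision": ("horizon_risk_decisions_total", ("layer", "kind")),
    "FeedGap": ("horizon_feed_gaps_total", ("feed", "market")),
}


class CountingMetrics:
    """Tiny collector that implements only inc(), enough for this sketch."""

    def __init__(self):
        self.counts = {}

    def inc(self, name, value=1.0, **labels):
        key = (name, tuple(sorted(labels.items())))
        self.counts[key] = self.counts.get(key, 0.0) + value


def make_metrics_observer(metrics):
    def on_event(event):
        # Every audit event bumps the general counter.
        metrics.inc(
            "horizon_audit_events_total",
            category=event.category,
            severity=event.severity,
        )
        # Selected event types also bump their dedicated counter.
        specific = SPECIFIC_COUNTERS.get(event.type_name)
        if specific is not None:
            name, label_names = specific
            metrics.inc(name, **{k: event.fields.get(k) for k in label_names})

    return on_event


metrics = CountingMetrics()
on_event = make_metrics_observer(metrics)
on_event(AuditEvent("FeedGap", "feed", "warning", {"feed": "sip", "market": "AAPL"}))
on_event(AuditEvent("OrderSubmitted", "order", "info"))  # hypothetical event type
```

The table-driven mapping keeps the general counter and the event-specific counters in one place, so adding an event type means adding one row.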
HTTP exporter
PrometheusMetrics.serve(host="0.0.0.0", port=9100) starts an embedded http.server on a daemon thread. The endpoint is /metrics. No web framework is required.
$ curl http://localhost:9100/metrics
# HELP horizon_orders_submitted_total Orders submitted to a venue.
# TYPE horizon_orders_submitted_total counter
horizon_orders_submitted_total{account="acc_1",side="buy",venue="alpaca"} 42.0
# HELP horizon_equity_usd Total account equity in USD.
# TYPE horizon_equity_usd gauge
horizon_equity_usd{account="acc_1"} 102345.5
...
Close with metrics.close() at shutdown. close() is idempotent.
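For readers curious what "an embedded http.server on a daemon thread" amounts to, here is a minimal standard-library sketch. It does not reproduce PrometheusMetrics.serve(). serve_metrics and render are hypothetical names, and the response body is a hard-coded sample line, not real exporter output.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def serve_metrics(render, host="127.0.0.1", port=0):
    """Serve render() at /metrics on a daemon thread and return the server."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep scrape requests out of stderr

    server = ThreadingHTTPServer((host, port), Handler)
    # Daemon thread: the exporter never blocks interpreter shutdown.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def render():
    # Stand-in for the text exposition a real registry would produce.
    return b'horizon_equity_usd{account="acc_1"} 102345.5\n'


server = serve_metrics(render)      # port=0 lets the OS pick a free port
port = server.server_address[1]
with urllib.request.urlopen(f"http://127.0.0.1:{port}/metrics") as resp:
    scraped = resp.read()

# Shut down cleanly: stop the serve loop, then release the socket.
server.shutdown()
server.server_close()
```

Binding to port 0 is a convenience for tests. A deployed exporter would bind a fixed port such as 9100 so the Prometheus scrape config can find it.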
Wiring into an existing exporter
If a process already exposes /metrics via FastAPI, Flask, or the default prometheus_client registry, bypass the built-in server:
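from fastapi import Response
from prometheus_client import generate_latest

# app is the process's existing FastAPI application; metrics is the PrometheusMetrics instance.
@app.get("/metrics")
def _metrics():
    return Response(generate_latest(metrics.registry),
                    media_type="text/plain; version=0.0.4")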
PrometheusMetrics uses its own CollectorRegistry, so the Horizon metric set is isolated from everything else.
Manual instrumentation
For custom strategy code:
from horizon.observability import MetricName, OrderLatencyTimer
metrics.gauge(MetricName.EquityUsd, ledger.equity(), account="acc_1")
with OrderLatencyTimer(metrics, venue="alpaca", account="acc_1"):
    venue.submit(action)
OrderLatencyTimer is a context manager that records to horizon_order_latency_seconds on exit.
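A context manager of this kind can be sketched in a few lines. LatencyTimer and ListMetrics below are hypothetical stand-ins, not the Horizon classes. Whether the real OrderLatencyTimer also records when the submit raises is an assumption here.

```python
import time


class LatencyTimer:
    """Hypothetical stand-in for OrderLatencyTimer."""

    def __init__(self, metrics, **labels):
        self._metrics = metrics
        self._labels = labels
        self._start = None

    def __enter__(self):
        # perf_counter is monotonic, so wall-clock adjustments cannot skew it.
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        elapsed = time.perf_counter() - self._start
        # Assumption: record the sample even if the body raised.
        self._metrics.observe(
            "horizon_order_latency_seconds", elapsed, **self._labels
        )
        return False  # never swallow exceptions from the submit


class ListMetrics:
    """Minimal collector that only implements observe(), for illustration."""

    def __init__(self):
        self.samples = []

    def observe(self, name, value, **labels):
        self.samples.append((name, value, labels))


timer_metrics = ListMetrics()
with LatencyTimer(timer_metrics, venue="alpaca", account="acc_1"):
    time.sleep(0.01)  # stands in for venue.submit(action)
```

Returning False from __exit__ matters: a timer that suppressed exceptions would hide venue failures from the caller.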
Alerting on metrics
Metrics are numbers. Alerts are decisions. Two paths:
- Prometheus alert rules routed through alertmanager. Same patterns any Prometheus shop uses.
- Audit-log alerter. Event-driven, faster, no scrape latency. Recommended for kill-switch and watchdog-halt.
Use both. Metrics for trend and SLO-style alerts (error-rate > 5% over 10 minutes). Audit alerts for things that must page immediately (kill switch fired).
PII
The default labels carry account ids. Account ids are non-PII handles (see Accounts); the MetricsCollector Protocol does not accept client names or numbers. Do not route client PII through labels.
Out of scope
- Tracing. OpenTelemetry spans across feed, strategy, risk, venue. L2.
- StatsD / CloudWatch backends. Implement MetricsCollector and route.
- Cardinality limits. Prometheus behaves badly when labels explode. Account, venue, and side are finite. Do not add freeform labels.