Metrics

MetricsCollector Protocol with a Prometheus exporter. Counters, gauges, and a latency histogram for orders, fills, risk, feeds, and equity.

horizon.observability.metrics is the operational visibility layer. It emits the counters, gauges, and histograms that dashboards and alert rules read. Nothing changes for Tiers 1 to 3: the default collector is NullMetrics, a no-op. Pro users opt in by passing telemetry=PrometheusMetrics() to hz.run().

Protocol

python

from typing import Protocol

class MetricsCollector(Protocol):
    def inc(self, name: str, value: float = 1.0, **labels: str | None) -> None: ...      # counter += value
    def gauge(self, name: str, value: float, **labels: str | None) -> None: ...           # gauge = value
    def observe(self, name: str, value: float, **labels: str | None) -> None: ...         # one histogram sample

Two backends ship, plus a custom option:

NullMetrics. Default. No-op.
PrometheusMetrics. Lazy-imports prometheus_client. Exposes /metrics.
Custom. Implement the Protocol to route to StatsD, CloudWatch, OTel, or a test stub.
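A custom backend only needs the three methods. The sketch below is a hypothetical in-memory test stub (InMemoryMetrics and count_fill are illustrative names, not part of horizon). It repeats the Protocol so it runs on its own, and it keys each series by metric name plus sorted labels.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Protocol


class MetricsCollector(Protocol):
    # Copied from the Protocol above so this sketch is self-contained.
    def inc(self, name: str, value: float = 1.0, **labels: str | None) -> None: ...
    def gauge(self, name: str, value: float, **labels: str | None) -> None: ...
    def observe(self, name: str, value: float, **labels: str | None) -> None: ...


def _key(name: str, labels: dict[str, str | None]) -> tuple:
    # Sort label pairs so the same labels in any order map to one series.
    return (name, tuple(sorted(labels.items())))


class InMemoryMetrics:
    """Hypothetical test stub: records every call in plain dicts."""

    def __init__(self) -> None:
        self.counters: dict[tuple, float] = defaultdict(float)
        self.gauges: dict[tuple, float] = {}
        self.observations: dict[tuple, list[float]] = defaultdict(list)

    def inc(self, name: str, value: float = 1.0, **labels: str | None) -> None:
        self.counters[_key(name, labels)] += value  # counters only go up

    def gauge(self, name: str, value: float, **labels: str | None) -> None:
        self.gauges[_key(name, labels)] = value  # last write wins

    def observe(self, name: str, value: float, **labels: str | None) -> None:
        self.observations[_key(name, labels)].append(value)  # keep raw samples


def count_fill(metrics: MetricsCollector, venue: str, account: str, side: str) -> None:
    # Hypothetical caller: code written against the Protocol accepts any backend.
    metrics.inc("horizon_fills_total", venue=venue, account=account, side=side)
```

Because the Protocol is structural, InMemoryMetrics needs no base class; a type checker accepts it wherever a MetricsCollector is expected as long as the method signatures match.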

Standard metrics

All names are stable. Dashboards and alert rules reference them.

Name	Type	Labels	Meaning
`horizon_orders_submitted_total`	counter	venue, account, side	Submits that reached the venue.
`horizon_order_rejects_total`	counter	venue, account, layer	Rejections. `layer` is the risk layer that fired, or `venue_exception`.
`horizon_order_latency_seconds`	histogram	venue, account	Venue submit-to-return latency.
`horizon_fills_total`	counter	venue, account, side	Fills received.
`horizon_audit_events_total`	counter	category, severity	Every audit event.
`horizon_feed_heartbeat_age_seconds`	gauge	feed	Seconds since last tick.
`horizon_feed_gaps_total`	counter	feed, market	Sequence gaps observed.
`horizon_watchdog_halts_total`	counter	reason	Watchdog halts.
`horizon_risk_decisions_total`	counter	layer, kind	Risk decisions by layer; `kind` is pass, reject, or resize.
`horizon_positions_open`	gauge	venue, account	Open position count.
`horizon_equity_usd`	gauge	account	Account equity.
`horizon_dlq_depth`	gauge	venue	Dead-letter queue depth.

Quickstart

python

import horizon as hz
from horizon.observability import PrometheusMetrics

# my_feed, venue, registry, audit_log, and LiveWatchdogConfig come from your
# existing live setup; they are not defined on this page.
metrics = PrometheusMetrics()
metrics.serve(port=9100)          # /metrics on :9100

hz.run(
    mode="live",
    feed=my_feed,
    venues={"alpaca": venue},
    accounts=registry,
    audit_log=audit_log,
    telemetry=metrics,             # no-op by default; opt in here
    watchdog=LiveWatchdogConfig(...),
    # ... other hz.run() arguments
)

When audit_log= and telemetry= are both set, hz.run() automatically:

Subscribes a metrics observer to the audit log. Every AuditEvent increments horizon_audit_events_total. WatchdogHalt, RiskDecision, and FeedGap events also increment horizon_watchdog_halts_total, horizon_risk_decisions_total, and horizon_feed_gaps_total respectively.
Increments horizon_orders_submitted_total and horizon_order_rejects_total around each venue submit.
Observes horizon_order_latency_seconds per submit.
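The submit-side bookkeeping can be pictured as a wrapper around each venue call. This is a minimal sketch, not Horizon's code: instrumented_submit and the submit callable are hypothetical, and it assumes that a raised venue exception counts as a reject with layer="venue_exception" (not as a submit) and that latency is recorded on both the success and the exception path.

```python
from __future__ import annotations

import time
from typing import Any, Callable


def instrumented_submit(
    metrics: Any,                    # any object implementing MetricsCollector
    submit: Callable[[Any], Any],    # hypothetical venue submit callable
    action: Any,
    *,
    venue: str,
    account: str,
    side: str,
) -> Any:
    """Hypothetical sketch of per-submit instrumentation."""
    start = time.perf_counter()
    try:
        result = submit(action)
    except Exception:
        # Assumption: a venue exception is a reject, attributed to venue_exception.
        metrics.inc("horizon_order_rejects_total",
                    venue=venue, account=account, layer="venue_exception")
        raise
    finally:
        # Assumption: latency is observed whether submit returned or raised.
        metrics.observe("horizon_order_latency_seconds",
                        time.perf_counter() - start, venue=venue, account=account)
    # Only submits that returned normally count as having reached the venue.
    metrics.inc("horizon_orders_submitted_total",
                venue=venue, account=account, side=side)
    return result
```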

HTTP exporter

PrometheusMetrics.serve(host="0.0.0.0", port=9100) starts an embedded http.server on a daemon thread. The endpoint is /metrics. No web framework is required.

$ curl http://localhost:9100/metrics
# HELP horizon_orders_submitted_total Orders submitted to a venue.
# TYPE horizon_orders_submitted_total counter
horizon_orders_submitted_total{account="acc_1",side="buy",venue="alpaca"} 42.0
# HELP horizon_equity_usd Total account equity in USD.
# TYPE horizon_equity_usd gauge
horizon_equity_usd{account="acc_1"} 102345.5
...

Close with metrics.close() at shutdown. close() is idempotent.
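The serve()/close() pair can be approximated with nothing but the standard library. This is a sketch under assumptions: MetricsHTTPServer and the render callable are hypothetical, render stands in for whatever serializes the registry, and the real PrometheusMetrics may differ in details such as the handler and the content type.

```python
from __future__ import annotations

import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


class MetricsHTTPServer:
    """Hypothetical sketch: stdlib http.server on a daemon thread, idempotent close()."""

    def __init__(self, render: Callable[[], bytes], host: str = "0.0.0.0", port: int = 9100) -> None:
        class Handler(BaseHTTPRequestHandler):
            def do_GET(self) -> None:
                if self.path.split("?", 1)[0] != "/metrics":
                    self.send_error(404)
                    return
                body = render()
                self.send_response(200)
                # Prometheus text exposition format, version 0.0.4.
                self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

            def log_message(self, format: str, *args) -> None:
                pass  # keep scrape requests out of stderr

        self._lock = threading.Lock()
        self._closed = False
        self._server = ThreadingHTTPServer((host, port), Handler)
        self.port = self._server.server_address[1]  # actual port, useful with port=0
        # daemon=True: the thread never blocks interpreter exit.
        self._thread = threading.Thread(target=self._server.serve_forever, daemon=True)
        self._thread.start()

    def close(self) -> None:
        # Idempotent: only the first call shuts the server down.
        with self._lock:
            if self._closed:
                return
            self._closed = True
        self._server.shutdown()      # stop serve_forever()
        self._server.server_close()  # release the listening socket
        self._thread.join()
```

Passing port=0 lets the OS pick a free port, which is handy in tests; the lock makes a second close() a no-op even if two shutdown paths race.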

Wiring into an existing exporter

If a process already exposes /metrics via FastAPI, Flask, or the default prometheus_client registry, bypass the built-in server:

python

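from fastapi import FastAPI, Response
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest

app = FastAPI()  # or your existing application

@app.get("/metrics")
def _metrics() -> Response:
    # Serialize only Horizon's registry; `metrics` is the PrometheusMetrics instance.
    return Response(generate_latest(metrics.registry), media_type=CONTENT_TYPE_LATEST)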

PrometheusMetrics uses its own CollectorRegistry, so the Horizon metric set is isolated from everything else.

Manual instrumentation

For custom strategy code:

python

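from horizon.observability import MetricName, OrderLatencyTimer

# Set the equity gauge from your own ledger object.
metrics.gauge(MetricName.EquityUsd, ledger.equity(), account="acc_1")

# Time a venue submit; the duration lands in horizon_order_latency_seconds.
with OrderLatencyTimer(metrics, venue="alpaca", account="acc_1"):
    venue.submit(action)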

OrderLatencyTimer is a context manager that records to horizon_order_latency_seconds on exit.
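A timer like this is usually a few lines around time.perf_counter(). The class below is a hypothetical stand-in (LatencyTimerSketch), assuming the real OrderLatencyTimer observes the elapsed seconds on exit and lets exceptions from the timed block propagate; check the actual class before relying on either behavior.

```python
from __future__ import annotations

import time
from typing import Any


class LatencyTimerSketch:
    """Hypothetical stand-in for OrderLatencyTimer; the real class may differ."""

    def __init__(self, metrics: Any, **labels: str) -> None:
        self._metrics = metrics
        self._labels = labels
        self._start = 0.0
        self.elapsed: float | None = None

    def __enter__(self) -> LatencyTimerSketch:
        self._start = time.perf_counter()  # monotonic, high resolution
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.elapsed = time.perf_counter() - self._start
        self._metrics.observe("horizon_order_latency_seconds", self.elapsed, **self._labels)
        return False  # never swallow an exception raised inside the block
```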

Alerting on metrics

Metrics are numbers. Alerts are decisions. Two paths:

Prometheus alerting rules, evaluated by Prometheus and routed through Alertmanager. The same patterns any Prometheus shop uses.
Audit-log alerter. Event-driven and faster, with no scrape latency. Recommended for kill-switch and watchdog-halt events.

Use both. Metrics for trend and SLO-style alerts (error-rate > 5% over 10 minutes). Audit alerts for things that must page immediately (kill switch fired).
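The 5% example reduces to arithmetic on two counter readings. In a Prometheus rule you would express it with increase() or rate(), which also tolerate counter resets; the Python sketch below only shows the ratio. CounterSnapshot, reject_rate, and should_alert are hypothetical, and the sketch assumes attempts = submits + rejects, since risk-layer rejects never reach the venue and so never appear in horizon_orders_submitted_total.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSnapshot:
    """Hypothetical: counter totals read at one instant (e.g. one scrape)."""
    submitted: float  # horizon_orders_submitted_total
    rejects: float    # horizon_order_rejects_total


def reject_rate(old: CounterSnapshot, new: CounterSnapshot) -> float:
    """Share of order attempts rejected between two snapshots.

    Assumption: attempts = submits + rejects.
    """
    d_sub = new.submitted - old.submitted
    d_rej = new.rejects - old.rejects
    if d_sub < 0 or d_rej < 0:
        # Counters only decrease on a process restart; PromQL handles this, we do not.
        raise ValueError("counter went backwards (process restart?)")
    attempts = d_sub + d_rej
    return d_rej / attempts if attempts else 0.0


def should_alert(old: CounterSnapshot, new: CounterSnapshot, threshold: float = 0.05) -> bool:
    return reject_rate(old, new) > threshold
```

For example, going from 100 submits and 2 rejects to 190 submits and 12 rejects gives 10 rejects out of 100 attempts, a rate of 0.10, which trips a 5% threshold.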

PII

The default labels carry account ids. Account ids are non-PII handles (Accounts); the Metrics Protocol does not accept client names or numbers. Do not route client PII through labels.

Out of scope

Tracing. OpenTelemetry spans across feed, strategy, risk, venue. L2.
StatsD / CloudWatch backends. Implement MetricsCollector and route.
Cardinality limits. Every distinct label combination is a separate Prometheus time series, so memory and query cost grow with label cardinality. Account, venue, and side are finite. Do not add freeform labels.
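One way to enforce the no-freeform-labels rule is to wrap the collector and check label keys against the Standard metrics table. This is a hypothetical sketch (LabelGuard and ALLOWED_LABELS are illustrative names): it raises on unknown metric names and unexpected label keys, which also stops a stray client-PII label before it reaches the backend. It checks keys only, not values; a high-cardinality value in an allowed label still gets through, and you would extend the table for any metrics of your own.

```python
from __future__ import annotations

from typing import Any

# Label keys per metric, transcribed from the Standard metrics table.
ALLOWED_LABELS: dict[str, frozenset[str]] = {
    "horizon_orders_submitted_total": frozenset({"venue", "account", "side"}),
    "horizon_order_rejects_total": frozenset({"venue", "account", "layer"}),
    "horizon_order_latency_seconds": frozenset({"venue", "account"}),
    "horizon_fills_total": frozenset({"venue", "account", "side"}),
    "horizon_audit_events_total": frozenset({"category", "severity"}),
    "horizon_feed_heartbeat_age_seconds": frozenset({"feed"}),
    "horizon_feed_gaps_total": frozenset({"feed", "market"}),
    "horizon_watchdog_halts_total": frozenset({"reason"}),
    "horizon_risk_decisions_total": frozenset({"layer", "kind"}),
    "horizon_positions_open": frozenset({"venue", "account"}),
    "horizon_equity_usd": frozenset({"account"}),
    "horizon_dlq_depth": frozenset({"venue"}),
}


class LabelGuard:
    """Hypothetical wrapper: rejects unknown metrics and unexpected label keys."""

    def __init__(self, inner: Any) -> None:
        self._inner = inner  # any MetricsCollector

    def _check(self, name: str, labels: dict[str, Any]) -> None:
        allowed = ALLOWED_LABELS.get(name)
        if allowed is None:
            raise ValueError(f"unknown metric {name!r}")
        extra = set(labels) - allowed
        if extra:
            raise ValueError(f"{name}: unexpected labels {sorted(extra)}")

    def inc(self, name: str, value: float = 1.0, **labels: str | None) -> None:
        self._check(name, labels)
        self._inner.inc(name, value, **labels)

    def gauge(self, name: str, value: float, **labels: str | None) -> None:
        self._check(name, labels)
        self._inner.gauge(name, value, **labels)

    def observe(self, name: str, value: float, **labels: str | None) -> None:
        self._check(name, labels)
        self._inner.observe(name, value, **labels)
```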