Tracing

OpenTelemetry spans across the submit path. NullTracer is the default; OpenTelemetry is opt-in.

Tracing shows where time goes when an order is slow, or which gate rejected a specific action. horizon.observability.tracing provides a narrow Tracer Protocol, a NullTracer default, and OpenTelemetryTracer for production deployments.

Protocol

python
from typing import Any, Protocol

class Tracer(Protocol):
    def span(self, name: str, **attrs: Any) -> Any: ...

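span(...) returns a context manager. Keyword arguments are recorded as attributes on the span. Implementations never raise on their own in steady state; exceptions from the code inside the span still propagate to the caller.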
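
Because the protocol is structural, any object with a matching span method satisfies it. The sketch below is a hypothetical RecordingTracer for tests (it is not part of Horizon) that stores each span's name and attributes in a list so assertions can inspect them. The Tracer protocol is restated locally so the snippet runs on its own.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Protocol, Tuple


class Tracer(Protocol):
    # Restated from above so the example is self-contained.
    def span(self, name: str, **attrs: Any) -> Any: ...


class RecordingTracer:
    """Hypothetical test double: remembers every span that was opened."""

    def __init__(self) -> None:
        self.spans: List[Tuple[str, Dict[str, Any]]] = []

    @contextmanager
    def span(self, name: str, **attrs: Any) -> Iterator[None]:
        # Record on entry so the span is visible even if the body raises.
        self.spans.append((name, dict(attrs)))
        yield


def traced_submit(tracer: Tracer) -> str:
    # Any code that accepts a Tracer also works with the test double.
    with tracer.span("venue.submit", venue="alpaca", side="buy"):
        return "ok"


recorder = RecordingTracer()
result = traced_submit(recorder)
```

After the call, recorder.spans holds one entry with the span name and the attributes that were passed.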

Null default

The default tracer is NullTracer, so Tier 1 to Tier 3 code paths pay effectively no overhead when tracing is not configured. hz.run() uses NullTracer unless tracer= is passed.
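
A no-op tracer can be very small. The following is a sketch of the idea, not Horizon's actual NullTracer: it hands back one shared contextlib.nullcontext instance, so an untraced span costs little more than a method call. The class name SketchNullTracer is made up for illustration.

```python
from contextlib import nullcontext
from typing import Any

# One reusable no-op context manager; nullcontext can be entered repeatedly.
_NOOP = nullcontext()


class SketchNullTracer:
    """Illustrative stand-in for a null tracer (assumed shape, not the shipped class)."""

    def span(self, name: str, **attrs: Any) -> Any:
        # Ignore name and attributes; nothing is recorded or exported.
        return _NOOP


tracer = SketchNullTracer()
with tracer.span("compute_signal", market_id="ABC"):
    value = 1 + 1
```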

OpenTelemetry

python
from horizon.observability import OpenTelemetryTracer
import horizon as hz

tracer = OpenTelemetryTracer(service_name="horizon-prod")
hz.run(mode="live", feed=..., tracer=tracer, ...)

OpenTelemetryTracer lazy-imports opentelemetry-api and opentelemetry-sdk. Install them, together with the optional OTLP exporter, with:

pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

It picks an exporter automatically:

  • opentelemetry-exporter-otlp installed: OTLP over gRPC to localhost:4317.
  • Otherwise: stdout console exporter.
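
The selection rule above can be sketched with importlib. This is an assumption about how such a check might be written, not Horizon's implementation. The names module_available and choose_exporter are hypothetical, the probed module path opentelemetry.exporter.otlp is assumed from the package name, and the probe is injectable so the rule can be exercised without installing anything.

```python
import importlib.util
from typing import Callable


def module_available(name: str) -> bool:
    """Return True if `name` looks importable in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package (e.g. `opentelemetry`) is missing.
        return False


def choose_exporter(probe: Callable[[str], bool] = module_available) -> str:
    # OTLP wins when its exporter package is present; console is the fallback.
    # The module path below is an assumption based on the pip package name.
    if probe("opentelemetry.exporter.otlp"):
        return "otlp"
    return "console"


with_otlp = choose_exporter(lambda name: True)
without_otlp = choose_exporter(lambda name: False)
```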

Override with constructor arguments:

python
tracer = OpenTelemetryTracer(
    service_name="horizon-prod",
    exporter="otlp",
    otlp_endpoint="https://my-collector.example:4317",
)

Or pass a pre-built TracerProvider:

python
from opentelemetry.sdk.trace import TracerProvider
provider = TracerProvider(...)
tracer = OpenTelemetryTracer(tracer_provider=provider)

What spans open automatically

Span name | Where | Attributes
venue.submit | Around every broker submit() call | venue, account, side, market_id, client_order_id

The submit span is the one most incident responders need. Every fill reconciliation, latency question, and broker reject traces back to it.

Manual spans

Strategies, features, and risk checks can open spans manually. The tracer is available wherever hz.run(...) is configured with one:

python
class MyStrategy(Strategy):
    def on_tick(self, ctx):
        with ctx.tracer.span("compute_signal", market_id=ctx.market_id):
            return self._compute(ctx)

Or from any module that holds a reference to the tracer:

python
with tracer.span("reconcile", venue="alpaca"):
    reconciler.reconcile()

Attribute hygiene

Keep attribute keys consistent. Standard names in the Horizon code:

  • venue (not venue_name)
  • account (not account_id)
  • market_id
  • side
  • client_order_id
  • correlation_id (threaded via horizon.observability.logging)

The OTel semantic conventions prefer dotted names (account.id). Pick one convention per deployment; the tracer stores whatever is passed.
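
One way to keep keys consistent is to normalize attributes before they reach the tracer. The helper below is hypothetical (this page does not describe Horizon shipping one), and its alias table covers only the two discouraged spellings listed above.

```python
from typing import Any, Dict

# Discouraged spelling -> standard Horizon key (taken from the list above).
_ALIASES: Dict[str, str] = {
    "venue_name": "venue",
    "account_id": "account",
}


def normalize_attrs(**attrs: Any) -> Dict[str, Any]:
    """Rename known aliases; reject input that sets the same key twice."""
    out: Dict[str, Any] = {}
    for key, value in attrs.items():
        canonical = _ALIASES.get(key, key)
        if canonical in out:
            raise ValueError(f"duplicate attribute after normalization: {canonical}")
        out[canonical] = value
    return out


attrs = normalize_attrs(venue_name="alpaca", account_id="acct-1", side="buy")
```

The result can be splatted into a span call, for example tracer.span("reconcile", **attrs).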

Exception recording

When code inside a span raises, the tracer records the exception on the span and sets the status to ERROR before re-raising. This keeps observed latency accurate and surfaces failures in the trace UI without swallowing the original error.
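
The record-then-re-raise behavior can be illustrated without the OTel SDK. The sketch below uses stand-in classes (FakeSpan and SketchTracer are hypothetical names) to show the control flow only. With real OpenTelemetry spans, the analogous calls are span.record_exception(...) and span.set_status(...).

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class FakeSpan:
    """Stand-in for an exported span; real spans come from the OTel SDK."""
    name: str
    attrs: Dict[str, Any]
    status: str = "UNSET"
    exception: Optional[BaseException] = None


@dataclass
class SketchTracer:
    finished: List[FakeSpan] = field(default_factory=list)

    @contextmanager
    def span(self, name: str, **attrs: Any) -> Iterator[FakeSpan]:
        current = FakeSpan(name, dict(attrs))
        try:
            yield current
        except Exception as exc:
            # Record the failure on the span, then let it propagate unchanged.
            current.exception = exc
            current.status = "ERROR"
            raise
        finally:
            # Runs on success and on failure, so the span always ends.
            self.finished.append(current)


tracer = SketchTracer()
try:
    with tracer.span("venue.submit", venue="alpaca"):
        raise RuntimeError("broker rejected order")
except RuntimeError as err:
    caught = err
```

The caller still sees the original RuntimeError, and the finished span carries both the exception and the ERROR status.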

Shutdown

The tracer exposes shutdown() for flushing buffered spans at process exit. When running under hz.run(mode="live"), call it once the run returns or raises, for example in a finally block:

python
try:
    hz.run(..., tracer=tracer, ...)
finally:
    tracer.shutdown()

The run loop does not call shutdown on a caller-provided tracer; that is the caller’s responsibility so a shared tracer survives across hz.run invocations.
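
The ownership rule can be illustrated with stand-ins. In this sketch, BufferingTracer and fake_run are hypothetical: fake_run plays the role of hz.run(...) and, like the real run loop as described above, never shuts down the tracer it was given.

```python
from typing import List


class BufferingTracer:
    """Hypothetical tracer that buffers span names until shutdown()."""

    def __init__(self) -> None:
        self.buffer: List[str] = []
        self.flushed: List[str] = []
        self.closed = False

    def record(self, name: str) -> None:
        if self.closed:
            raise RuntimeError("tracer already shut down")
        self.buffer.append(name)

    def shutdown(self) -> None:
        # Flush whatever is buffered, then refuse further spans.
        self.flushed.extend(self.buffer)
        self.buffer.clear()
        self.closed = True


def fake_run(session: str, tracer: BufferingTracer) -> None:
    # Stand-in for hz.run(...): uses the tracer but never shuts it down.
    tracer.record(f"{session}.venue.submit")


tracer = BufferingTracer()
try:
    fake_run("morning", tracer)
    fake_run("afternoon", tracer)  # still usable: the first run left it open
finally:
    tracer.shutdown()  # one flush, owned by the caller
```

If the run loop shut the tracer down itself, the second fake_run call would fail, which is the situation the caller-owns-shutdown rule avoids.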

Relationship to metrics and audit

  • Metrics count things (orders submitted, errors per minute) for dashboards and SLO alerts. See Metrics.
  • Audit events record what happened, in full, forever. See Audit trail.
  • Traces show where time goes within a single operation, for incident triage.

Use all three. They answer different questions.

Out of scope

  • Sampling config. Use the standard OTel environment variables OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG.
  • Propagation across processes. OTel’s context propagation (W3C TraceContext) works; the Horizon code does not strip it.
  • Metrics via OTel. Prometheus is the shipped metrics backend (see Metrics). You can layer OTel metrics on top if you prefer.
  • Hot-path span density. The submit path is instrumented. Whether to add spans inside every feature or strategy step is left to the caller; such spans are cheap when NullTracer is active.

Related