Alerting

Alerter Protocol with Slack, email, PagerDuty adapters. Event-driven paging from the audit log.

The watchdog halts. The audit log records. Alerting is how a human finds out in time to act.

horizon.observability.alerts is the alerting layer: an Alerter Protocol plus a small set of adapters (Slack, email, PagerDuty, Twilio SMS). An AuditLogAlertBridge subscribes to selected audit categories and fires alerts without the hot path knowing about it.

Protocol

python
class Alerter(Protocol):
    def send(
        self,
        *,
        severity: AuditSeverity,
        title: str,
        message: str,
        extra: dict[str, Any] | None = None,
    ) -> None: ...

Implementations must never raise. Alert loss is preferable to crashing the trading path. Network errors are caught and swallowed.

Adapters

| Adapter | Channel | Creds |
|---|---|---|
| NullAlerter | no-op | none |
| SlackAlerter | Incoming Webhook | slack.webhook via Secrets, or explicit webhook_url= |
| EmailAlerter | SMTP (stdlib) | smtp.username, smtp.password via Secrets |
| PagerDutyAlerter | Events API v2 | pagerduty.routing_key via Secrets |
| TwilioAlerter | SMS | twilio.account_sid, twilio.auth_token, twilio.from_number via Secrets |
| CompositeAlerter | fan-out | wraps a list of alerters |

Custom adapters implement the Protocol. To route critical events to PagerDuty and lower-severity ones to Slack, wrap the adapters in a CompositeAlerter and give each its own min_severity.
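The fan-out idea can be sketched with in-memory stand-ins. RecordingAlerter and FanOutAlerter are hypothetical names used only for illustration, and the AuditSeverity ordering is assumed; the real CompositeAlerter may behave differently in detail.

```python
from enum import IntEnum


class AuditSeverity(IntEnum):
    """Stand-in for horizon.audit.AuditSeverity (ordering is an assumption)."""
    Debug = 0
    Info = 1
    Notice = 2
    Warning = 3
    Critical = 4


class RecordingAlerter:
    """Hypothetical adapter: keeps alerts at or above min_severity in memory."""

    def __init__(self, name: str, min_severity: AuditSeverity) -> None:
        self.name = name
        self.min_severity = min_severity
        self.sent: list[str] = []

    def send(self, *, severity, title, message, extra=None) -> None:
        if severity >= self.min_severity:
            self.sent.append(title)


class FanOutAlerter:
    """Sketch of a composite: forwards every alert to every child."""

    def __init__(self, alerters) -> None:
        self._alerters = list(alerters)

    def send(self, *, severity, title, message, extra=None) -> None:
        for child in self._alerters:
            try:
                child.send(severity=severity, title=title, message=message, extra=extra)
            except Exception:
                continue  # one failing child must not block the others


slack = RecordingAlerter("slack", AuditSeverity.Warning)
pager = RecordingAlerter("pagerduty", AuditSeverity.Critical)
fan_out = FanOutAlerter([slack, pager])

fan_out.send(severity=AuditSeverity.Warning, title="FeedStale", message="no ticks for 30s")
fan_out.send(severity=AuditSeverity.Critical, title="KillSwitchFired", message="halt")
```

In this sketch the composite does no routing itself: each child decides from its own min_severity whether to deliver, so the warning reaches only the Slack stand-in while the critical event reaches both.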

Quickstart

python
from horizon.observability import (
    AuditLogAlertBridge, CompositeAlerter,
    SlackAlerter, PagerDutyAlerter,
)
from horizon.audit import AuditSeverity
import horizon as hz

alerter = CompositeAlerter(alerters=[
    SlackAlerter(min_severity=AuditSeverity.Warning),     # webhook resolved via Secrets (slack.webhook)
    PagerDutyAlerter(min_severity=AuditSeverity.Critical),
])

hz.run(
    mode="live",
    feed=my_feed,
    audit_log=audit_log,
    alerter=alerter,                    # binds AuditLogAlertBridge when audit_log present
    # ... remaining run options
)

Passing alerter= together with audit_log= auto-wires the bridge with the default category set. Every matching event fires alerter.send(...).

Default category set

DEFAULT_ALERT_CATEGORIES covers the events an advisor almost always wants to know about:

  • KillSwitchFired
  • WatchdogHalt
  • FeedStale and FeedGap
  • FeedDisconnected
  • ReconcileMismatch
  • OrderRejected
  • MarginCall

Minimum severity defaults to Warning. Everything below (Debug, Info, Notice) is dropped.

Custom routing

python
from horizon.audit import AuditCategory, AuditSeverity
from horizon.observability import AuditLogAlertBridge

bridge = AuditLogAlertBridge(
    alerter=alerter,
    categories=frozenset({
        AuditCategory.KillSwitchFired,
        AuditCategory.OrderRejected,
    }),
    min_severity=AuditSeverity.Critical,
    filter_fn=lambda e: (e.account_id or "") in {"acc_jane", "acc_bob"},
)
bridge.bind(audit_log)

filter_fn runs after the category and severity filters and receives the AuditEvent, so it can narrow by account, venue, strategy, or any other field on the event.
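The filter order described above can be sketched as a single predicate. The AuditEvent dataclass, the two enums, and should_alert are simplified stand-ins written for this example, not the library's own types.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Callable, Optional


class AuditSeverity(IntEnum):
    """Stand-in for horizon.audit.AuditSeverity (ordering is an assumption)."""
    Debug = 0
    Info = 1
    Notice = 2
    Warning = 3
    Critical = 4


class AuditCategory(Enum):
    """Stand-in with a few of the categories named in this page."""
    KillSwitchFired = "KillSwitchFired"
    OrderRejected = "OrderRejected"
    FeedStale = "FeedStale"


@dataclass(frozen=True)
class AuditEvent:
    """Simplified stand-in for the real event type."""
    category: AuditCategory
    severity: AuditSeverity
    message: str
    account_id: Optional[str] = None


def should_alert(
    event: AuditEvent,
    categories: frozenset,
    min_severity: AuditSeverity,
    filter_fn: Optional[Callable[[AuditEvent], bool]] = None,
) -> bool:
    """Apply the category filter, then severity, then the custom filter."""
    if event.category not in categories:
        return False
    if event.severity < min_severity:
        return False
    # filter_fn only ever sees events that passed both cheaper checks.
    if filter_fn is not None and not filter_fn(event):
        return False
    return True
```

Running the custom predicate last keeps it cheap to write: it never has to re-check category or severity.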

Slack format

Each alert is one attachment:

  • Colored sidebar by severity (critical red, warning yellow, info green).
  • Title: [sev] <category>.
  • Body: the event’s message.
  • Fields: account_id, venue_name, market_id, order_id, client_order_id, correlation_id, truncated payload.
  • Footer: horizon / severity=<level>.
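For orientation, the layout above maps onto Slack's attachment fields roughly as follows. build_slack_payload is a hypothetical helper, the lower-cased severity in the title is an assumption, and "danger", "warning", and "good" are Slack's named attachment colors (red, yellow, green).

```python
import json
from enum import IntEnum
from typing import Any


class AuditSeverity(IntEnum):
    """Stand-in for horizon.audit.AuditSeverity (ordering is an assumption)."""
    Debug = 0
    Info = 1
    Notice = 2
    Warning = 3
    Critical = 4


# Slack's named colors: danger is red, warning is yellow, good is green.
_COLORS = {AuditSeverity.Critical: "danger", AuditSeverity.Warning: "warning"}


def build_slack_payload(
    severity: AuditSeverity,
    category: str,
    message: str,
    fields: dict[str, Any],
) -> dict[str, Any]:
    """Build a one-attachment webhook payload for a single alert."""
    level = severity.name.lower()
    return {
        "attachments": [
            {
                "color": _COLORS.get(severity, "good"),
                "title": f"[{level}] {category}",
                "text": message,
                # Skip identifiers the event does not carry.
                "fields": [
                    {"title": key, "value": str(value), "short": True}
                    for key, value in fields.items()
                    if value is not None
                ],
                "footer": f"horizon / severity={level}",
            }
        ]
    }


payload = build_slack_payload(
    AuditSeverity.Critical,
    "KillSwitchFired",
    "halt",
    {"account_id": "acc_jane", "order_id": None},
)
body = json.dumps(payload)  # what would be POSTed to the webhook URL
```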

Attach a PII scrubber to the alerter to scrub message bodies before posting:

python
from horizon.audit import AuditSeverity
from horizon.observability import PiiScrubber, SlackAlerter

alerter = SlackAlerter(
    webhook_url="https://hooks.slack.com/...",
    scrubber=PiiScrubber(),
    min_severity=AuditSeverity.Warning,
)

Email

python
from horizon.audit import AuditSeverity
from horizon.observability import EmailAlerter

alerter = EmailAlerter(
    smtp_host="smtp.example.com",
    smtp_port=587,
    from_addr="ops@example.com",
    to_addrs=["oncall@example.com", "compliance@example.com"],
    min_severity=AuditSeverity.Warning,
)

Credentials resolve via Secrets (smtp.username, smtp.password) unless passed explicitly.
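A rough idea of what an stdlib SMTP adapter has to do, using only email.message and smtplib. build_email and send_email are hypothetical helpers, the subject format is an assumption, and STARTTLS on port 587 is assumed rather than documented here.

```python
import smtplib
from email.message import EmailMessage
from enum import IntEnum


class AuditSeverity(IntEnum):
    """Stand-in for horizon.audit.AuditSeverity (ordering is an assumption)."""
    Debug = 0
    Info = 1
    Notice = 2
    Warning = 3
    Critical = 4


def build_email(*, from_addr, to_addrs, severity, title, message) -> EmailMessage:
    """Compose a plain-text alert email (subject format is an assumption)."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = ", ".join(to_addrs)
    msg["Subject"] = f"[{severity.name}] {title}"
    msg.set_content(message)
    return msg


def send_email(msg: EmailMessage, *, host, port, username, password) -> bool:
    """Send over SMTP with STARTTLS; return False instead of raising."""
    try:
        with smtplib.SMTP(host, port, timeout=10) as smtp:
            smtp.starttls()
            smtp.login(username, password)
            smtp.send_message(msg)
        return True
    except (OSError, smtplib.SMTPException):
        # Alert loss is preferable to propagating a network error.
        return False


msg = build_email(
    from_addr="ops@example.com",
    to_addrs=["oncall@example.com", "compliance@example.com"],
    severity=AuditSeverity.Warning,
    title="FeedStale",
    message="no ticks for 30s",
)
```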

PagerDuty

python
from horizon.audit import AuditSeverity
from horizon.observability import PagerDutyAlerter

pd = PagerDutyAlerter(min_severity=AuditSeverity.Critical)  # reads pagerduty.routing_key

AuditSeverity maps to PagerDuty severity as follows:

  • Info and Notice -> info.
  • Warning -> warning.
  • Critical -> critical.

PagerDutyAlerter is typically configured with min_severity=Critical so low-severity events do not page the on-call.

Observer failures are contained

python
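from horizon.audit import AuditCategory
from horizon.observability import AuditLogAlertBridge

class BadAlerter:
    """Deliberately violates the never-raise contract."""
    def send(self, **_):
        raise RuntimeError("nope")

AuditLogAlertBridge(alerter=BadAlerter()).bind(audit_log)

# Audit write still succeeds. Observer exception is caught by AuditLog.
audit_log.record(AuditCategory.KillSwitchFired, message="halt")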

The audit log’s subscribe/notify path catches observer exceptions. A broken alerter cannot break the write path.
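One way such containment can work is to complete the write before notifying and to wrap each observer call individually. MiniAuditLog below is a toy stand-in written for this example; the real AuditLog's internals are not shown in this page.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("audit")


class MiniAuditLog:
    """Toy stand-in showing write-then-notify with contained observers."""

    def __init__(self) -> None:
        self.records: list[tuple[str, str]] = []
        self.observer_errors = 0
        self._observers: list[Callable[[Any], None]] = []

    def subscribe(self, observer: Callable[[Any], None]) -> None:
        self._observers.append(observer)

    def record(self, category: str, message: str) -> None:
        event = (category, message)
        self.records.append(event)  # the write completes before any observer runs
        for observer in self._observers:
            try:
                observer(event)
            except Exception:
                # Contain the failure so later observers and the caller are unaffected.
                self.observer_errors += 1
                logger.debug("observer failed", exc_info=True)
```

Wrapping each call separately, rather than the whole loop, is what lets a healthy observer still run after a broken one.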

When to use metrics vs. alerts

  • Metrics alerts (Prometheus rules): trend and SLO alerts. Error rate above 5% for 10 minutes. Submit latency p99 above 500ms. Feed heartbeat gauge stale for 30 seconds.
  • Audit-log alerts (this module): event-driven, fire immediately. Kill switch. Watchdog halt. Feed disconnect. Reconciliation mismatch.

Use both. See Metrics.

SMS via Twilio

python
from horizon.audit import AuditSeverity
from horizon.observability import TwilioAlerter

sms = TwilioAlerter(
    to_numbers=["+15551112222", "+15553334444"],
    # from_number / account_sid / auth_token all resolve from Secrets by default
    min_severity=AuditSeverity.Critical,   # SMS is high-signal; keep the bar high
)

Auth is HTTP Basic on account_sid + auth_token. The adapter POSTs to https://api.twilio.com/2010-04-01/Accounts/<SID>/Messages.json form-encoded. One HTTP POST per recipient; a failure on one does not block the rest.

min_severity defaults to Critical so routine warnings do not wake the on-call.
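The request shape described above can be sketched with the standard library alone. build_twilio_request and send_sms_to_all are hypothetical helpers, and post is an injected callable (for example urllib.request.urlopen) so the sketch can be exercised without network access.

```python
import base64
import urllib.parse
import urllib.request
from typing import Callable, Iterable


def build_twilio_request(
    *, account_sid: str, auth_token: str, from_number: str, to_number: str, body: str
) -> urllib.request.Request:
    """Build one form-encoded POST to the Messages endpoint with HTTP Basic auth."""
    url = f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Messages.json"
    data = urllib.parse.urlencode(
        {"To": to_number, "From": from_number, "Body": body}
    ).encode("ascii")
    token = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    request = urllib.request.Request(url, data=data, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


def send_sms_to_all(
    to_numbers: Iterable[str],
    post: Callable[[urllib.request.Request], object],
    *,
    account_sid: str,
    auth_token: str,
    from_number: str,
    body: str,
) -> list[str]:
    """Send one request per recipient; return the numbers that failed."""
    failed = []
    for number in to_numbers:
        try:
            post(build_twilio_request(
                account_sid=account_sid, auth_token=auth_token,
                from_number=from_number, to_number=number, body=body,
            ))
        except Exception:
            failed.append(number)  # keep going: one failure must not block the rest
    return failed
```

Returning the failed numbers instead of raising keeps the adapter within the never-raise contract while still leaving something to log.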

Out of scope

  • Incident deduplication. PagerDuty handles dedup at the service level. Implement a rate-limit wrapper around the Alerter for the Slack path if needed.
  • On-call schedule. PagerDuty concern.