Alerting
Alerter Protocol with Slack, email, PagerDuty adapters. Event-driven paging from the audit log.
The watchdog halts. The audit log records. Alerting is how a human finds out in time to act.
horizon.observability.alerts is the alerting layer: an Alerter Protocol plus a small set of adapters (Slack, email, PagerDuty, and Twilio SMS). An AuditLogAlertBridge subscribes to selected audit categories and fires alerts without the hot path knowing about it.
Protocol
from typing import Any, Protocol

from horizon.audit import AuditSeverity

class Alerter(Protocol):
    def send(
        self,
        *,
        severity: AuditSeverity,
        title: str,
        message: str,
        extra: dict[str, Any] | None = None,
    ) -> None: ...
Implementations must never raise. Alert loss is preferable to crashing the trading path. Network errors are caught and swallowed.
Adapters
| Adapter | Channel | Credentials |
|---|---|---|
| NullAlerter | no-op | none |
| SlackAlerter | Incoming Webhook | slack.webhook via Secrets, or explicit webhook_url= |
| EmailAlerter | SMTP (stdlib) | smtp.username, smtp.password via Secrets |
| PagerDutyAlerter | Events API v2 | pagerduty.routing_key via Secrets |
| TwilioAlerter | SMS | twilio.account_sid, twilio.auth_token, twilio.from_number via Secrets |
| CompositeAlerter | fan-out | wraps a list of alerters |
Custom adapters implement the Protocol. To route by severity, wrap several adapters in a CompositeAlerter and give each its own min_severity, for example so that only critical events reach PagerDuty while Slack also receives warnings.
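The fan-out idea can be sketched with stand-in classes. `RecordingAlerter` and `FanOutAlerter` below are hypothetical and are not the library's CompositeAlerter. The sketch assumes each child applies its own min_severity threshold, as the Quickstart suggests.

```python
from enum import IntEnum


class AuditSeverity(IntEnum):
    """Stand-in for horizon.audit.AuditSeverity (members and order assumed)."""
    Info = 1
    Warning = 2
    Critical = 3


class RecordingAlerter:
    """Hypothetical adapter that records titles instead of posting them."""

    def __init__(self, min_severity: AuditSeverity) -> None:
        self.min_severity = min_severity
        self.sent: list[str] = []

    def send(self, *, severity, title, message, extra=None) -> None:
        if severity >= self.min_severity:
            self.sent.append(title)


class FanOutAlerter:
    """Hypothetical composite: forwards every alert to each child."""

    def __init__(self, alerters) -> None:
        self.alerters = list(alerters)

    def send(self, *, severity, title, message, extra=None) -> None:
        for alerter in self.alerters:
            try:
                alerter.send(
                    severity=severity, title=title, message=message, extra=extra
                )
            except Exception:
                continue  # one failing channel must not block the others


slack = RecordingAlerter(AuditSeverity.Warning)
pager = RecordingAlerter(AuditSeverity.Critical)
fan_out = FanOutAlerter([slack, pager])
fan_out.send(severity=AuditSeverity.Warning, title="FeedStale", message="no ticks")
fan_out.send(severity=AuditSeverity.Critical, title="KillSwitchFired", message="halt")
print(slack.sent)  # ['FeedStale', 'KillSwitchFired']
print(pager.sent)  # ['KillSwitchFired']
```

The composite itself does no routing. Each child decides what to drop, which keeps thresholds next to the channel they protect.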
Quickstart
from horizon.observability import (
    AuditLogAlertBridge, CompositeAlerter,
    SlackAlerter, PagerDutyAlerter,
)
from horizon.audit import AuditSeverity
import horizon as hz

alerter = CompositeAlerter(alerters=[
    SlackAlerter(min_severity=AuditSeverity.Warning),  # webhook from env
    PagerDutyAlerter(min_severity=AuditSeverity.Critical),
])

# my_feed and audit_log are assumed to be created elsewhere.
hz.run(
    mode="live",
    feed=my_feed,
    audit_log=audit_log,
    alerter=alerter,  # binds AuditLogAlertBridge when audit_log present
    # ... remaining run options
)
Passing alerter= with an audit_log= auto-wires the bridge with the default category set. Every matching event fires alerter.send(...).
Default category set
DEFAULT_ALERT_CATEGORIES covers the events an advisor almost always wants to know about:
- KillSwitchFired
- WatchdogHalt
- FeedStale and FeedGap
- FeedDisconnected
- ReconcileMismatch
- OrderRejected
- MarginCall
Minimum severity defaults to Warning. Everything below (Info, Debug, Notice) is dropped.
Custom routing
from horizon.audit import AuditCategory, AuditSeverity
from horizon.observability import AuditLogAlertBridge

bridge = AuditLogAlertBridge(
    alerter=alerter,
    categories=frozenset({
        AuditCategory.KillSwitchFired,
        AuditCategory.OrderRejected,
    }),
    min_severity=AuditSeverity.Critical,
    filter_fn=lambda e: (e.account_id or "") in {"acc_jane", "acc_bob"},
)
bridge.bind(audit_log)
filter_fn runs after category and severity filters, so it can narrow by account, venue, strategy, or anything on the AuditEvent.
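The filter order described above can be sketched as a single predicate. The enums, the `AuditEvent` dataclass, and `should_alert` are simplified stand-ins written for this example, not the bridge's actual internals.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Callable, Optional


class AuditSeverity(IntEnum):
    """Stand-in for horizon.audit.AuditSeverity (members and order assumed)."""
    Info = 1
    Warning = 2
    Critical = 3


class AuditCategory(Enum):
    """Stand-in with a few of the categories named in this page."""
    KillSwitchFired = "KillSwitchFired"
    OrderRejected = "OrderRejected"
    FeedStale = "FeedStale"


@dataclass(frozen=True)
class AuditEvent:
    category: AuditCategory
    severity: AuditSeverity
    message: str
    account_id: Optional[str] = None


def should_alert(
    event: AuditEvent,
    *,
    categories: frozenset,
    min_severity: AuditSeverity,
    filter_fn: Optional[Callable[[AuditEvent], bool]] = None,
) -> bool:
    """Apply the three filters in order: category, severity, then filter_fn."""
    if event.category not in categories:
        return False
    if event.severity < min_severity:
        return False
    # filter_fn only ever sees events that passed the two cheap checks.
    return filter_fn is None or bool(filter_fn(event))
```

Running the cheap set-membership and severity checks first means a custom filter_fn is only called for events that are already candidates.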
Slack format
Each alert is one attachment:
- Colored sidebar by severity (critical red, warning yellow, info green).
- Title: [sev] <category>.
- Body: the event’s message.
- Fields: account_id, venue_name, market_id, order_id, client_order_id, correlation_id, truncated payload.
- Footer: horizon / severity=<level>.
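A rough sketch of how such an attachment could be assembled for a Slack Incoming Webhook. The `build_slack_payload` function, the 500-character payload limit, and the use of Slack's named colors (`danger`, `warning`, `good`) are assumptions for illustration; the adapter's real output may differ in detail.

```python
import json
from typing import Any, Optional

# Slack's built-in attachment color names: red, yellow, green.
SEVERITY_COLORS = {"critical": "danger", "warning": "warning", "info": "good"}

FIELD_KEYS = (
    "account_id", "venue_name", "market_id",
    "order_id", "client_order_id", "correlation_id",
)


def build_slack_payload(
    *,
    severity: str,
    category: str,
    message: str,
    extra: Optional[dict[str, Any]] = None,
    max_payload_chars: int = 500,  # assumed truncation limit
) -> dict[str, Any]:
    """Build the JSON body for one alert as a single Slack attachment."""
    extra = extra or {}
    fields = [
        {"title": key, "value": str(extra[key]), "short": True}
        for key in FIELD_KEYS
        if extra.get(key) is not None
    ]
    if "payload" in extra:
        text = json.dumps(extra["payload"], default=str)
        if len(text) > max_payload_chars:
            text = text[:max_payload_chars] + "..."
        fields.append({"title": "payload", "value": text, "short": False})
    return {
        "attachments": [
            {
                "color": SEVERITY_COLORS.get(severity, "good"),
                "title": f"[{severity}] {category}",
                "text": message,
                "fields": fields,
                "footer": f"horizon / severity={severity}",
            }
        ]
    }
```

The returned dict is what would be JSON-encoded and POSTed to the webhook URL. Fields whose value is missing are omitted so the attachment stays compact.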
Attach a PII scrubber to the alerter and message bodies are scrubbed before posting:
from horizon.audit import AuditSeverity
from horizon.observability import PiiScrubber, SlackAlerter

alerter = SlackAlerter(
    webhook_url="https://hooks.slack.com/...",
    scrubber=PiiScrubber(),
    min_severity=AuditSeverity.Warning,
)
Email
from horizon.audit import AuditSeverity
from horizon.observability import EmailAlerter

alerter = EmailAlerter(
    smtp_host="smtp.example.com",
    smtp_port=587,
    from_addr="ops@example.com",
    to_addrs=["oncall@example.com", "compliance@example.com"],
    min_severity=AuditSeverity.Warning,
)
Credentials resolve via Secrets (smtp.username, smtp.password) unless passed explicitly.
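For orientation, a stdlib-only sketch of what sending such an alert over SMTP can look like. `build_alert_email` and `send_alert_email` are hypothetical helpers, and the use of STARTTLS on port 587 is an assumption based on common practice, not a statement about how EmailAlerter is implemented.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional, Sequence


def build_alert_email(
    *,
    from_addr: str,
    to_addrs: Sequence[str],
    severity: str,
    title: str,
    message: str,
) -> EmailMessage:
    """Compose a plain-text alert email."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = ", ".join(to_addrs)
    msg["Subject"] = f"[{severity}] {title}"
    msg.set_content(message)
    return msg


def send_alert_email(
    msg: EmailMessage,
    *,
    host: str,
    port: int = 587,
    username: Optional[str] = None,
    password: Optional[str] = None,
    timeout: float = 10.0,
) -> bool:
    """Send the message; return False instead of raising on any failure."""
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            smtp.starttls()  # assumed: upgrade to TLS before authenticating
            if username and password:
                smtp.login(username, password)
            smtp.send_message(msg)
        return True
    except Exception:
        return False
```

Returning a boolean rather than raising keeps the helper in line with the rule that alerters must never raise.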
PagerDuty
from horizon.audit import AuditSeverity
from horizon.observability import PagerDutyAlerter

pd = PagerDutyAlerter(min_severity=AuditSeverity.Critical)  # reads pagerduty.routing_key
Severity maps:
- Info and Notice -> info.
- Warning -> warning.
- Critical -> critical.
PagerDutyAlerter is typically configured with min_severity=Critical so low-severity events do not page the on-call.
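The severity mapping above slots into a PagerDuty Events API v2 trigger event roughly as follows. `build_pagerduty_event` is a hypothetical helper. The summary format, the `source` value, and the use of `custom_details` are assumptions; the top-level field names follow the Events API v2 schema.

```python
from typing import Any, Optional

# Audit severity name -> PagerDuty severity, as listed above.
SEVERITY_MAP = {
    "Info": "info",
    "Notice": "info",
    "Warning": "warning",
    "Critical": "critical",
}


def build_pagerduty_event(
    *,
    routing_key: str,
    severity: str,
    title: str,
    message: str,
    source: str = "horizon",  # assumed source label
    extra: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build the JSON body for an Events API v2 'trigger' event."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            # PagerDuty caps the summary at 1024 characters.
            "summary": f"{title}: {message}"[:1024],
            "source": source,
            "severity": SEVERITY_MAP.get(severity, "info"),
            "custom_details": extra or {},
        },
    }
```

The body would be POSTed as JSON to https://events.pagerduty.com/v2/enqueue.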
Observer failures are contained
class BadAlerter:
    def send(self, **_): raise RuntimeError("nope")

AuditLogAlertBridge(alerter=BadAlerter()).bind(audit_log)
# Audit write still succeeds. Observer exception is caught by AuditLog.
audit_log.record(AuditCategory.KillSwitchFired, message="halt")
The audit log’s subscribe/notify path catches observer exceptions. A broken alerter cannot break the write path.
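The containment pattern can be shown in a few lines. `MiniAuditLog` is a toy stand-in written for this example, not the real AuditLog; it only demonstrates writing first and then notifying observers inside a try/except.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("audit")

Observer = Callable[[dict[str, Any]], None]


class MiniAuditLog:
    """Toy audit log: the write happens first, observers are isolated."""

    def __init__(self) -> None:
        self.events: list[dict[str, Any]] = []
        self._observers: list[Observer] = []

    def subscribe(self, observer: Observer) -> None:
        self._observers.append(observer)

    def record(self, category: str, *, message: str) -> None:
        event = {"category": category, "message": message}
        self.events.append(event)  # durable write before any notification
        for observer in self._observers:
            try:
                observer(event)
            except Exception:
                # A broken observer is logged and skipped; later observers
                # still run and the caller never sees the exception.
                log.exception("audit observer failed")
```

Because each observer call is wrapped individually, one failing subscriber affects neither the write nor the other subscribers.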
When to use metrics vs. alerts
- Metrics alerts (Prometheus rules): trend and SLO alerts. Error rate above 5% for 10 minutes. Submit latency p99 above 500ms. Feed heartbeat gauge stale for 30 seconds.
- Audit-log alerts (this module): event-driven, fire immediately. Kill switch. Watchdog halt. Feed disconnect. Reconciliation mismatch.
Use both. See Metrics.
SMS via Twilio
from horizon.audit import AuditSeverity
from horizon.observability import TwilioAlerter

sms = TwilioAlerter(
    to_numbers=["+15551112222", "+15553334444"],
    # from_number / account_sid / auth_token all resolve from Secrets by default
    min_severity=AuditSeverity.Critical,  # SMS is high-signal; keep the bar high
)
Auth is HTTP Basic on account_sid + auth_token. The adapter POSTs to https://api.twilio.com/2010-04-01/Accounts/<SID>/Messages.json form-encoded. One HTTP POST per recipient; a failure on one does not block the rest.
min_severity defaults to Critical so routine warnings do not wake the on-call.
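The request shape described above (Basic auth, form-encoded body, one POST per recipient) can be sketched with the standard library. `build_twilio_requests` and `send_all` are hypothetical helpers, not the adapter's code. `To`, `From`, and `Body` are the Twilio Messages API form parameters.

```python
import base64
import urllib.parse
import urllib.request
from typing import Callable, Sequence


def build_twilio_requests(
    *,
    account_sid: str,
    auth_token: str,
    from_number: str,
    to_numbers: Sequence[str],
    body: str,
) -> list[urllib.request.Request]:
    """Build one form-encoded POST per recipient, using HTTP Basic auth."""
    url = f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Messages.json"
    token = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode("ascii")
    requests = []
    for to in to_numbers:
        form = urllib.parse.urlencode({"To": to, "From": from_number, "Body": body})
        requests.append(
            urllib.request.Request(
                url,
                data=form.encode("ascii"),
                headers={
                    "Authorization": f"Basic {token}",
                    "Content-Type": "application/x-www-form-urlencoded",
                },
                method="POST",
            )
        )
    return requests


def send_all(
    requests: Sequence[urllib.request.Request],
    opener: Callable = urllib.request.urlopen,
    timeout: float = 5.0,
) -> int:
    """Send each request independently; return how many were delivered."""
    delivered = 0
    for req in requests:
        try:
            with opener(req, timeout=timeout):
                delivered += 1
        except Exception:
            continue  # a failure for one recipient does not block the rest
    return delivered
```

The `opener` parameter exists so the send loop can be exercised in tests without network access.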
Out of scope
- Incident deduplication. PagerDuty handles dedup at the service level. Implement a rate-limit wrapper around the Alerter for the Slack path if needed.
- On-call schedule. PagerDuty concern.
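A rate-limit wrapper of the kind mentioned above might look like the following sketch. `RateLimitedAlerter` is hypothetical and not part of the module. Keying suppression on the alert title and using a fixed 60-second window are assumptions; adjust both to match how your alerts are titled.

```python
import time
from typing import Any, Callable, Optional


class RateLimitedAlerter:
    """Hypothetical wrapper: forwards at most one alert per title per window."""

    def __init__(
        self,
        inner: Any,
        *,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._inner = inner
        self._window = window_seconds
        self._clock = clock
        self._last_sent: dict[str, float] = {}

    def send(
        self,
        *,
        severity: Any,
        title: str,
        message: str,
        extra: Optional[dict[str, Any]] = None,
    ) -> None:
        now = self._clock()
        last = self._last_sent.get(title)
        if last is not None and now - last < self._window:
            return  # suppressed: same title was forwarded recently
        self._last_sent[title] = now
        try:
            self._inner.send(
                severity=severity, title=title, message=message, extra=extra
            )
        except Exception:
            pass  # keep the never-raise contract even if the inner alerter breaks
```

Wrapping only the Slack adapter, for example `RateLimitedAlerter(SlackAlerter(...))` inside the CompositeAlerter, leaves PagerDuty to do its own deduplication.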