Dead-letter queue
Capture failed order submissions for inspection and replay instead of losing them.
When a venue submit() raises (network blip, expired credentials, a 4xx the broker returned), the run loop catches the exception and continues. That is safe for the loop. The order itself is lost unless a DLQ catches it.
horizon.ops.dlq is the dead-letter queue: a sink that records the original OrderAction plus context (error, timestamp, venue, account, retry count) for an operator to inspect, replay, or dismiss.
Protocol
class DLQSink(Protocol):
    def write(self, entry: DeadLetteredOrder) -> None: ...
    def list(self, *, include_dismissed: bool = False) -> list[DeadLetteredOrder]: ...
    def get(self, dlq_id: str) -> DeadLetteredOrder | None: ...
    def mark_dismissed(self, dlq_id: str) -> bool: ...
    def bump_retry(self, dlq_id: str) -> int: ...
    def depth(self) -> int: ...
    def close(self) -> None: ...
Two implementations:
- InMemoryDLQ. Process-local. Tests and research.
- SQLiteDLQ. File-backed. Triggers reject UPDATE of identity columns and DELETE of any row.
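To make the Protocol's contract concrete, here is a minimal sketch of a process-local sink. It is not the library's InMemoryDLQ: `Entry` is a cut-down stand-in for DeadLetteredOrder and `ToyMemoryDLQ` is a hypothetical name. The return values chosen for a missing dlq_id (False from mark_dismissed, KeyError from bump_retry) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Entry:
    """Cut-down stand-in for DeadLetteredOrder (illustration only)."""
    dlq_id: str
    error: str
    retry_count: int = 0
    dismissed: bool = False


class ToyMemoryDLQ:
    """Hypothetical process-local sink that follows the DLQSink method names."""

    def __init__(self) -> None:
        # dict preserves insertion order, so listing is chronological
        self._entries = {}

    def write(self, entry: Entry) -> None:
        self._entries[entry.dlq_id] = entry

    def list(self, *, include_dismissed: bool = False) -> list[Entry]:
        return [e for e in self._entries.values()
                if include_dismissed or not e.dismissed]

    def get(self, dlq_id: str) -> Entry | None:
        return self._entries.get(dlq_id)

    def mark_dismissed(self, dlq_id: str) -> bool:
        entry = self._entries.get(dlq_id)
        if entry is None:
            return False  # assumption: unknown ids report False
        # entries are frozen, so mutable fields change by replacement
        self._entries[dlq_id] = replace(entry, dismissed=True)
        return True

    def bump_retry(self, dlq_id: str) -> int:
        entry = self._entries[dlq_id]  # assumption: unknown ids raise KeyError
        bumped = replace(entry, retry_count=entry.retry_count + 1)
        self._entries[dlq_id] = bumped
        return bumped.retry_count

    def depth(self) -> int:
        # dismissed entries stay stored but are not counted
        return len(self.list())

    def close(self) -> None:
        pass  # nothing to release for an in-memory sink
```

Because the entry is a frozen dataclass, the sink swaps in a modified copy rather than mutating in place, which keeps the identity fields untouched by construction.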
Quickstart
from horizon.ops import SQLiteDLQ
import horizon as hz
dlq = SQLiteDLQ("/var/lib/horizon/dlq.db")
hz.run(
    mode="live",
    feed=my_feed,
    venues={"alpaca": venue},
    audit_log=audit_log,
    dlq=dlq,  # captures every failed submit
    # ... remaining arguments elided
)
Every exception from a venue submit() inside the live loop now:
- Writes a DeadLetteredOrder to the DLQ.
- Emits an AuditCategory.OrderRejected event with dlq_id in the payload.
- Increments horizon_order_rejects_total{layer="venue_exception"} if metrics is configured.
- Continues the loop.
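The capture step in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the run loop's actual code: `submit_or_dead_letter` and `FlakyVenue` are hypothetical names, the sink is a plain list, the entry is a dict rather than a DeadLetteredOrder, and the audit event and metric increment are left out.

```python
import json
import secrets
from datetime import datetime, timezone


def submit_or_dead_letter(venue, action: dict, sink: list, venue_name: str):
    """Try one submit; on any exception record an entry and keep going.

    Returns the venue's result on success, or None if the order was
    dead-lettered.
    """
    try:
        return venue.submit(action)
    except Exception as exc:  # the loop must survive any venue failure
        sink.append({
            "dlq_id": "dlq_" + secrets.token_hex(8),   # "dlq_" + 16 hex chars
            "captured_at": datetime.now(timezone.utc),  # tz-aware timestamp
            "venue_name": venue_name,
            # assumed error format; truncated to 1000 chars
            "error": f"{type(exc).__name__}: {exc}"[:1000],
            "retry_count": 0,
            "dismissed": False,
            "action_json": json.dumps(action),          # kept for replay
        })
        return None


class FlakyVenue:
    """Hypothetical venue that always fails, to exercise the capture path."""

    def submit(self, action):
        raise RuntimeError("HTTP 429 after 3 attempts")


captured = []
result = submit_or_dead_letter(
    FlakyVenue(),
    {"market_id": "AAPL", "side": "buy", "quantity": 100},
    captured,
    "alpaca",
)
```

The point of the shape is that the exception never escapes the helper: the caller sees `None`, and everything needed to resubmit later sits in `action_json`.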
Entry shape
@dataclass(frozen=True)
class DeadLetteredOrder:
    dlq_id: str                  # "dlq_<16 hex>"
    captured_at: datetime        # tz-aware
    venue_name: str
    account_id: str | None
    market_id: str
    side: str
    quantity: float
    order_type: str
    price: float | None
    client_order_id: str | None
    error: str                   # truncated to 1000 chars
    retry_count: int = 0
    dismissed: bool = False
    action_json: str = ""        # full OrderAction for replay
Inspecting
for e in dlq.list():
    print(f"{e.captured_at.isoformat()} {e.venue_name} "
          f"{e.side} {e.quantity} {e.market_id} @ {e.price} "
          f"(retries={e.retry_count})")
    print(f" error: {e.error}")
dlq.list() hides dismissed entries. Pass include_dismissed=True to see them.
Replaying
Resubmit one entry. On success the entry is marked dismissed. On failure the retry count is bumped.
from horizon.ops import replay_order
ok, detail = replay_order(dlq, "dlq_abc123", venue, audit_log=audit_log)
if ok:
    print(f"replayed -> venue order {detail}")
else:
    print(f"still failing: {detail}")
Replay reconstructs the OrderAction from action_json and calls venue.submit(). The successful venue order id is emitted as an Annotation audit event referencing the original dlq_id.
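The success and failure branches described above can be sketched as follows. This is not the source of replay_order: `replay_once` is a hypothetical helper, entries are plain dicts keyed by dlq_id, the action is treated as a JSON object, and refusing to replay an already dismissed entry is an assumption. The Annotation audit event is omitted.

```python
import json


def replay_once(entries: dict, dlq_id: str, venue):
    """Resubmit one stored action and return an (ok, detail) pair."""
    entry = entries.get(dlq_id)
    if entry is None:
        return False, f"unknown dlq_id {dlq_id}"
    if entry["dismissed"]:
        # assumption: dismissed entries are not replayed again
        return False, "entry already dismissed"

    action = json.loads(entry["action_json"])  # rebuild the order from storage
    try:
        order_id = venue.submit(action)
    except Exception as exc:
        entry["retry_count"] += 1              # failure: bump the retry count
        return False, f"{type(exc).__name__}: {exc}"

    entry["dismissed"] = True                  # success: leave the pending set
    return True, order_id
```

Only the two mutable fields change in either branch, so the stored action and the original error stay available for later review.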
Dismissing
For entries that should not be replayed (the position is no longer valid, the market closed, the strategy was disabled):
dlq.mark_dismissed("dlq_abc123")
The entry remains in the sink for the audit record. depth() no longer counts it.
CLI
The horizon CLI wires the common operations and is backed by the same DLQSink implementation used in the run loop:
$ horizon dlq list --db /var/lib/horizon/dlq.db
dlq_abc123 2026-04-18T14:32:15Z alpaca buy 100 AAPL @ 180.0 [pending, retries=0]
error: RuntimeError: HTTP 429 after 3 attempts
$ horizon dlq list --db /var/lib/horizon/dlq.db --all # include dismissed
...
$ horizon dlq replay dlq_abc123 --db /var/lib/horizon/dlq.db --venue alpaca --paper
replayed dlq_abc123 -> venue order ord_xyz
$ horizon dlq dismiss dlq_abc123 --db /var/lib/horizon/dlq.db
dismissed dlq_abc123
Supported --venue values: alpaca, kalshi, hyperliquid, polymarket, ibkr, and ccxt:<exchange_id> (for example ccxt:binance). The replay command constructs the venue with credentials from Secrets. Use --paper to route to the venue’s demo / sandbox / paper mode where supported.
WORM properties
SQLiteDLQ has two triggers:
- UPDATE of any identity column (dlq_id, captured_at, venue_name, account_id, market_id, side, quantity, order_type, price, client_order_id, error, action_json) raises IntegrityError.
- DELETE from the table raises IntegrityError.
Only retry_count and dismissed are mutable, and only via the Protocol’s methods. An auditor can run the SQLite file through regulatory review the same way the audit log is reviewed.
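One way such triggers can be written with the standard-library sqlite3 module is sketched below. The table name, trigger names, messages, and the reduced column set are assumptions for illustration; the schema SQLiteDLQ actually creates is not shown here.

```python
import sqlite3

# Assumed, trimmed-down schema: only a subset of the identity columns.
IDENTITY_COLUMNS = ("dlq_id", "captured_at", "venue_name", "error", "action_json")
cols = ", ".join(IDENTITY_COLUMNS)

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
CREATE TABLE dlq (
    dlq_id       TEXT PRIMARY KEY,
    captured_at  TEXT NOT NULL,
    venue_name   TEXT NOT NULL,
    error        TEXT NOT NULL,
    action_json  TEXT NOT NULL,
    retry_count  INTEGER NOT NULL DEFAULT 0,
    dismissed    INTEGER NOT NULL DEFAULT 0
);
-- Fires only when an identity column appears in the SET clause.
CREATE TRIGGER dlq_no_identity_update
BEFORE UPDATE OF {cols} ON dlq
BEGIN
    SELECT RAISE(ABORT, 'identity columns are immutable');
END;
-- Fires on every DELETE against the table.
CREATE TRIGGER dlq_no_delete
BEFORE DELETE ON dlq
BEGIN
    SELECT RAISE(ABORT, 'rows cannot be deleted');
END;
""")
conn.execute(
    "INSERT INTO dlq (dlq_id, captured_at, venue_name, error, action_json) "
    "VALUES (?, ?, ?, ?, ?)",
    ("dlq_abc123", "2026-04-18T14:32:15Z", "alpaca", "RuntimeError: HTTP 429", "{}"),
)


def attempt(sql: str, params: tuple = ()) -> str:
    """Run one statement and report whether a trigger rejected it."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


key = ("dlq_abc123",)
bump = attempt("UPDATE dlq SET retry_count = retry_count + 1 WHERE dlq_id = ?", key)
rewrite = attempt("UPDATE dlq SET error = 'edited' WHERE dlq_id = ?", key)
delete = attempt("DELETE FROM dlq WHERE dlq_id = ?", key)
```

In this sketch the retry bump goes through while the error rewrite and the delete are rejected, because `RAISE(ABORT, ...)` inside a trigger surfaces in Python as `sqlite3.IntegrityError`.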
Storage sizing
Rough estimate: each entry is about 1 KB in SQLite, plus the serialized OrderAction, which is 200 to 500 bytes. At 10 failed submits per day, a year is about 3,650 entries, or roughly 4 to 6 MB. Sizing is not a concern for advisor-scale deployments.
Retention policy is the firm’s call. Dismissed entries can stay for the audit period (five years under Rule 204-2; six under Rule 17a-4) and be archived like the audit log.
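The sizing arithmetic can be reproduced with a few lines. This sketch reads the serialized OrderAction as additional to the roughly 1 KB row (an assumption about how the estimate is meant), treats 1 KB as 1,000 bytes, and ignores SQLite page and index overhead.

```python
ENTRY_BYTES = 1_000         # "about 1 KB" per row
ACTION_BYTES = (200, 500)   # serialized OrderAction, low and high estimate
FAILS_PER_DAY = 10


def storage_megabytes(fails_per_day: int, years: int = 1) -> tuple:
    """Return (low, high) storage estimates in MB for the given period."""
    rows = fails_per_day * 365 * years
    low, high = (rows * (ENTRY_BYTES + a) / 1_000_000 for a in ACTION_BYTES)
    return low, high


one_year = storage_megabytes(FAILS_PER_DAY)           # single year of failures
six_years = storage_megabytes(FAILS_PER_DAY, years=6)  # longest retention cited

print(f"1 year:  {one_year[0]:.1f} to {one_year[1]:.1f} MB")
print(f"6 years: {six_years[0]:.1f} to {six_years[1]:.1f} MB")
```

Under these assumptions a year comes to roughly 4.4 to 5.5 MB, and even a six-year retention window stays in the low tens of megabytes.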
Metrics
Depth is exposed as a gauge for dashboards:
dlq_depth_samples = metrics.gauge(
    MetricName.DlqDepth, dlq.depth(), venue="alpaca",
)
Wire a periodic gauge update or call it at EOD.
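A periodic update could be wired with a small background thread, as in the sketch below. `start_depth_sampler` is a hypothetical helper, not part of horizon. Whether `dlq.depth()` is safe to call from a thread other than the one that opened the sink is not stated here, so treat that as something to verify before using this pattern with SQLiteDLQ.

```python
import threading
from typing import Callable


def start_depth_sampler(
    read_depth: Callable[[], int],
    publish: Callable[[int], None],
    interval_s: float,
) -> Callable[[], None]:
    """Publish read_depth() every interval_s seconds until stop() is called.

    Returns the stop() function.
    """
    stop_event = threading.Event()

    def loop() -> None:
        while True:
            publish(read_depth())            # sample once right away, then per tick
            if stop_event.wait(interval_s):  # True as soon as stop() is called
                return

    thread = threading.Thread(target=loop, name="dlq-depth-sampler", daemon=True)
    thread.start()

    def stop() -> None:
        stop_event.set()
        thread.join()

    return stop


# Hypothetical wiring, reusing the gauge call shown above:
# stop = start_depth_sampler(
#     dlq.depth,
#     lambda d: metrics.gauge(MetricName.DlqDepth, d, venue="alpaca"),
#     interval_s=60.0,
# )
```

Waiting on the event rather than sleeping means `stop()` returns promptly instead of blocking for a full interval.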
Not in scope
- Automatic replay. Omitted by design: failed submits need human inspection, and a retry loop at the DLQ layer would hide credential expiry, broker bans, or symbol typos that the operator should address.
- Priority queues. One table, chronological order. If prioritization matters, filter in the operator CLI.
- Cross-process DLQ. SQLiteDLQ is single-writer. For multi-writer deployments, a Postgres-backed DLQSink lands in L2 on the same Protocol.