Deployment

What you need to run flow surveillance continuously against live venues: hardware, dependencies, config, process supervision, monitoring, backups, and upgrades.

This page covers deploying horizon.flow as a long-running service against live feeds. If you are trying the module for the first time, the quickstart runs everything against a synthetic fixture with no setup; this page assumes you have decided to put the module in front of real markets.

1. What you need

Hardware

Baseline for v0.1 target rates (low thousands of events per second across Polymarket, Kalshi, Hyperliquid combined):

  • 2 vCPU.
  • 4 GB RAM.
  • 50 GB SSD for the flow store, audit log, and recorded feeds.
  • NTP-synced clock. Windowed detectors (spoofing, layering, quote-stuffing) depend on accurate timestamps; clock drift creates false negatives and complicates audit reconstruction.

Scale up RAM proportionally if you are tracking many markets at once. Each market holds rolling deques bounded by FlowConfig.max_trades_per_market (default 2000) and max_book_snapshots_per_market (default 100).
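
A back-of-envelope sketch of how the deque bounds translate into memory. The two FlowConfig defaults come from this page; the per-trade and per-snapshot byte counts are assumptions for illustration only, so measure your own workload before sizing a host.

```python
# Rough sizing sketch. The per-item byte counts are assumptions, not measurements.
def estimate_deque_memory_mb(
    markets: int,
    max_trades_per_market: int = 2000,        # FlowConfig default
    max_book_snapshots_per_market: int = 100,  # FlowConfig default
    bytes_per_trade: int = 500,                # assumed average object size
    bytes_per_snapshot: int = 20_000,          # assumed average object size
) -> float:
    """Return the worst-case size of the rolling deques in megabytes."""
    per_market = (
        max_trades_per_market * bytes_per_trade
        + max_book_snapshots_per_market * bytes_per_snapshot
    )
    return markets * per_market / 1_000_000


if __name__ == "__main__":
    for n in (50, 500, 2000):
        print(f"{n} markets: about {estimate_deque_memory_mb(n):.0f} MB")
```

Under these assumed sizes each market costs about 3 MB when its deques are full, so the estimate grows linearly with the number of tracked markets.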

Software

  • Python 3.10 or newer.
  • Linux is recommended for production (Ubuntu 22.04+, Debian 12+, RHEL 9). macOS works for development.
  • A paid on-chain RPC endpoint if you use Polymarket or Hyperliquid. Public Polygon RPC is rate-limited and will throttle during busy markets, which means you will miss events. Alchemy, QuickNode, Infura, or a self-hosted node all work.

Install

Pin the version you reviewed:

bash
pip install 'horizon[flow,flow-irl,flow-ml]==0.1.0'

The three extras cover:

  • [flow]: base module, clustering, on-chain ingestion. Required.
  • [flow-irl]: inverse-RL policy recovery. Optional; the default shadow-policy path does not need it.
  • [flow-ml]: machine-learning anomaly detectors. Optional.

2. Directory layout

A typical deployment organizes files like this. Adapt paths to your firm’s conventions.

<deploy-root>/
├── venv/                        # Python virtualenv
├── config/
│   ├── flow.toml                # FlowConfig as TOML
│   └── secrets.env              # chmod 600, API keys, RPC URLs
├── data/
│   ├── flow.db                  # SQLiteFlowStore
│   ├── audit.db                 # AuditLog (hash chain)
│   └── feeds/                   # recorded feed logs
│       └── 2026-04-20.jsonl
├── logs/
│   ├── flow.log
│   └── flow.err
└── backups/                     # managed by the backup job

Lock down permissions: the data/, config/, and logs/ directories should be readable only by the service account. The data/ directory is the compliance artifact; treat it like any WORM archive root.

3. Configuration

Config file

Every threshold, window, and tuning parameter goes in a versioned TOML file so diffs are reviewable:

toml
# config/flow.toml
seed = 42
max_trades_per_market = 2000
max_book_snapshots_per_market = 100

[windows]
short_s = 5.0
medium_s = 60.0
long_s = 3600.0

[detectors.spoofing]
min_bait_size = 500.0
bait_to_aggressor_ratio = 5.0
cancel_window_ms = 2000.0
min_book_imbalance = 0.3

[detectors.layering]
min_layers = 3
cancel_within_ms = 3000.0

[alerts]
severity_floor = "High"

Check this file into a compliance-reviewed config repo. Every threshold change is a review event; see institutional readiness for why.
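
To make a threshold change easy to review, it can help to reduce two versions of the config to a flat list of changed keys. The sketch below works on plain dictionaries (for example, the result of parsing two revisions of flow.toml); flatten and diff_config are hypothetical helpers, not part of horizon.flow.

```python
# Hypothetical review helper: list the keys whose values differ between
# two parsed config revisions. Not part of the horizon.flow API.
def flatten(d: dict, prefix: str = "") -> dict:
    """Turn nested tables into dotted keys, e.g. detectors.spoofing.min_bait_size."""
    out = {}
    for key, value in d.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, dotted))
        else:
            out[dotted] = value
    return out


def diff_config(old: dict, new: dict) -> dict:
    """Map each changed dotted key to an (old_value, new_value) pair."""
    a, b = flatten(old), flatten(new)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    old = {"detectors": {"spoofing": {"min_bait_size": 500.0}}}
    new = {"detectors": {"spoofing": {"min_bait_size": 750.0}}}
    print(diff_config(old, new))
```

A key that exists in only one revision shows up with None on the other side, which makes added and removed parameters visible in the same report.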

Secrets

Secrets belong in a separate file loaded by your process supervisor as environment variables:

bash
# config/secrets.env (chmod 600)
POLYGON_RPC_URL=https://polygon-mainnet.g.alchemy.com/v2/YOUR_KEY
HYPERLIQUID_API_URL=https://api.hyperliquid.xyz
KALSHI_API_URL=https://trading-api.kalshi.com
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR_PATH
PAGERDUTY_INTEGRATION_KEY=YOUR_KEY

Never commit secrets to git, never put them in the config TOML, never log them. See professionals/secrets for vault rotation.
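
A startup guard along these lines can catch a missing variable or a world-readable secrets file before the engine connects. This is a minimal sketch: the variable names are the ones from the example file above, and missing_env_vars and is_owner_only are hypothetical helpers, not part of the module.

```python
import os
import stat
from pathlib import Path

# Names taken from the example secrets.env; adjust to the venues you run.
REQUIRED_VARS = ("POLYGON_RPC_URL", "HYPERLIQUID_API_URL", "KALSHI_API_URL")


def missing_env_vars(required, environ=None):
    """Return the names that are unset or blank. Values are never returned or logged."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


def is_owner_only(path) -> bool:
    """True when group and other have no permission bits (e.g. mode 600)."""
    mode = stat.S_IMODE(Path(path).stat().st_mode)
    return mode & 0o077 == 0


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        raise SystemExit(f"missing environment variables: {', '.join(missing)}")
```

The check reports only variable names, which keeps it consistent with the rule that secrets are never logged.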

Loading the config

The standard entrypoint builds a FlowConfig from the TOML and wires the engine:

python
# run_flow.py
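try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # Python 3.10: requires the tomli backport (pip install tomli)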
from pathlib import Path

from horizon.audit import AuditLog, SQLiteSink
from horizon.flow import SQLiteFlowStore, make_default_engine, set_default_store
from horizon.flow.config import FlowConfig
from horizon.flow.ingestion import PolymarketFlowSource

cfg_dict = tomllib.loads(Path("config/flow.toml").read_text())
cfg = FlowConfig.from_dict(cfg_dict)

flow_store = SQLiteFlowStore("data/flow.db")
set_default_store(flow_store)

audit_log = AuditLog(sink=SQLiteSink("data/audit.db"))
engine = make_default_engine(
    venue_name="polymarket",
    store_path="data/flow.db",
    audit_log=audit_log,
    config=cfg,
)

source = PolymarketFlowSource(market_ids=["0xTRUMP_2024"], engine=engine)
source.connect()       # blocks forever

4. Process supervision

systemd

The simplest long-running setup. Adapt paths to your deploy root:

ini
# /etc/systemd/system/horizon-flow.service
[Unit]
Description=Horizon flow surveillance
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=<service-user>
Group=<service-group>
WorkingDirectory=<deploy-root>
EnvironmentFile=<deploy-root>/config/secrets.env
ExecStart=<deploy-root>/venv/bin/python <deploy-root>/run_flow.py
Restart=on-failure
RestartSec=10s

# Hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=<deploy-root>/data <deploy-root>/logs

# Resource limits
LimitNOFILE=65536
MemoryMax=2G

# Output
StandardOutput=append:<deploy-root>/logs/flow.log
StandardError=append:<deploy-root>/logs/flow.err

[Install]
WantedBy=multi-user.target

Enable and start:

bash
sudo systemctl daemon-reload
sudo systemctl enable --now horizon-flow
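sudo journalctl -u horizon-flow -f    # unit start, stop, and restart events
tail -f <deploy-root>/logs/flow.log   # service output (redirected to file by StandardOutput=)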

Other supervisors

The service is a plain Python process that reads environment variables and a TOML file. Supervisord, runit, Docker, Kubernetes, and Nomad all work. Just make sure:

  • The process is restarted on failure (Restart=on-failure or the equivalent), with a short backoff of 5 to 30 seconds.
  • The data directory is writable.
  • Environment variables for secrets are set.

5. Logging

The module uses Python’s standard logging under the horizon.flow logger hierarchy. Production defaults:

  • INFO for the engine.
  • WARNING for detectors.
  • DEBUG only during incident debugging.

Rotate with logrotate, keep 30 days on disk, then archive to the same WORM destination as the flow store:

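# /etc/logrotate.d/horizon-flow
<deploy-root>/logs/*.log <deploy-root>/logs/*.err {
    daily
    rotate 30
    compress
    delaycompress
    notifempty
    # systemd keeps these files open through StandardOutput=append: and
    # StandardError=append:, so truncate in place rather than rename. No
    # signal is sent to the service; an unhandled SIGHUP would terminate it.
    copytruncate
}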

The module never logs API keys. Third-party libraries (web3, websockets) sometimes log request URLs; review your log-level config before shipping.
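
One way to apply the levels above with the standard library, as a sketch. The horizon.flow logger name comes from this page; horizon.flow.detectors is an assumed child logger name, so check the module's actual logger names before relying on it. The third-party entries raise web3 and websockets to WARNING so their debug-level request logging stays out of production logs.

```python
import logging


def configure_logging() -> None:
    """Apply production log levels. Call once at startup, before the engine is built."""
    logging.basicConfig(
        level=logging.WARNING,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    # Engine and everything else under the hierarchy: INFO.
    logging.getLogger("horizon.flow").setLevel(logging.INFO)
    # Assumed child logger name for detectors: WARNING.
    logging.getLogger("horizon.flow.detectors").setLevel(logging.WARNING)
    # Third-party libraries that may log request URLs at lower levels.
    for noisy in ("web3", "websockets"):
        logging.getLogger(noisy).setLevel(logging.WARNING)


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("horizon.flow").info("logging configured")
```

During an incident, lowering a single logger to DEBUG is usually enough; leaving the third-party loggers at WARNING keeps URLs with embedded keys out of the files.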

6. Monitoring

Three signals to track:

Ingest health

The engine exports flow_events_ingested_total through the existing metrics bridge. Alert when the event rate drops to zero for more than 60 seconds during market hours; this usually means the WebSocket died or the RPC endpoint is throttling.
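
The stall rule can be sketched as a pure function over scraped counter samples. In practice this logic usually lives in your alerting system; is_stalled and its sample format are hypothetical and only illustrate the condition.

```python
def is_stalled(samples, now: float, max_gap_s: float = 60.0) -> bool:
    """Return True when the counter has not changed for more than max_gap_s.

    samples: (timestamp_seconds, counter_value) pairs in time order.
    Any change counts as activity, so a counter reset after a restart
    is not mistaken for a stall.
    """
    last_change = None
    previous = None
    for ts, value in samples:
        if previous is None or value != previous:
            last_change = ts
        previous = value
    if last_change is None:
        return True  # no samples at all: treat as a stall
    return now - last_change > max_gap_s


if __name__ == "__main__":
    scraped = [(0, 10), (30, 15), (60, 15), (90, 15)]
    print(is_stalled(scraped, now=100))
```

The market-hours condition is left to the caller, since quiet periods outside trading hours should not page anyone.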

Detector output

flow_findings_total{category,severity} is a counter. Two alerts worth setting:

  • A spike of Critical findings (10 or more in 5 minutes). Either a real incident in the market or a detector mis-threshold; either way, page on it.
  • Zero findings for 24 hours on a venue you expect to see activity on. Probably a silent failure.
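
The Critical-spike rule is a sliding-window count. The sketch below shows the logic with the thresholds from the first alert (10 findings in 5 minutes); SpikeDetector is a hypothetical class for illustration, and most metric systems can express the same rule as a rate query instead.

```python
from collections import deque


class SpikeDetector:
    """Fire when at least `threshold` events fall inside the last `window_s` seconds."""

    def __init__(self, threshold: int = 10, window_s: float = 300.0):
        self.threshold = threshold
        self.window_s = window_s
        self._times = deque()

    def observe(self, ts: float) -> bool:
        """Record one Critical finding at time ts; return True if the alert should fire."""
        self._times.append(ts)
        # Drop events that have aged out of the window.
        while self._times and ts - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) >= self.threshold


if __name__ == "__main__":
    detector = SpikeDetector()
    fired = [detector.observe(float(t)) for t in range(10)]
    print(fired[-1])
```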

Store and audit health

  • flow_store_write_latency_seconds histogram. Should stay under 1 ms.
  • flow_audit_chain_verify_ok gauge, 1 when periodic AuditChain.verify() succeeds.

If chain verification fails, stop trading and investigate. This is a compliance-grade incident.
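
For readers unfamiliar with hash chains, the generic idea is sketched below: each entry stores the hash of the previous one, so altering any earlier record invalidates everything after it. This is a textbook illustration using SHA-256 and JSON, not the actual AuditLog or AuditChain implementation, whose record format is not described on this page.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev, "hash": entry_hash(prev, payload)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks it."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append(log, {"finding": "spoofing", "size": 500})
    append(log, {"finding": "layering", "size": 120})
    print(verify(log))
```

This is why a failed verification is treated as an incident: it means a stored record no longer matches what was originally written.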

7. Backup and restore

The flow store and the audit log are the compliance artifacts. They must survive disk failure and auditor requests years later.

Daily snapshot

Use SQLite’s backup API (not cp; WAL mode means you would miss committed-but-uncheckpointed data):

bash
#!/bin/bash
# backup.sh, invoked daily by a timer
set -euo pipefail

DATE=$(date +%Y-%m-%d)
DEST=<deploy-root>/backups/${DATE}
mkdir -p "${DEST}"

sqlite3 <deploy-root>/data/flow.db ".backup '${DEST}/flow.db'"
sqlite3 <deploy-root>/data/audit.db ".backup '${DEST}/audit.db'"

# Integrity check
sqlite3 "${DEST}/flow.db" "pragma integrity_check;" | grep -qx ok
sqlite3 "${DEST}/audit.db" "pragma integrity_check;" | grep -qx ok

# Ship to immutable storage (S3 with Object Lock, or your firm's WORM archive)
tar -czf "${DEST}.tar.gz" -C <deploy-root>/backups "${DATE}"
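# The Object Lock flags belong to s3api put-object, not to the high-level
# "aws s3 cp" command. put-object is a single-part upload (5 GB limit).
aws s3api put-object \
    --bucket <your-worm-bucket> \
    --key "flow/${DATE}.tar.gz" \
    --body "${DEST}.tar.gz" \
    --storage-class STANDARD_IA \
    --object-lock-mode COMPLIANCE \
    --object-lock-retain-until-date "$(date -u -d '+7 years' +%Y-%m-%dT%H:%M:%SZ)"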

# Local cleanup
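# Remove snapshot directories and tarballs older than 14 days, never the root itself.
find <deploy-root>/backups -mindepth 1 -maxdepth 1 -mtime +14 -exec rm -rf {} +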

Schedule with a systemd timer or a cron job. Daily at 02:00 UTC is a common choice.
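
If the backup job is written in Python rather than shell, the same online backup is available through sqlite3.Connection.backup in the standard library. A minimal sketch, with backup_sqlite as a hypothetical helper name:

```python
import sqlite3


def backup_sqlite(src_path: str, dest_path: str) -> bool:
    """Copy a live SQLite database with the online backup API.

    Returns True when the copy passes PRAGMA integrity_check.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Reads through SQLite itself, so committed WAL content is included.
        src.backup(dest)
        result = dest.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        dest.close()
        src.close()
    return result == "ok"


if __name__ == "__main__":
    import sys

    ok = backup_sqlite(sys.argv[1], sys.argv[2])
    raise SystemExit(0 if ok else 1)
```

The non-zero exit status on a failed integrity check lets a systemd timer or cron wrapper treat the run as failed.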

Restore drill

Practice once per quarter:

  1. Pull yesterday’s backup from S3.
  2. Untar into a scratch directory on a cold host.
  3. Run horizon flow verify --db=<restored-flow-db>. This confirms WORM triggers still reject writes and row counts match the backup log.
  4. Run AuditChain.verify() on the restored audit log.
  5. Spot-check a finding: horizon flow profile --actor=<known-wallet> --db=<restored-flow-db>.

If any step fails, the backup procedure is broken. Document the drill outcome.

8. Upgrades

bash
# 1. Snapshot first.
./backup.sh

# 2. Stop the service.
sudo systemctl stop horizon-flow

# 3. Upgrade in the venv.
./venv/bin/pip install -U 'horizon[flow,flow-irl,flow-ml]==0.1.1'

# 4. Run the validation pass in a sandbox venv against a copy of flow.db to
#    confirm schema compatibility and detector thresholds. See
#    institutional-readiness for the validation command list.

# 5. Restart.
sudo systemctl start horizon-flow

# 6. Watch the service output for a few minutes (Ctrl-C to stop), then verify the store.
tail -f <deploy-root>/logs/flow.log
horizon flow verify --db=<deploy-root>/data/flow.db

Never skip the snapshot. Never upgrade during market hours unless rolling back to a known-good version is the explicit plan.

9. Incident response

If hash-chain verification fails, a detector starts flooding, or the engine crashes mid-event:

  1. Do not stop the engine immediately. Leave the audit log and flow store as they are; they are evidence.
  2. Pause any strategies running FlowAnomalyCheck. They may be blocking or allowing trades on stale state.
  3. Capture state: systemctl status horizon-flow, recent journalctl output, and the last hour of flow.log.
  4. Run verify: horizon flow verify --db=<deploy-root>/data/flow.db, plus AuditChain.verify() on the audit log. Record the result.
  5. Restore if needed. If the flow store is corrupt, replay from the last good backup plus the feed log (see replay below).
  6. File a post-mortem. Compliance expects an incident record for anything that broke the audit trail.

10. Replay and regression

Every event written to a feed log can be replayed byte-deterministically:

bash
horizon flow replay \
    --feed-log=<deploy-root>/data/feeds/2026-04-20.jsonl \
    --config=<deploy-root>/config/flow.toml \
    --store=/tmp/replay.db

Three uses:

  • Regression testing after a detector threshold change. Replay a week and diff findings.
  • Forensics. If a detector fired on a specific event, replay the surrounding window under a higher log level.
  • Audit response. Reproduce any finding on demand from the recorded stream plus the config in effect at the time.
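
For the regression use, the comparison step reduces to a set difference between the findings from a baseline replay and a candidate replay. The sketch below assumes each finding has been reduced to a (market_id, category, event_id) tuple; that shape is hypothetical, so map it to whatever identifying fields your findings actually carry.

```python
def diff_findings(baseline, candidate) -> dict:
    """Compare two replays' findings.

    Each finding is assumed to be a hashable, sortable tuple such as
    (market_id, category, event_id). Returns what the candidate run
    added and what it no longer reports.
    """
    base, cand = set(baseline), set(candidate)
    return {"added": sorted(cand - base), "removed": sorted(base - cand)}


if __name__ == "__main__":
    before = [("m1", "spoofing", "e1"), ("m1", "layering", "e2")]
    after = [("m1", "spoofing", "e1"), ("m2", "spoofing", "e9")]
    print(diff_findings(before, after))
```

An empty result on both sides means the threshold change did not alter detector output for the replayed period.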

See CLI for the full command surface.

11. Network egress

The module makes outbound calls to:

  • Polygon RPC (your provider), HTTPS and WSS port 443: one persistent WebSocket plus HTTPS requests for eth_getTransactionByHash enrichment.
  • Hyperliquid API, HTTPS and WSS port 443.
  • Kalshi API, HTTPS and WSS port 443.
  • Slack webhook and PagerDuty integration, HTTPS port 443.

Lock down egress with a firewall allow-list. The module does not listen on any port.
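
The allow-list can be derived from the same URLs the service is configured with, which keeps the firewall rules and the secrets file in step. A sketch using only the standard library; egress_allowlist is a hypothetical helper, and it returns hostnames, so resolving them to addresses is left to your firewall tooling.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"https": 443, "wss": 443, "http": 80, "ws": 80}


def egress_allowlist(urls) -> list:
    """Return sorted, de-duplicated (hostname, port) pairs for the given URLs.

    Only hosts and ports are extracted; paths, which may contain API keys,
    are discarded. URLs with an unknown scheme and no explicit port are skipped.
    """
    entries = set()
    for url in urls:
        parts = urlsplit(url)
        port = parts.port or DEFAULT_PORTS.get(parts.scheme)
        if parts.hostname and port:
            entries.add((parts.hostname, port))
    return sorted(entries)


if __name__ == "__main__":
    import os

    names = ("POLYGON_RPC_URL", "HYPERLIQUID_API_URL", "KALSHI_API_URL", "SLACK_WEBHOOK_URL")
    for host, port in egress_allowlist(os.environ.get(n, "") for n in names):
        print(f"{host}:{port}")
```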

12. Pre-flight checklist

Before routing real trades through anything downstream of the module:

  • Python 3.10 or newer, venv built, horizon[flow,flow-irl,flow-ml] installed at a pinned version.
  • config/flow.toml committed to the compliance-reviewed config repo. Diff is signed off.
  • config/secrets.env is chmod 600, owned by the service user, not in git.
  • Polygon RPC is a paid endpoint if Polymarket or Hyperliquid are in scope.
  • Clock is NTP-synced. chronyc tracking shows offset under 10 ms.
  • Data disk has at least 50 GB free and is monitored.
  • Process supervisor is running; systemctl status reports active.
  • Backup timer enabled. Restore drill ran successfully in the last quarter.
  • Log rotation configured.
  • Metrics scrape is working; flow_events_ingested_total is counting up.
  • Alerter routing validated end-to-end with a test AnomalySeverity.Critical finding.
  • FlowAnomalyCheck is wired into the strategies that should gate on findings.
  • Institutional-readiness checklist signed off.
  • Compliance memo filed.
  • On-call rotation updated with the incident runbook above.
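
The clock item in the checklist can be automated by parsing the "System time" line that chronyc tracking prints. A sketch, assuming chrony is the NTP client in use; clock_offset_ms and clock_ok are hypothetical helpers, and the 10 ms limit is the one from the checklist.

```python
import re

# Matches e.g. "System time     : 0.000123456 seconds slow of NTP time"
_SYSTEM_TIME = re.compile(
    r"^System time\s*:\s*([0-9.]+) seconds (?:fast|slow) of NTP time",
    re.MULTILINE,
)


def clock_offset_ms(tracking_output: str) -> float:
    """Return the absolute system clock offset in milliseconds."""
    match = _SYSTEM_TIME.search(tracking_output)
    if match is None:
        raise ValueError("no 'System time' line found in chronyc tracking output")
    return float(match.group(1)) * 1000.0


def clock_ok(tracking_output: str, limit_ms: float = 10.0) -> bool:
    return clock_offset_ms(tracking_output) < limit_ms


if __name__ == "__main__":
    import subprocess

    out = subprocess.run(
        ["chronyc", "tracking"], capture_output=True, text=True, check=True
    ).stdout
    print(f"offset {clock_offset_ms(out):.3f} ms, ok={clock_ok(out)}")
```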