RustPBX Operations Guide
This chapter targets on-call and operations engineers with a focus on “keeping the system healthy.” For feature fundamentals, see Overview, Basic Setup, and Routing/Trunks/Billing.
1. Daily checks
- Morning patrol: review the console homepage for node health, concurrent calls, and failure rates; escalate to Diagnostics if anything drifts.
- Log review: watch callrecord, proxy, and console logs; configure centralized log alerts for critical keywords.
- Capacity headroom: capture peak CPS (calls per second) and concurrent calls from the previous day and compare against your limits to plan scale-out.
- Alert workflow: keep a graded SOP (P1/P2/P3) with a five-minute response SLA.
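The capacity headroom check in the list above is simple arithmetic, sketched here in Python. The `Limits` values and the 30% threshold are illustrative assumptions, not RustPBX defaults; substitute the limits you have tested or licensed.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    # Hypothetical capacity limits; replace with your own tested values.
    max_cps: float
    max_concurrent: int


def headroom(peak: float, limit: float) -> float:
    """Return the unused fraction of capacity (0.25 means 25% free)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 1.0 - peak / limit


def needs_scale_out(peak_cps: float, peak_concurrent: int, limits: Limits,
                    min_headroom: float = 0.3) -> bool:
    """Flag scale-out when either metric leaves less than min_headroom free."""
    return (headroom(peak_cps, limits.max_cps) < min_headroom
            or headroom(peak_concurrent, limits.max_concurrent) < min_headroom)


if __name__ == "__main__":
    limits = Limits(max_cps=100, max_concurrent=2000)
    # Yesterday's peaks: 80 CPS and 900 concurrent calls.
    print(needs_scale_out(peak_cps=80, peak_concurrent=900, limits=limits))
```

Checking both metrics separately matters because a deployment can run out of CPS budget long before it runs out of concurrent-call capacity, or the reverse.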
(Add an operations dashboard screenshot.)
2. Change & release process
- Prepare configs: branch your Git repo (covering config/ and config.toml), submit MR/PR, and run automated linting.
- Canary policies:
  - Routing: validate on low-priority DIDs first.
  - Extensions/queues: exercise with test accounts.
  - Billing templates: compare against old templates using sandbox prefixes.
- Execute Reload: follow the Diagnostics chapter for preflight checks, then trigger Reload via console or API.
- Rollback: if issues appear, revert to the previous Git tag, reload again, and validate via Diagnostics → Routing/Trunks until parity is restored.
3. Common runtime actions
- Emergency blocking: use config/acl/ or the console to block IPs/extensions; remember to reload ACL afterward.
- Rate limiting: apply frequency limits or queue capacity caps to shed load.
- Recording retrieval: search Call Records by call_id, then export recordings/signaling for compliance or customer investigations.
- Batch jobs: automate with in-house scripts that call APIs for password resets, billing template syncs, etc.
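To show how a frequency limit sheds load, here is a minimal sliding-window limiter. It is a conceptual sketch only and does not describe how RustPBX implements its limits; the class name and parameters are made up for this example.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_events per window_seconds; reject the rest."""

    def __init__(self, max_events: int, window_seconds: float) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._accepted: deque[float] = deque()  # timestamps of accepted events

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._accepted and now - self._accepted[0] >= self.window_seconds:
            self._accepted.popleft()
        if len(self._accepted) < self.max_events:
            self._accepted.append(now)
            return True
        return False  # over the limit: shed this request


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_events=2, window_seconds=1.0)
    print([limiter.allow(t) for t in (0.0, 0.1, 0.2, 1.0)])
```

Rejected requests are not recorded, so a burst of refused attempts does not extend the time a caller stays blocked.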
4. Backup & compliance
- Configuration: store config/ and config.toml in Git with CI enforcing format/security checks.
- Database: for PostgreSQL, run daily logical backups plus weekly full snapshots; for SQLite, ensure the file sits on reliable storage.
- Recordings/CDR: retain for 180 or 360 days, whichever the applicable regulation requires, preferably mirroring to object storage with lifecycle policies.
- Audit trail: log every Reload/change so the console or Git commits identify the operator.
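The retention rule for recordings can be expressed as a small date comparison. The sketch below assumes you already have a mapping from recording name to recording date; how that mapping is built (database query, object-storage listing) is outside its scope, and the file names are hypothetical.

```python
from datetime import date, timedelta


def expired(recorded_on: date, today: date, retention_days: int) -> bool:
    """True once a recording is strictly older than the retention period."""
    return today - recorded_on > timedelta(days=retention_days)


def purge_candidates(recordings: dict[str, date], today: date,
                     retention_days: int = 180) -> list[str]:
    """Return names of recordings eligible for deletion, sorted for stable output."""
    return sorted(name for name, day in recordings.items()
                  if expired(day, today, retention_days))


if __name__ == "__main__":
    # Hypothetical inventory of recordings and their dates.
    inventory = {"a.wav": date(2024, 1, 1), "b.wav": date(2024, 1, 3)}
    print(purge_candidates(inventory, today=date(2024, 7, 1)))
```

If recordings are mirrored to object storage, a bucket lifecycle policy can enforce the same cutoff without a script. A check like this is still useful for auditing what the policy would remove.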
5. Automation & observability
- Health probe: GET /health (implemented by handler::ami) monitors database, SIP threads, and config state; feed it into load balancers or Prometheus blackbox exporters.
- Logging/tracing: control tracing via log_level and log_file; centralize access.log and callrecord events with alert conditions wired to the NOC.
- Synthetic monitoring: schedule calls via examples/perfcli.rs or third-party dial testing and compare with Diagnostics Evaluate output.
- Alert routing: subscribe ChatOps/NOC channels to /health, log keywords, and DB metrics so P1/P2 alerts auto-assign.
6. Operational guardrails
- Never modify production binaries or database schema by hand—ship every change through Git.
- Require dual reviews for trunk, routing, and billing changes before Reload.
- Announce high-risk changes outside business hours and reserve a rollback window.
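The dual-review guardrail can be enforced mechanically in CI. The sketch below assumes the pipeline can supply the list of changed files and the approval count. The path prefixes are hypothetical and must be adapted to your repository layout, and the single approval required for other changes is likewise an assumption.

```python
# Hypothetical path prefixes for high-risk configuration areas.
HIGH_RISK_PREFIXES = ("config/trunks/", "config/routes/", "config/billing/")


def reload_allowed(changed_files: list[str], approvals: int) -> bool:
    """Require two approvals when any changed file is in a high-risk area."""
    high_risk = any(path.startswith(HIGH_RISK_PREFIXES) for path in changed_files)
    # Assumption: changes outside high-risk areas still need one approval.
    return approvals >= 2 if high_risk else approvals >= 1


if __name__ == "__main__":
    print(reload_allowed(["config/routes/default.toml"], approvals=1))
```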
(Insert an “operations workflow” or “on-call checklist” visual.)
Following these rules keeps RustPBX predictable in a 24×7 voice environment.