RustPBX Operations Guide

This chapter targets on-call and operations engineers with a focus on “keeping the system healthy.” For feature fundamentals, see Overview, Basic Setup, and Routing/Trunks/Billing.

1. Daily checks

Morning patrol: review the console homepage for node health, concurrent calls, and failure rates; escalate to Diagnostics if anything drifts.
Log review: watch callrecord, proxy, and console logs; configure centralized log alerts for critical keywords.
Capacity headroom: capture peak CPS and concurrent calls from the previous day and compare against your limits to plan scale-out.
Alert workflow: keep a graded SOP (P1/P2/P3) with a five-minute response SLA.
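
The capacity headroom check above can be sketched as a small calculation. This is a minimal illustration, not part of RustPBX: the metric names, the sample figures, and the 80% planning threshold are all assumptions you would replace with your own limits.

```python
from dataclasses import dataclass


@dataclass
class Headroom:
    metric: str
    peak: float
    limit: float

    @property
    def utilization(self) -> float:
        # Fraction of the configured limit consumed at the previous day's peak.
        return self.peak / self.limit

    def needs_scale_out(self, threshold: float = 0.8) -> bool:
        # The threshold is an assumed planning trigger, not a RustPBX setting.
        return self.utilization >= threshold


def review(peaks: dict[str, float], limits: dict[str, float]) -> list[Headroom]:
    """Pair each peak with its limit; metrics without a limit are skipped."""
    return [Headroom(name, peaks[name], limits[name]) for name in peaks if name in limits]


# Hypothetical figures for illustration only.
report = review(
    {"cps": 42, "concurrent_calls": 900},
    {"cps": 60, "concurrent_calls": 1000},
)
for item in report:
    flag = "plan scale-out" if item.needs_scale_out() else "ok"
    print(f"{item.metric}: {item.utilization:.0%} of limit ({flag})")
```

Keeping the threshold as a parameter lets each team choose how early to start scale-out planning.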


2. Change & release process

Prepare configs: create a branch in the Git repo that tracks config/ and config.toml, open a merge/pull request, and run automated linting.
Canary policies:
- Routing: validate on low-priority DIDs first.
- Extensions/queues: exercise with test accounts.
- Billing templates: compare against old templates using sandbox prefixes.
Execute Reload: follow the Diagnostics chapter for preflight checks, then trigger Reload via console or API.
Rollback: if issues appear, revert to the previous Git tag, reload again, and validate via Diagnostics → Routing/Trunks until parity is restored.

3. Common runtime actions

Emergency blocking: use config/acl/ or the console to block IPs/extensions; remember to reload ACL afterward.
Rate limiting: apply frequency limits or queue capacity caps to shed load.
Recording retrieval: search Call Records by call_id, then export the recordings and signaling for compliance or customer investigations.
Batch jobs: automate with in-house scripts that call APIs for password resets, billing template syncs, etc.
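
To illustrate what a frequency limit does when shedding load, here is a token bucket sketch. This guide does not describe how RustPBX implements its limits, so treat this as a generic technique shown for intuition; the class and parameter names are hypothetical.

```python
import time


class TokenBucket:
    """Allow about `rate` events per second, with bursts of at most `burst`."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(burst)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        # No token available: the caller should reject or queue the request.
        return False
```

For example, TokenBucket(rate=50, burst=100) would admit short spikes of up to 100 new calls while holding the sustained rate near 50 per second.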

4. Backup & compliance

Configuration: store config/ and config.toml in Git with CI enforcing format/security checks.
Database: for PostgreSQL, run daily logical backups plus weekly full snapshots; for SQLite, ensure the database file sits on reliable storage.
Recordings/CDR: retain for 180 or 360 days, as your regulations require, preferably mirroring to object storage with lifecycle policies.
Audit trail: log every Reload/change so the console or Git commits identify the operator.
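
The retention windows above come down to a cutoff-date comparison. The sketch below shows that arithmetic with made-up file names and timestamps; in practice an object-storage lifecycle policy would normally do this for you.

```python
from datetime import datetime, timedelta, timezone


def expired(items: dict[str, datetime], retention_days: int, now: datetime) -> list[str]:
    """Return the names whose timestamp is older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(name for name, created in items.items() if created < cutoff)


# Hypothetical recordings and a fixed "now" so the example is reproducible.
now = datetime(2024, 7, 1, tzinfo=timezone.utc)
recordings = {
    "call-a.wav": datetime(2023, 12, 1, tzinfo=timezone.utc),
    "call-b.wav": datetime(2024, 6, 1, tzinfo=timezone.utc),
}
print(expired(recordings, retention_days=180, now=now))  # ['call-a.wav']
print(expired(recordings, retention_days=360, now=now))  # []
```

Passing `now` explicitly keeps the function deterministic, which makes the retention rule easy to test before any deletion job runs.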

5. Automation & observability

Health probe: GET /health (implemented by handler::ami) monitors database, SIP threads, and config state; feed it into load balancers or Prometheus blackbox exporters.
Logging/tracing: control tracing via log_level and log_file; centralize access.log and callrecord events with alert conditions wired to the NOC.
Synthetic monitoring: schedule calls via examples/perfcli.rs or third-party dial testing and compare with Diagnostics Evaluate output.
Alert routing: subscribe ChatOps/NOC channels to /health, log keywords, and DB metrics so P1/P2 alerts auto-assign.
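
A minimal external probe for /health might look like the following. It assumes that an HTTP 200 response means healthy; the response body format is not described in this chapter, so the sketch ignores it. The function name and the injectable `opener` parameter are illustrative choices.

```python
import urllib.error
import urllib.request


def probe_health(base_url: str, timeout: float = 3.0, opener=urllib.request.urlopen) -> bool:
    """Return True when GET {base_url}/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with opener(url, timeout=timeout) as response:
            # Assumption: 200 means healthy; the body is not inspected.
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection failures, timeouts, and HTTP error statuses count as unhealthy.
        return False
```

A scheduled job could call probe_health with your node's base URL and raise a P1 or P2 alert after a chosen number of consecutive failures.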

6. Operational guardrails

Never modify production binaries or database schema by hand; ship every change through Git.
Require dual reviews for trunk, routing, and billing changes before Reload.
Announce high-risk changes outside business hours and reserve a rollback window.

(Insert an “operations workflow” or “on-call checklist” visual.)

Following these rules keeps RustPBX predictable in a 24×7 voice environment.
