
RustPBX Operations Guide

This chapter is written for on-call and operations engineers and focuses on keeping the system healthy. For feature fundamentals, see the Overview, Basic Setup, and Routing/Trunks/Billing chapters.

1. Daily checks

  • Morning check: review the console homepage for node health, concurrent calls, and failure rates; if any metric drifts, investigate in Diagnostics.
  • Log review: watch callrecord, proxy, and console logs; configure centralized log alerts for critical keywords.
  • Capacity headroom: record the previous day's peak calls per second (CPS) and peak concurrent calls, and compare them against your limits to plan scale-out.
  • Alert workflow: maintain a tiered SOP (P1/P2/P3) with a five-minute response SLA.
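To make the capacity-headroom comparison concrete, here is a minimal Rust sketch that turns the previous day's peaks into utilization ratios and flags when to plan scale-out. The `DailyPeak` and `Limits` types, the function names, and the 70% threshold in the usage note are hypothetical illustrations, not RustPBX APIs or recommended values.

```rust
/// Peak load observed over the previous day (hypothetical type for illustration).
struct DailyPeak {
    cps: f64,
    concurrent_calls: u32,
}

/// Limits you operate against (hypothetical; take the numbers from your own sizing).
struct Limits {
    max_cps: f64,
    max_concurrent_calls: u32,
}

/// Utilization of each limit as a fraction; a value above 1.0 means the limit was exceeded.
fn utilization(peak: &DailyPeak, limits: &Limits) -> (f64, f64) {
    let cps = peak.cps / limits.max_cps;
    let calls = peak.concurrent_calls as f64 / limits.max_concurrent_calls as f64;
    (cps, calls)
}

/// Flag scale-out when either dimension reaches the chosen threshold (e.g. 0.7 = 70%).
fn needs_scale_out(peak: &DailyPeak, limits: &Limits, threshold: f64) -> bool {
    let (cps, calls) = utilization(peak, limits);
    cps >= threshold || calls >= threshold
}
```

For example, with limits of 100 CPS and 1,000 concurrent calls, a day peaking at 40 CPS and 800 calls sits at 80% of its call limit, so `needs_scale_out(&peak, &limits, 0.7)` returns `true` even though CPS looks comfortable. Checking both dimensions separately matters because either one can run out first.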

(Add an operations dashboard screenshot.)

2. Change & release process

  1. Prepare configs: create a branch in the Git repo that holds config/ and config.toml, open an MR/PR, and run automated linting.
  2. Canary policies:
    • Routing: validate on low-priority DIDs first.
    • Extensions/queues: exercise with test accounts.
    • Billing templates: compare against old templates using sandbox prefixes.
  3. Execute Reload: run the preflight checks described in the Diagnostics chapter, then trigger Reload from the console or via the API.
  4. Rollback: if issues appear, revert to the previous Git tag, reload again, and validate via Diagnostics → Routing/Trunks until parity is restored.
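The reload, validate, and rollback loop in steps 3 and 4 can be sketched as follows. `Deployer` is a hypothetical trait standing in for your own tooling (checking out a Git tag and triggering Reload through the console or API); it is not part of RustPBX.

```rust
/// Outcome of one release attempt.
#[derive(Debug, PartialEq)]
enum ReleaseOutcome {
    Applied,
    RolledBack,
}

/// Hypothetical abstraction over "check out a Git tag and trigger Reload".
/// In practice these would be your own scripts calling Git and the RustPBX console/API.
trait Deployer {
    fn checkout_and_reload(&mut self, git_tag: &str) -> Result<(), String>;
    /// Post-Reload checks, e.g. what you verify under Diagnostics -> Routing/Trunks.
    fn validate(&self) -> bool;
}

/// Apply `new_tag`; if Reload or validation fails, revert to `previous_tag`,
/// reload again, and confirm parity before declaring the rollback done.
fn release<D: Deployer>(
    d: &mut D,
    new_tag: &str,
    previous_tag: &str,
) -> Result<ReleaseOutcome, String> {
    if d.checkout_and_reload(new_tag).is_ok() && d.validate() {
        return Ok(ReleaseOutcome::Applied);
    }
    d.checkout_and_reload(previous_tag)?;
    if d.validate() {
        Ok(ReleaseOutcome::RolledBack)
    } else {
        Err(format!("rollback to {previous_tag} did not restore parity; escalate"))
    }
}
```

Keeping the rollback path identical to the forward path (check out a tag, reload, validate) means the procedure exercised on every release is the same one you rely on during an incident.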

3. Common runtime actions

  • Emergency blocking: block IPs or extensions via config/acl/ or the console; remember to reload the ACL afterward.
  • Rate limiting: apply frequency limits or queue capacity caps to shed load.
  • Recording retrieval: search Call Records by call_id, export recordings/signaling for compliance or customer investigations.
  • Batch jobs: automate recurring tasks such as password resets and billing template syncs with in-house scripts that call the APIs.
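Frequency limits are commonly built on a token bucket. The sketch below illustrates that general technique only; it is not RustPBX's own limiter, and the capacity and refill rate are example parameters. Time is passed in explicitly (in seconds) so the logic stays deterministic and easy to test.

```rust
/// Generic token bucket: allows bursts of up to `capacity` attempts and
/// refills at `rate_per_sec` tokens per second.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    rate_per_sec: f64,
    last_refill: f64,
}

impl TokenBucket {
    /// Start full, so an idle source can burst immediately.
    fn new(capacity: f64, rate_per_sec: f64, now: f64) -> Self {
        TokenBucket { capacity, tokens: capacity, rate_per_sec, last_refill: now }
    }

    /// Returns true if one more call attempt may proceed at time `now`.
    fn try_acquire(&mut self, now: f64) -> bool {
        // Refill only when time moves forward; ignore clock steps backwards.
        if now > self.last_refill {
            let elapsed = now - self.last_refill;
            self.tokens = (self.tokens + elapsed * self.rate_per_sec).min(self.capacity);
            self.last_refill = now;
        }
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

To limit per caller, keep one bucket per source IP or extension (for example in a `HashMap`) and reject or queue the attempt when `try_acquire` returns `false`.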

4. Backup & compliance

  • Configuration: store config/ and config.toml in Git with CI enforcing format/security checks.
  • Database: for PostgreSQL, run daily logical backups plus weekly full snapshots; for SQLite ensure the file sits on reliable storage.
  • Recordings/CDR: retain 180/360 days per regulation, preferably mirroring to object storage with lifecycle policies.
  • Audit trail: log every Reload and configuration change so that console records or Git commits identify the operator.
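Object-storage lifecycle policies normally handle expiry for you. If you prune recordings or CDR exports with your own script instead, the core check reduces to an age comparison, as in this sketch. The function names are hypothetical, and `retention_days` is whatever your regulation requires (such as the 180- or 360-day periods above).

```rust
use std::time::{Duration, SystemTime};

/// Whole days elapsed between `created` and `now` (0 if `created` lies in the future).
fn age_in_days(created: SystemTime, now: SystemTime) -> u64 {
    now.duration_since(created).unwrap_or(Duration::ZERO).as_secs() / 86_400
}

/// True once an object is strictly older than its retention window.
fn past_retention(created: SystemTime, now: SystemTime, retention_days: u64) -> bool {
    age_in_days(created, now) > retention_days
}

/// Names of objects (e.g. recording files) whose retention window has elapsed.
fn select_expired<'a>(
    objects: &'a [(String, SystemTime)],
    now: SystemTime,
    retention_days: u64,
) -> Vec<&'a str> {
    objects
        .iter()
        .filter(|(_, created)| past_retention(*created, now, retention_days))
        .map(|(name, _)| name.as_str())
        .collect()
}
```

Treating a future creation time as age zero errs on the side of keeping data, which is the safer failure mode for compliance.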

5. Automation & observability

  • Health probe: GET /health (implemented by handler::ami) monitors database, SIP threads, and config state; feed it into load balancers or Prometheus blackbox exporters.
  • Logging/tracing: control tracing output with log_level and log_file; centralize access.log and callrecord events, and route their alert conditions to the NOC.
  • Synthetic monitoring: schedule test calls via examples/perfcli.rs or a third-party dial-testing service, and compare the results with the Diagnostics Evaluate output.
  • Alert routing: subscribe ChatOps/NOC channels to /health status, log-keyword alerts, and DB metrics so that P1/P2 alerts are assigned automatically.
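For in-house scripts or synthetic checks, a /health probe needs nothing beyond the Rust standard library. This sketch assumes `addr` is the host:port of the RustPBX HTTP listener and that an unhealthy state is reported with a non-2xx status; the response format of /health is not described here, so confirm both against your deployment. `probe_health` is a hypothetical helper and does not inspect the response body.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Probe `GET /health` over plain HTTP/1.1 and report whether the status is 2xx.
/// Assumption: `addr` is the RustPBX HTTP listener and failures use non-2xx codes.
/// `timeout` must be non-zero.
fn probe_health(addr: &str, timeout: Duration) -> io::Result<bool> {
    let sock = addr
        .to_socket_addrs()?
        .next()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "address did not resolve"))?;
    let mut stream = TcpStream::connect_timeout(&sock, timeout)?;
    stream.set_read_timeout(Some(timeout))?;
    stream.set_write_timeout(Some(timeout))?;

    // "Connection: close" asks the server to close after responding, so reading to EOF ends.
    let request = format!("GET /health HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n");
    stream.write_all(request.as_bytes())?;

    let mut raw = Vec::new();
    stream.read_to_end(&mut raw)?;
    let response = String::from_utf8_lossy(&raw);

    // The status line looks like "HTTP/1.1 200 OK"; the second token is the code.
    let code = response
        .lines()
        .next()
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|c| c.parse::<u16>().ok());
    Ok(matches!(code, Some(200..=299)))
}
```

Load balancers and the Prometheus Blackbox Exporter can poll the endpoint directly; a helper like this is mainly useful inside your own scripts, for example to gate a post-Reload check. A server that ignores `Connection: close` will make the read hit the timeout and return an error rather than hang.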

6. Operational guardrails

  • Never modify production binaries or the database schema by hand; ship every change through Git.
  • Require dual reviews for trunk, routing, and billing changes before Reload.
  • Announce high-risk changes in advance, run them outside business hours, and reserve a rollback window.
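The dual-review rule can be enforced automatically in CI before a Reload is allowed. In this sketch, `Change`, `ChangeArea`, and `ready_for_reload` are hypothetical; in practice the data would come from your MR/PR metadata. The one-approval minimum for other changes is an assumption, not a rule stated above.

```rust
use std::collections::HashSet;

/// Areas the guardrails single out for dual review, plus everything else.
#[derive(Debug, Clone, Copy)]
enum ChangeArea {
    Trunk,
    Routing,
    Billing,
    Other,
}

/// A proposed change (hypothetical shape; in practice taken from MR/PR metadata).
struct Change<'a> {
    author: &'a str,
    areas: Vec<ChangeArea>,
    approvers: Vec<&'a str>,
}

/// True if the change may proceed to Reload: trunk, routing, or billing changes
/// need at least two distinct approvers other than the author.
/// Assumption: other changes need one approver; adjust to your own policy.
fn ready_for_reload(change: &Change<'_>) -> bool {
    let sensitive = change
        .areas
        .iter()
        .any(|a| matches!(a, ChangeArea::Trunk | ChangeArea::Routing | ChangeArea::Billing));
    // Deduplicate approvers and ignore self-approval.
    let distinct: HashSet<&str> = change
        .approvers
        .iter()
        .copied()
        .filter(|a| *a != change.author)
        .collect();
    let required = if sensitive { 2 } else { 1 };
    distinct.len() >= required
}
```

Counting distinct approvers and excluding the author prevents the same reviewer approving twice, or an author approving their own change, from satisfying the rule.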

(Insert an “operations workflow” or “on-call checklist” visual.)

Following these rules keeps RustPBX predictable in a 24×7 voice environment.