Troubleshooting Playbook

Start every investigation with the Diagnostics panel and relevant logs. This section catalogs common failure patterns, how to isolate them, and recommended tools.

1. Registration / signaling

  • Extension refuses to register:
    • Investigation: Diagnostics → SIP → Locator registry to confirm bindings and expiry.
    • Resolution: verify password, SIP port, and firewall rules; reset the password or clear stale bindings when required.
  • Trunk status degraded:
    • Investigation: run the Diagnostics → Trunks probes or an OPTIONS probe.
    • Resolution: confirm the peer IP and auth mode; enable backup trunks in config/trunks.
  • INVITE gets no response:
    • Investigation: use sngrep or Diagnostics → Routing Evaluate to confirm which rules are hit.
    • Resolution: double-check routing matches and ACL permissions.
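To tell an auth loop apart from a server that never answers, a quick pass over a SIP trace export helps. A minimal sketch, assuming a plain-text trace such as one saved from sngrep — the sample messages and the /tmp path are illustrative, not your real capture:

```shell
# Count REGISTER attempts vs. successful 200 OK responses in a SIP trace.
# Many attempts with 401s but no 200 points at bad credentials; no responses
# at all points at firewall/port issues.
cat > /tmp/sip_trace.txt <<'EOF'
REGISTER sip:pbx.example.com SIP/2.0
SIP/2.0 401 Unauthorized
REGISTER sip:pbx.example.com SIP/2.0
SIP/2.0 200 OK
EOF
registers=$(grep -c '^REGISTER' /tmp/sip_trace.txt)
ok=$(grep -c '^SIP/2.0 200' /tmp/sip_trace.txt)
echo "attempts=$registers ok=$ok"
```

Ratios far from 1:1 (many attempts, few OKs) usually mean the resolution column above applies: check credentials first, then stale bindings.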

2. Media & quality

  1. One-way / no audio:
    • Inspect NAT/port mappings between server and peers.
    • Ensure rtp_start_port / rtp_end_port ranges are open in config.toml and firewalls.
    • Reproduce via Diagnostics → Web Dialer or a handset, then capture RTP with tcpdump/sngrep to verify return packets.
  2. Noise or jitter:
    • Switch to lower-bitrate codecs.
    • Enable the denoise models from fixtures/ or turn on echo cancellation at the endpoint.
    • Check QoS policies and link bandwidth.
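For the port-range check in step 1, a small sketch that pulls rtp_start_port/rtp_end_port out of config.toml before comparing them against firewall rules. The fragment and the /tmp path are illustrative; point the sed commands at wherever your deployment keeps config.toml:

```shell
# Extract the configured RTP range and sanity-check it. The firewall must
# allow UDP for this whole range, or calls will set up but carry no audio.
cat > /tmp/media.toml <<'EOF'
rtp_start_port = 10000
rtp_end_port = 20000
EOF
start=$(sed -n 's/^rtp_start_port *= *//p' /tmp/media.toml)
end=$(sed -n 's/^rtp_end_port *= *//p' /tmp/media.toml)
if [ "$start" -lt "$end" ]; then
  echo "RTP range: $start-$end ($((end - start + 1)) ports)"
else
  echo "invalid RTP range: start >= end" >&2
fi
```

With the range confirmed, capture on any port inside it (tcpdump/sngrep) and check that packets flow in both directions; one-way traffic narrows the fault to NAT on the silent side.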

3. Routing & billing

  • Routing changes not taking effect: confirm a Reload was performed, and validate config/routes syntax via tomlcheck or CI.
  • Wrong route selected: Diagnostics → Routing Evaluate shows the hit rule/trunk; adjust priority or match filters accordingly.
  • Billing mismatch: export CDRs from Call Records, compare billing templates, and look for no_rate alerts caused by missing prefixes.
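For the no_rate hunt, a one-liner over the exported CDRs is often enough. A sketch assuming a hypothetical CSV layout (call_id, callee, duration, rate_status) — adjust the field numbers to match your actual Call Records export:

```shell
# List calls that were not rated; their callee prefixes are the ones
# missing from the billing templates.
cat > /tmp/cdr.csv <<'EOF'
call_id,callee,duration,rate_status
a1,+14155550101,62,rated
a2,+99912345,30,no_rate
a3,+14155550102,15,rated
EOF
awk -F, '$4 == "no_rate" { print $1, $2 }' /tmp/cdr.csv
```

Grouping the printed callees by prefix tells you exactly which rate-table entries to add.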

4. Console / API

  • Cannot log in: inspect the [console] config and DB connection; make sure browser time is accurate to avoid expired tokens.
  • API returns 500: read logs/console (or stdout) stack traces; most errors stem from missing config or unfinished DB migrations.
  • Diagnostics blank page: typically the SIP server is down or the user lacks permission; confirm /health reports ok and grant diagnostics access.
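A quick way to rule out a dead backend before digging into permissions is the /health check. The sketch below parses a saved response; the JSON shape and the curl URL in the comment are assumptions — substitute your console's actual port and response format:

```shell
# In practice, fetch the response first, e.g.:
#   curl -s http://localhost:PORT/health > /tmp/health.json
# Here we use a canned response so the check itself is self-contained.
cat > /tmp/health.json <<'EOF'
{"status":"ok","sip":"up","db":"up"}
EOF
if grep -q '"status":"ok"' /tmp/health.json; then
  echo "console healthy"
else
  echo "console unhealthy" >&2
fi
```

If /health is not ok, fix the SIP server or DB connection first; permission grants only matter once the backend answers.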

5. Performance & stability

  • High CPU: use top/bt to locate hot threads, lower concurrency or scale out, and check for excessive transcoding.
  • Growing memory: see whether large recording buffers are enabled; verify the cleanup plan in callrecord/storage.rs.
  • Crashes / restarts: consult journalctl or container logs—configuration syntax errors or unreachable dependencies (DB/Redis) are common causes.
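To catch recording storage growth before it becomes memory or disk pressure, a periodic size check works alongside the cleanup plan. A sketch with an assumed recordings path and budget — the /tmp directory and the dd-created file just make it self-contained; point du at your real recordings directory:

```shell
# Warn when the recordings directory exceeds a budget (1 GiB here).
mkdir -p /tmp/recordings
dd if=/dev/zero of=/tmp/recordings/sample.wav bs=1024 count=4 2>/dev/null
used_kb=$(du -sk /tmp/recordings | awk '{print $1}')
budget_kb=1048576   # 1 GiB, expressed in KiB
if [ "$used_kb" -gt "$budget_kb" ]; then
  echo "recordings over budget: ${used_kb} KiB"
else
  echo "recordings within budget: ${used_kb} KiB"
fi
```

Run it from cron or a monitoring agent and alert on the over-budget branch; that gives the cleanup logic time to act before the disk fills.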

6. Incident workflow

  1. Gather evidence: screenshots from Diagnostics, log exports, precise timestamps.
  2. Roll back quickly: if caused by configuration, revert config/ in Git and reload.
  3. Validate fix: place test calls and confirm CDRs/alerts return to normal.
  4. Document: record root cause, impact, and remediation steps in the internal wiki for future reference.
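Step 2 (roll back quickly) can be sketched with plain Git. The throwaway repo below stands in for your config/ checkout so the example is self-contained; in production you would run the log/revert inside config/ itself and then trigger Reload:

```shell
# Identify the offending commit, revert it without rewriting history,
# then reload. Here a temp repo simulates config/: one good commit,
# one bad commit that adds routes.toml, then a revert that removes it.
set -e
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.email=ops@example.com -c user.name=ops \
  commit -q --allow-empty -m "good config"
echo "bad_option = true" > "$repo/routes.toml"
git -C "$repo" add routes.toml
git -C "$repo" -c user.email=ops@example.com -c user.name=ops \
  commit -q -m "bad config"
git -C "$repo" log --oneline -2          # locate the bad commit
git -C "$repo" -c user.email=ops@example.com -c user.name=ops \
  revert --no-edit HEAD                  # undo it, keeping history
test ! -f "$repo/routes.toml" && echo "rolled back"
```

Reverting (rather than resetting) keeps the bad change in history, which is exactly what the documentation step afterwards needs for the root-cause write-up.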

(Add an incident flowchart visual here.)