Troubleshooting Playbook
Start every investigation with the Diagnostics panel and relevant logs. This section catalogs common failure patterns, how to isolate them, and recommended tools.
1. Registration / signaling
| Symptom | Investigation | Resolution |
|---|---|---|
| Extension refuses to register | Diagnostics → SIP → Locator registry to confirm bindings and expiry | Verify password, SIP port, firewall rules; reset the password or clear stale bindings when required |
| Trunk status degraded | Run Diagnostics → Trunks probes or OPTIONS probe | Confirm peer IP and auth mode; enable backup trunks in config/trunks |
| INVITE has no response | Use sngrep or Diagnostics → Routing Evaluate to confirm rule hits | Double-check routing matches and ACL permissions |
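
When the built-in trunk probes are unavailable, the OPTIONS probe from the table can be approximated from any host with a few lines of Python. The following is a minimal sketch, not the product's own probe: it assumes plain UDP transport and no authentication, and the `probe` user name and the function names are hypothetical.

```python
import socket
import uuid
from typing import Optional


def build_options(target_host: str, target_port: int,
                  local_host: str, local_port: int) -> bytes:
    """Build a minimal SIP OPTIONS request (RFC 3261) for a UDP probe."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # branch must start with the RFC 3261 magic cookie
    call_id = uuid.uuid4().hex
    tag = uuid.uuid4().hex[:8]
    lines = [
        f"OPTIONS sip:{target_host}:{target_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_host}>;tag={tag}",  # "probe" is a placeholder user
        f"To: <sip:{target_host}:{target_port}>",
        f"Call-ID: {call_id}@{local_host}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:probe@{local_host}:{local_port}>",
        "Content-Length: 0",
        "",
        "",  # the two empty entries produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def probe(target_host: str, target_port: int = 5060,
          timeout: float = 2.0) -> Optional[str]:
    """Send one OPTIONS request; return the reply's status line, or None if nothing came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((target_host, target_port))  # lets us read the local address for Via
        local_host, local_port = sock.getsockname()
        sock.send(build_options(target_host, target_port, local_host, local_port))
        try:
            reply = sock.recv(4096)
        except OSError:  # covers timeouts and ICMP "port unreachable"
            return None
    return reply.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
```

Calling `probe("192.0.2.10")` against a reachable peer should return a status line such as a `200 OK` or an auth challenge; `None` suggests the packet or its reply is being dropped, which points back at firewall rules or the peer IP.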
2. Media & quality
- One-way / no audio:
  - Inspect NAT/port mappings between server and peers.
  - Ensure `rtp_start_port`/`rtp_end_port` ranges are open in `config.toml` and firewalls.
  - Reproduce via Diagnostics → Web Dialer or a handset, then capture RTP with `tcpdump`/`sngrep` to verify return packets.
- Noise or jitter:
  - Switch to lower bitrate codecs.
  - Enable the denoise models from `fixtures/` or turn on echo cancellation at the endpoint.
  - Check QoS policies and link bandwidth.
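
For the one-way audio check, "verify return packets" amounts to confirming that every RTP flow seen in the capture has traffic in the opposite direction. The sketch below counts UDP packets per direction from text output in the style of `tcpdump -n udp`. The line format it expects is an assumption (it varies with tcpdump version and flags), and the function names are illustrative.

```python
import re
from collections import Counter

# Matches the address part of `tcpdump -n` UDP lines, for example:
# "12:00:00.000001 IP 10.0.0.5.10000 > 203.0.113.7.20000: UDP, length 172"
FLOW_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): UDP"
)


def count_directions(lines):
    """Count UDP packets per (src_ip, dst_ip) pair; non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        match = FLOW_RE.search(line)
        if match:
            counts[(match.group(1), match.group(3))] += 1
    return counts


def one_way_flows(counts):
    """Return (src, dst) pairs that carry packets while the reverse pair has none."""
    return sorted(
        pair for pair in counts
        if counts.get((pair[1], pair[0]), 0) == 0
    )
```

Any pair returned by `one_way_flows` is a candidate for a NAT mapping or firewall rule that blocks the return path.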
3. Routing & billing
- Routing ineffective: confirm Reload ran and validate `config/routes` syntax via `tomlcheck` or CI.
- Wrong route selected: Diagnostics → Routing Evaluate shows the hit rule/trunk; adjust `priority` or `match` filters accordingly.
- Billing mismatch: export CDRs from Call Records, compare billing templates, and look for `no_rate` alerts caused by missing prefixes.
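
To see how a missing prefix leads to a `no_rate` result, it can help to replay exported CDRs against the rate table offline. The sketch below assumes longest-prefix matching, which is a common billing convention but is not confirmed here as the system's own algorithm. The `destination` field name and the rate table shape are hypothetical.

```python
def longest_prefix_rate(number, rates):
    """Return (prefix, rate) for the longest prefix of `number` found in `rates`, else None."""
    for length in range(len(number), 0, -1):
        prefix = number[:length]
        if prefix in rates:
            return prefix, rates[prefix]
    return None


def unrated_calls(cdrs, rates):
    """Return the CDRs whose destination matches no prefix in the rate table."""
    return [
        cdr for cdr in cdrs
        if longest_prefix_rate(cdr["destination"], rates) is None
    ]


# Hypothetical data for illustration only.
rates = {"1": 0.010, "1212": 0.020, "44": 0.030}
cdrs = [
    {"destination": "12125550100"},
    {"destination": "442079460000"},
    {"destination": "8613800000000"},
]
missing = unrated_calls(cdrs, rates)
```

Here `missing` holds only the third record, because no entry in `rates` is a prefix of its destination; adding the missing prefix to the billing template would be the corresponding fix.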
4. Console / API
- Cannot log in: inspect the `[console]` config and DB connection; make sure browser time is accurate to avoid expired tokens.
- API returns 500: read `logs/console` (or stdout) stack traces; most errors stem from missing config or unfinished DB migrations.
- Diagnostics blank page: typically SIP server is down or the user lacks permission; validate `/health` reports `ok` and grant `diagnostics` access.
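
The `/health` check can be scripted so it runs the same way during every investigation. This is a minimal sketch using only the standard library. The response format is an assumption: the code treats any HTTP 200 body containing the word "ok" (plain text or JSON) as healthy, and the base URL in the usage note is a placeholder.

```python
import re
import urllib.error
import urllib.request


def body_is_ok(body: str) -> bool:
    """Assumption: a healthy instance returns a body containing the standalone word "ok"."""
    return re.search(r"\bok\b", body, re.IGNORECASE) is not None


def check_health(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/health answers 200 with an "ok" body."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            text = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and body_is_ok(text)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status all count as unhealthy.
        return False
```

For example, `check_health("http://pbx.example.internal:8080")` returning `False` while the console itself loads would point at the SIP server rather than at user permissions.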
5. Performance & stability
- High CPU: use `top`/`bt` to locate hot threads, lower concurrency or scale out, and check for excessive transcoding.
- Growing memory: see whether large recording buffers are enabled; verify the cleanup plan in `callrecord/storage.rs`.
- Crashes / restarts: consult `journalctl` or container logs; configuration syntax errors or unreachable dependencies (DB/Redis) are common causes.
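
To confirm a "growing memory" report before digging into recording buffers, one option is to sample the process's resident set size over time. The sketch below is Linux-only and reads `VmRSS` from `/proc/<pid>/status`. The 10 MiB growth threshold and the function names are arbitrary illustrations, not recommended values.

```python
import re


def parse_vmrss_kb(status_text: str):
    """Extract VmRSS in kB from the text of /proc/<pid>/status, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid: int):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())


def is_growing(samples_kb, min_growth_kb=10_240):
    """True if samples never decrease and total growth is at least min_growth_kb."""
    if len(samples_kb) < 2:
        return False
    never_decreases = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return never_decreases and samples_kb[-1] - samples_kb[0] >= min_growth_kb
```

Collecting `read_rss_kb(pid)` once a minute during a period of steady call volume and passing the list to `is_growing` gives a rough signal; a sawtooth pattern that returns to baseline is more likely normal buffer churn than a leak.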
6. Incident workflow
- Gather evidence: screenshots from Diagnostics, log exports, precise timestamps.
- Roll back quickly: if caused by configuration, revert `config/` in Git and reload.
- Validate fix: place test calls and confirm CDRs/alerts return to normal.
- Document: record root cause, impact, and remediation steps in the internal wiki for future reference.
(Add an incident flowchart visual here.)