Troubleshooting Playbook

Start every investigation with the Diagnostics panel and the relevant logs. This section catalogs common failure patterns, how to isolate each one, and the tools that help.

1. Registration / signaling

Symptom	Investigation	Resolution
Extension refuses to register	Diagnostics → SIP → Locator registry to confirm bindings and expiry	Verify password, SIP port, firewall rules; reset the password or clear stale bindings when required
Trunk status degraded	Run the probes under Diagnostics → Trunks, or send a SIP OPTIONS probe to the peer	Confirm peer IP and auth mode; enable backup trunks in `config/trunks`
INVITE has no response	Use `sngrep` or Diagnostics → Routing Evaluate to confirm rule hits	Double-check routing matches and ACL permissions
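To repeat the OPTIONS probe from a shell host outside the console, a minimal Python sketch can send one SIP OPTIONS request over UDP and report the first line of the reply. It assumes UDP transport, port 5060 by default, and a peer that answers OPTIONS without authentication; `build_options` and `probe` are illustrative names, not part of the product.

```python
import socket
import uuid
from typing import Optional


def build_options(target_host: str, target_port: int, local_host: str, local_port: int) -> bytes:
    """Build a minimal SIP OPTIONS request carrying the headers RFC 3261 requires."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 magic-cookie branch prefix
    lines = [
        f"OPTIONS sip:{target_host}:{target_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_host}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target_host}:{target_port}>",
        f"Call-ID: {uuid.uuid4().hex}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",  # these two empty entries make the message end with CRLF CRLF
        "",
    ]
    return "\r\n".join(lines).encode()


def probe(target_host: str, target_port: int = 5060, timeout: float = 2.0) -> Optional[str]:
    """Send one OPTIONS over UDP; return the reply's status line, or None if nothing arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((target_host, target_port))  # lets the OS choose the local address
        local_host, local_port = sock.getsockname()
        sock.send(build_options(target_host, target_port, local_host, local_port))
        try:
            reply = sock.recv(65535)
        except (socket.timeout, ConnectionRefusedError):  # silence, or ICMP port unreachable
            return None
    return reply.decode(errors="replace").split("\r\n", 1)[0]
```

Any final response (200, or even 401/407) shows the peer is reachable at the SIP layer; `None` points at routing, NAT, or firewall problems between the two hosts.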

2. Media & quality

One-way / no audio:
- Inspect NAT/port mappings between server and peers.
- Ensure the port range set by rtp_start_port / rtp_end_port in config.toml is also open on every firewall between the server and its peers.
- Reproduce via Diagnostics → Web Dialer or a handset, then capture RTP with tcpdump/sngrep to verify return packets.
Noise or jitter:
- Switch to lower-bitrate codecs.
- Enable the denoise models from fixtures/ or turn on echo cancellation at the endpoint.
- Check QoS policies and link bandwidth.
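A tcpdump or sngrep capture can also be checked numerically. This sketch decodes the RTP fixed header and maintains the RFC 3550 (section 6.4.1) interarrival jitter estimate, J += (|D| - J) / 16, in RTP timestamp units. It assumes arrival times and packet bytes are pulled from the capture by other means, uses an 8000 Hz clock rate that fits G.711 (other codecs need their own rate), and does not handle 32-bit timestamp wraparound; the class and function names are illustrative.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class RtpHeader:
    payload_type: int
    seq: int
    timestamp: int
    ssrc: int


def parse_rtp(packet: bytes) -> RtpHeader:
    """Decode the 12-byte RTP fixed header (RFC 3550 section 5.1)."""
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:  # top two bits carry the RTP version
        raise ValueError("not RTP version 2")
    return RtpHeader(payload_type=b1 & 0x7F, seq=seq, timestamp=ts, ssrc=ssrc)


class JitterEstimator:
    """Interarrival jitter per RFC 3550 section 6.4.1, kept in RTP timestamp units."""

    def __init__(self, clock_rate: int = 8000):
        self.clock_rate = clock_rate
        self.jitter = 0.0
        self._prev_transit: Optional[float] = None

    def update(self, arrival_s: float, rtp_timestamp: int) -> float:
        # Relative transit time: arrival converted to timestamp units minus the RTP timestamp.
        transit = arrival_s * self.clock_rate - rtp_timestamp
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            self.jitter += (d - self.jitter) / 16.0
        self._prev_transit = transit
        return self.jitter

    def jitter_ms(self) -> float:
        return self.jitter / self.clock_rate * 1000.0
```

Feeding the packets of one SSRC in arrival order gives a figure comparable to the jitter endpoints report in RTCP; a persistently high value supports the QoS and bandwidth checks above.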

3. Routing & billing

Routing changes have no effect: confirm that Reload ran, then validate config/routes syntax with tomlcheck or in CI.
Wrong route selected: Diagnostics → Routing Evaluate shows which rule and trunk matched; adjust rule priority or match filters accordingly.
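Why priority matters shows up clearly in a simplified model of first-match evaluation. The rule fields, the regular-expression filter on the dialed number, and the convention that a lower priority value is evaluated first are all assumptions for illustration; the product's actual rule schema and ordering may differ, and Routing Evaluate remains the authoritative answer.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    priority: int        # hypothetical: lower value is evaluated first
    callee_pattern: str  # hypothetical regex filter on the dialed number
    trunk: str


def evaluate(rules: list[Rule], callee: str) -> Optional[Rule]:
    """Return the first rule, in priority order, whose filter matches the whole callee string."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if re.fullmatch(rule.callee_pattern, callee):
            return rule
    return None
```

With a catch-all `.*` rule at priority 10 and a `\+44\d+` rule at priority 20, a UK number hits the catch-all; moving the UK rule ahead of it (for example to priority 5) makes it win.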
Billing mismatch: export CDRs from Call Records, compare billing templates, and look for no_rate alerts caused by missing prefixes.
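One way to locate the prefixes behind no_rate alerts is to run the dialed numbers from the CDR export against the rate table. This sketch assumes a billing template can be reduced to a prefix → rate mapping resolved by longest-prefix match, a common telecom convention but an assumption here; `find_rate` and `unrated` are hypothetical names.

```python
from typing import Optional


def find_rate(rates: dict[str, float], number: str) -> Optional[tuple[str, float]]:
    """Longest-prefix match of a dialed number against a prefix -> rate table.

    Returns None when no prefix matches, the situation a no_rate alert flags.
    """
    digits = number.lstrip("+")
    for length in range(len(digits), 0, -1):  # try the longest prefix first
        prefix = digits[:length]
        if prefix in rates:
            return prefix, rates[prefix]
    return None


def unrated(cdr_numbers: list[str], rates: dict[str, float]) -> list[str]:
    """Numbers from exported CDRs that no billing prefix covers."""
    return [n for n in cdr_numbers if find_rate(rates, n) is None]
```

The numbers returned by `unrated` indicate which prefixes the billing template is missing.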

4. Console / API

Cannot log in: inspect the [console] config and the DB connection; make sure the browser's clock is accurate, because clock skew can make valid tokens appear expired.
API returns 500: read the stack traces in logs/console (or stdout); most errors stem from missing configuration or unfinished DB migrations.
Diagnostics shows a blank page: typically the SIP server is down or the user lacks permission; confirm that /health reports ok and grant diagnostics access.
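The /health check is easy to script for monitoring. This sketch assumes a healthy server answers HTTP 200 with a body containing the word "ok"; adjust the test if the endpoint returns a different format. The base URL passed in is a placeholder for the server's console address.

```python
import re
import urllib.request


def health_ok(base_url: str, timeout: float = 3.0) -> bool:
    """True if GET <base_url>/health returns HTTP 200 and the body contains the word "ok"."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode(errors="replace")
            return resp.status == 200 and re.search(r"\bok\b", body, re.IGNORECASE) is not None
    except OSError:  # URLError, HTTPError statuses, refused connections and timeouts
        return False
```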

5. Performance & stability

High CPU: use top (per-thread view with -H) and backtraces (bt) to locate hot threads, lower concurrency or scale out, and check for excessive transcoding.
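On Linux, the per-thread CPU figures that top displays come from /proc, so they can also be sampled directly. This Linux-only sketch takes two readings of each thread's user plus system clock ticks and ranks threads by their share of one core; the resulting thread IDs and names can then be matched against a debugger backtrace. The helper names and the server PID argument are assumptions for illustration.

```python
import os
import time


def thread_cpu_ticks(pid: int) -> dict[int, tuple[str, int]]:
    """Map thread id -> (thread name, utime + stime clock ticks) from /proc/<pid>/task."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as fh:
                stat = fh.read()
        except FileNotFoundError:  # thread exited between listdir and open
            continue
        name = stat[stat.index("(") + 1 : stat.rindex(")")]  # comm may itself contain ')'
        fields = stat[stat.rindex(")") + 1 :].split()
        # fields[0] is stat field 3 (state); utime and stime are fields 14 and 15.
        result[int(tid)] = (name, int(fields[11]) + int(fields[12]))
    return result


def hottest_threads(pid: int, interval: float = 1.0, top: int = 5) -> list[tuple[int, str, float]]:
    """Sample twice and return the busiest threads as (tid, name, percent of one core)."""
    hz = os.sysconf("SC_CLK_TCK")
    before = thread_cpu_ticks(pid)
    time.sleep(interval)
    after = thread_cpu_ticks(pid)
    usage = []
    for tid, (name, ticks) in after.items():
        prev = before.get(tid, (name, 0))[1]  # a new thread used all its ticks in this window
        usage.append((tid, name, 100.0 * (ticks - prev) / hz / interval))
    usage.sort(key=lambda row: row[2], reverse=True)
    return usage[:top]
```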
Growing memory: check whether large recording buffers are enabled, and verify the cleanup plan in callrecord/storage.rs.
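To tell genuine growth from a one-time allocation, sample the server's resident set size over time under steady load. This Linux-only sketch reads VmRSS from /proc/<pid>/status; the 1 MiB growth threshold and the sampling interval are arbitrary starting points, not product defaults, and the helper names are hypothetical.

```python
import os
import time


def rss_kib(pid: int) -> int:
    """Resident set size in KiB, from the VmRSS line of /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # the kernel reports this value in kB
    raise RuntimeError(f"no VmRSS for pid {pid}")


def rss_trend(pid: int, samples: int = 5, interval: float = 60.0) -> list[int]:
    """Take `samples` RSS readings spaced `interval` seconds apart."""
    readings = []
    for i in range(samples):
        readings.append(rss_kib(pid))
        if i < samples - 1:
            time.sleep(interval)
    return readings


def is_monotonic_growth(readings: list[int], min_growth_kib: int = 1024) -> bool:
    """True if no sample drops below its predecessor and total growth reaches the threshold."""
    if len(readings) < 2:
        return False
    steady = all(b >= a for a, b in zip(readings, readings[1:]))
    return steady and readings[-1] - readings[0] >= min_growth_kib
```

A steady climb while call volume stays flat is the pattern worth correlating with recording settings.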
Crashes / restarts: consult journalctl or container logs; configuration syntax errors and unreachable dependencies (DB/Redis) are common causes.

6. Incident workflow

Gather evidence: screenshots from Diagnostics, log exports, precise timestamps.
Roll back quickly: if caused by configuration, revert config/ in Git and reload.
Validate fix: place test calls and confirm CDRs/alerts return to normal.
Document: record root cause, impact, and remediation steps in the internal wiki for future reference.
