Text-to-Speech (TTS)
The TTS command converts text to speech and plays it.
TTS Track
The TTS command creates a TTS Track for forwarding audio. The start and end of the TTS Track will trigger Track Start and Track End events respectively.
When playId is set in the TTS command, the corresponding TrackEnd event will contain this playId, which can be used to get playback completion notifications.
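A client can use this to wait for one specific playback to finish. The sketch below is only illustrative: it assumes events arrive as JSON objects with an `event` field whose Track End value is `trackEnd`, plus `trackId` and `playId` fields. Those names and the helper `is_playback_done` are assumptions, not confirmed by this page.

```python
import json


def is_playback_done(raw_event: str, play_id: str) -> bool:
    """Return True if raw_event is a Track End event carrying play_id."""
    event = json.loads(raw_event)
    # Assumption: the event type is stored under "event" and spelled "trackEnd".
    return event.get("event") == "trackEnd" and event.get("playId") == play_id


# Hypothetical event payloads, written by hand for this example.
events = [
    '{"event": "trackStart", "trackId": "t1"}',
    '{"event": "trackEnd", "trackId": "t1", "playId": "greeting"}',
]
done = [is_playback_done(e, "greeting") for e in events]
print(done)
```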
TTS Command Parameters
text
Text to convert.
Some providers limit text length and content, and special characters may cause errors. Refer to the provider documentation at the end of this document.
speaker
Sets the voice (tone). Different models support different voices, each voice supports different languages, and prices vary.
- Alibaba Cloud: Alibaba Cloud Voice List, default model is cosyvoice-v2, default voice is longyumi_v2.
- Tencent Cloud: Tencent Cloud Voice List, default voice is 101001.
- Deepgram: Deepgram Voice List, default voice is aura-2-thalia-en.
playId
playId groups multiple TTS commands into a single playback. Consecutive TTS commands that use the same playId reuse the same TTS Track.
If a TTS command uses a different playId than the previous TTS command and the previous TTS Track has not ended, the previous TTS Track is terminated and a new TTS Track is created.
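The reuse and termination rule can be pictured with a toy model. This is not RustPBX code, only a sketch of the behaviour described above; the class `TtsTrackModel` and its method names are hypothetical, and commands without a playId are not modelled.

```python
class TtsTrackModel:
    """Toy model of the playId rule for TTS Tracks (illustrative only)."""

    def __init__(self):
        self.active_play_id = None  # playId of the track that has not ended yet
        self.actions = []           # record of what the model decided

    def on_tts_command(self, play_id: str) -> None:
        if self.active_play_id == play_id:
            # Same playId as the running track: reuse it.
            self.actions.append(("reuse", play_id))
            return
        if self.active_play_id is not None:
            # Different playId while the previous track is still running.
            self.actions.append(("terminate", self.active_play_id))
        self.actions.append(("create", play_id))
        self.active_play_id = play_id

    def on_track_end(self) -> None:
        self.active_play_id = None


model = TtsTrackModel()
for pid in ["a", "a", "b"]:
    model.on_tts_command(pid)
print(model.actions)
```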
endOfStream
Set endOfStream=true to indicate that all TTS commands for the current playId have been sent. After the results of all commands finish playing, the TTS Track exits and sends a Track End event.
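As a sketch of how a client might split one playback across several commands, the helper below builds a list of TTS commands that share a playId and marks only the last one with endOfStream. The `"command": "tts"` envelope and the helper name `build_tts_commands` are assumptions for illustration; only `text`, `playId`, and `endOfStream` come from this page.

```python
import json


def build_tts_commands(fragments, play_id):
    """Build one TTS command per text fragment, all sharing play_id."""
    commands = []
    for index, text in enumerate(fragments):
        # Assumption: commands are JSON objects tagged with "command": "tts".
        command = {"command": "tts", "text": text, "playId": play_id}
        if index == len(fragments) - 1:
            # Only the final command signals that nothing more will follow.
            command["endOfStream"] = True
        commands.append(command)
    return commands


commands = build_tts_commands(["Hello, ", "how can I help you?"], "greeting-1")
for command in commands:
    print(json.dumps(command))
```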
streaming
streaming is used to choose between the provider's streaming and non-streaming APIs:
- Set streaming=true to use the provider's streaming API. Multiple TTS commands are sent consecutively over the same WebSocket connection.
- Set streaming=false (default) to use the non-streaming API. Each TTS command uses a separate HTTP request.
Streaming text fragments are short, so the caching mechanism is disabled in streaming mode.
Non-streaming mode sends separate requests. Use the max_concurrent_tasks option to set the request concurrency (default: 1).
base64
Set base64=true to use the text field to pass base64 encoded audio.
The audio must be in PCM format with a sample rate of 16 kHz.
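A minimal sketch of preparing such a command follows. It generates a short test tone, so no audio file is needed. The sample width and channel layout are not stated on this page; 16-bit signed little-endian mono is assumed here, as is the `"command": "tts"` envelope.

```python
import base64
import math
import struct

SAMPLE_RATE = 16000  # 16 kHz, as required above


def sine_pcm16(freq_hz: float, duration_s: float, amplitude: float = 0.3) -> bytes:
    """Return a sine tone as raw 16-bit little-endian mono PCM (assumed layout)."""
    n = int(SAMPLE_RATE * duration_s)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)


pcm = sine_pcm16(440.0, 0.5)
command = {
    "command": "tts",  # assumed envelope, not confirmed by this page
    "base64": True,
    "text": base64.b64encode(pcm).decode("ascii"),  # audio travels in the text field
}
print(len(pcm), "bytes of PCM,", len(command["text"]), "base64 characters")
```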
autoHangup
Set autoHangup=true to automatically hang up after playback completes.
waitInputTimeout
Set waitInputTimeout to a duration in milliseconds. If no audio input is received within that time, a Silence event is triggered.
Caching
When streaming=false is set, the caching mechanism is enabled.
The audio returned for a TTS command is saved to /tmp/mediacache (default path) under the file name {text_hash}-{sample_rate}-{speaker}-{speed}.pcm.
When a subsequent command uses the same text, sample rate, speaker, and speed, the cached file is used directly.
The cache path can be changed in the RustPBX configuration file, for example:
[tts]
media_cache_path = "/tmp/mediacache"
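The cache key described above can be illustrated as follows. The hash function RustPBX uses for {text_hash} and the way speed is formatted are not stated here, so the truncated SHA-256 and the plain `str()` formatting below are stand-ins; the generated names will not match real cache files. The point is only that changing any of the four inputs yields a different file.

```python
import hashlib
from pathlib import PurePosixPath


def cache_path(text, sample_rate, speaker, speed, cache_dir="/tmp/mediacache"):
    """Build a cache file path following the documented name pattern."""
    # Assumption: SHA-256 is a placeholder for the real, unspecified text hash.
    text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return PurePosixPath(cache_dir) / f"{text_hash}-{sample_rate}-{speaker}-{speed}.pcm"


first = cache_path("Hello", 16000, "101001", 1.0)
again = cache_path("Hello", 16000, "101001", 1.0)
other_voice = cache_path("Hello", 16000, "aura-2-thalia-en", 1.0)
print(first == again, first == other_voice)
```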
Other Configuration
More TTS options can be configured in both the tts field of CallOption in the Invite/Accept commands and the option field in TTS commands.
Their parameters are the same, see SynthesisOption for details.
If an option is configured in both places, the value in the TTS command overrides the value in CallOption.
Mainly includes:
- samplerate: Sample rate. Default: 16000. It is best set to the Track sample rate, because resampling has an additional performance cost.
- speed: Speech rate. Default is the provider's standard speech rate. Deepgram does not support adjusting speech rate. Refer to the provider documentation for specific values.
- volume: Volume. Default is the provider's standard volume. Deepgram does not support adjusting volume. Refer to the provider documentation for specific values.
- emotion: Emotion. Models and voices support emotions differently. Refer to the provider documentation for specific values.
- endpoint: Custom service endpoint URL.
  - tencent default: non-streaming wss://tts.cloud.tencent.com/stream_ws, streaming wss://tts.cloud.tencent.com/stream_wsv2.
  - aliyun default: wss://dashscope.aliyuncs.com/api-ws/v1/inference.
  - deepgram default: (https/wss)://api.deepgram.com/v1/speak.
- extra: Provider-specific parameters, passed directly to the provider.
- max_concurrent_tasks: Maximum concurrent tasks for non-streaming TTS commands. Default: 1.
If the provider has other parameters, they can be placed in the extra field under the provider's own field names. These parameters are passed directly to the provider.
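The override rule between CallOption and the TTS command can be sketched as a dictionary merge. This page does not say whether the override is per field or replaces the whole option object; the sketch assumes a shallow, per-field override, and the helper name `effective_options` and the sample values are hypothetical.

```python
def effective_options(call_option_tts: dict, command_option: dict) -> dict:
    """Merge options, letting TTS command values win over CallOption values."""
    merged = dict(call_option_tts)  # start from the call-level defaults
    merged.update(command_option)   # assumption: per-field, shallow override
    return merged


call_option_tts = {"samplerate": 16000, "speed": 1.0, "volume": 5}
command_option = {"speed": 1.2}
merged = effective_options(call_option_tts, command_option)
print(merged)
```

Under this assumption, the command changes only the speech rate while the sample rate and volume from the call setup stay in effect.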
API Key Configuration
The API key can be configured in environment variables when starting RustPBX, or in SynthesisOption.
Configure in environment variables:
- Tencent Cloud:
  - TENCENT_APPID: Tencent Cloud appId
  - TENCENT_SECRET_ID: Tencent Cloud secretId
  - TENCENT_SECRET_KEY: Tencent Cloud secretKey
- Alibaba Cloud:
  - DASHSCOPE_API_KEY: Alibaba Cloud Model Studio Bailian API Key
- Deepgram:
  - DEEPGRAM_API_KEY: Deepgram API Key
Configure in SynthesisOption:
- Tencent Cloud:
  - appId: Tencent Cloud appId
  - secretId: Tencent Cloud secretId
  - secretKey: Tencent Cloud secretKey
- Other providers:
  - secretKey: Provider API Key
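When a client prefers to pass credentials per call rather than rely on the server's environment, the SynthesisOption fields can be filled from the client's own environment. The sketch below maps the Tencent Cloud variable names listed above to the corresponding option fields; the helper name `tencent_credentials` is hypothetical, and reusing the same variable names on the client side is an assumption.

```python
import os


def tencent_credentials(env=None) -> dict:
    """Build the Tencent Cloud credential fields of a SynthesisOption."""
    env = os.environ if env is None else env
    field_to_var = {
        "appId": "TENCENT_APPID",
        "secretId": "TENCENT_SECRET_ID",
        "secretKey": "TENCENT_SECRET_KEY",
    }
    missing = [var for var in field_to_var.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {field: env[var] for field, var in field_to_var.items()}


# Placeholder values, for demonstration only.
demo_env = {
    "TENCENT_APPID": "1250000000",
    "TENCENT_SECRET_ID": "example-id",
    "TENCENT_SECRET_KEY": "example-key",
}
option = tencent_credentials(demo_env)
print(sorted(option))
```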
Interruption
Use the Interrupt command to interrupt TTS that is currently playing.
- If graceful=true is set, the TTS Track will exit after the current TTS command finishes playing (only effective in non-streaming TTS).
- If graceful=false (default) is set, playback will be interrupted immediately.
If there is TTS currently playing and the provider supports subtitles, an Interruption event will be triggered, containing the played time and text position (if supported by the provider).
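A client-side helper for sending the Interrupt command might look like the sketch below. Only the `graceful` flag and its default come from this page; the `"command": "interrupt"` envelope and the helper name `interrupt_command` are assumptions.

```python
import json


def interrupt_command(graceful: bool = False) -> str:
    """Serialize an Interrupt command; graceful defaults to False as documented."""
    # Assumption: commands are JSON objects tagged with a "command" field.
    return json.dumps({"command": "interrupt", "graceful": graceful})


immediate = interrupt_command()
after_current = interrupt_command(graceful=True)
print(immediate)
print(after_current)
```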
For more information, see: