Automatic Speech Recognition (ASR)
ASR converts call audio to text in real time and notifies clients of the results through events.
ASR is configured in the asr field of the CallOption passed to the Invite, Accept, and Refer commands, which correspond to the calling, answering, and transfer scenarios respectively.
The asr field has the same format in all three commands; see TranscriptionOption for details.
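Because the asr field has the same shape in all three commands, a client can build the option once and reuse it. The sketch below assumes a JSON command envelope with hypothetical "command" and "option" keys. Only the asr field and the TranscriptionOption parameter names come from this page, so treat the envelope as illustrative.

```python
import json

def make_asr_option() -> dict:
    """Build a TranscriptionOption using field names documented on this page."""
    return {
        "provider": "tencent",
        "modelType": "16k_zh_en",   # documented Tencent Cloud default
        "samplerate": 16000,        # must match the track sample rate
        "startWhenAnswer": True,    # ignore ringback tones
    }

def make_command(name: str, asr: dict) -> str:
    """Serialize a command that carries the asr option.

    HYPOTHETICAL envelope: the "command" and "option" keys are assumptions,
    not taken from the RustPBX docs; only the "asr" field is documented here.
    """
    if name not in ("invite", "accept", "refer"):
        raise ValueError(f"unsupported command: {name}")
    return json.dumps({"command": name, "option": {"asr": asr}})

if __name__ == "__main__":
    asr = make_asr_option()
    for name in ("invite", "accept", "refer"):
        print(make_command(name, asr))
```

Reusing one dictionary keeps the calling, answering, and transfer configurations from drifting apart.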
Supported Providers
Four providers are currently supported: Tencent Cloud, Alibaba Cloud, Deepgram, and VoiceApi. Select one with the provider field.
The available models and supported languages vary by provider; refer to each provider's documentation for details.
Parameters
provider
Selects the ASR provider. Values:
- tencent: Tencent Cloud
- aliyun: Alibaba Cloud
- voiceapi: VoiceApi
language
Language of the speech. Used by Alibaba Cloud to specify which language to recognize; corresponds to the language_hints parameter of the Alibaba Cloud API.
modelType
ASR model to use.
- Tencent Cloud: defaults to 16k_zh_en, which supports Chinese and English.
- Alibaba Cloud: defaults to paraformer-realtime-v2, which supports Chinese, English, Japanese, Korean, German, French, and Russian.
samplerate
Sample rate. Unit: Hz.
- Tencent Cloud: defaults to 16000 and must match the model; for example, 16k_zh_en requires a sample rate of 16000.
- Alibaba Cloud: defaults to 16000. Only paraformer-realtime-v2 supports a custom sample rate; other models use a fixed sample rate.
The sample rate must match the sample rate of the track:
- SIP calls: 16000 Hz by default
- WebRTC calls: depends on the codec
  - G722: 16000 Hz
  - Opus: 48000 Hz
  - Others: 8000 Hz
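The sample-rate rules above can be checked on the client before sending a command. This is a minimal sketch: the call_type labels "sip" and "webrtc" and the helper names are hypothetical, and inferring a Tencent model's rate from its "16k_"/"8k_" prefix is an assumption based on the 16k_zh_en example, not a documented rule.

```python
from typing import List, Optional

# Track sample rates listed on this page, in Hz.
SIP_DEFAULT_RATE = 16000
WEBRTC_CODEC_RATES = {"g722": 16000, "opus": 48000}
WEBRTC_OTHER_RATE = 8000

def track_sample_rate(call_type: str, codec: Optional[str] = None) -> int:
    """Return the track sample rate that the asr samplerate should match.

    call_type is a hypothetical label ("sip" or "webrtc") used only here.
    """
    if call_type == "sip":
        return SIP_DEFAULT_RATE
    if call_type == "webrtc":
        return WEBRTC_CODEC_RATES.get((codec or "").lower(), WEBRTC_OTHER_RATE)
    raise ValueError(f"unknown call type: {call_type}")

def tencent_model_rate(model_type: str) -> int:
    """ASSUMPTION: Tencent model names start with "<N>k_", e.g. "16k_zh_en"."""
    prefix = model_type.split("_", 1)[0]
    if not prefix.endswith("k") or not prefix[:-1].isdigit():
        raise ValueError(f"cannot infer a sample rate from {model_type!r}")
    return int(prefix[:-1]) * 1000

def samplerate_problems(provider: str, model_type: str,
                        samplerate: int, track_rate: int) -> List[str]:
    """List the documented sample-rate constraints that a configuration breaks."""
    problems = []
    if samplerate != track_rate:
        problems.append(f"samplerate {samplerate} != track rate {track_rate}")
    if provider == "tencent" and tencent_model_rate(model_type) != samplerate:
        problems.append(f"samplerate {samplerate} does not match model {model_type}")
    return problems

print(samplerate_problems("tencent", "16k_zh_en", 16000, track_sample_rate("sip")))
```

For a SIP call using the default Tencent model, the printed list is empty. A WebRTC call negotiated with Opus (48000 Hz) would flag a conflict with a 16k Tencent model.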
endpoint
Custom service endpoint URL.
- Alibaba Cloud: defaults to wss://dashscope.aliyuncs.com/api-ws/v1/inference
- Tencent Cloud: defaults to wss://asr.cloud.tencent.com/asr/v2/
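To see what a partially filled option resolves to, the documented modelType, samplerate, and endpoint defaults can be mirrored in a small table. This is a hypothetical client-side sketch (DEFAULTS and with_defaults are illustrative names). RustPBX applies its own defaults server-side, and providers without documented defaults here are passed through unchanged.

```python
from typing import Any, Dict

# Defaults documented on this page for the providers that list them.
DEFAULTS: Dict[str, Dict[str, Any]] = {
    "tencent": {
        "modelType": "16k_zh_en",
        "samplerate": 16000,
        "endpoint": "wss://asr.cloud.tencent.com/asr/v2/",
    },
    "aliyun": {
        "modelType": "paraformer-realtime-v2",
        "samplerate": 16000,
        "endpoint": "wss://dashscope.aliyuncs.com/api-ws/v1/inference",
    },
}

def with_defaults(option: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of option with the documented defaults filled in.

    Explicitly set (non-None) fields always win over defaults.
    """
    merged = dict(DEFAULTS.get(option.get("provider", ""), {}))
    merged.update({k: v for k, v in option.items() if v is not None})
    return merged
```

Setting endpoint explicitly, for example to a self-hosted proxy, overrides the default. Leaving it unset keeps the provider's public URL.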
extra
Provider-specific parameters, passed directly to the provider. Use the field names exactly as they appear in the provider's documentation.
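As an illustration of the pass-through behavior, the sketch below shows one way language and extra could be combined into Alibaba Cloud request parameters. Everything here is an assumption: the aliyun_parameters helper is hypothetical, language_hints is assumed to take a list of language codes, "some_provider_flag" is a placeholder rather than a real provider field, and whether extra overrides derived values is not stated on this page.

```python
import copy
from typing import Any, Dict

def aliyun_parameters(option: Dict[str, Any]) -> Dict[str, Any]:
    """HYPOTHETICAL mapping of documented fields onto Alibaba Cloud parameters."""
    params: Dict[str, Any] = {}
    language = option.get("language")
    if language:
        # language corresponds to language_hints, assumed here to be a list of codes.
        params["language_hints"] = [language]
    # extra is forwarded as-is, keyed by the provider's own field names.
    # ASSUMPTION: extra entries override values derived from other fields.
    params.update(copy.deepcopy(option.get("extra") or {}))
    return params

print(aliyun_parameters({"language": "en", "extra": {"some_provider_flag": True}}))
```

Copying extra keeps later changes to the caller's dictionary from leaking into a request that is already built.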
startWhenAnswer
Whether to start the ASR request only after the call is answered. Defaults to false. When enabled, ringback tones are ignored rather than transcribed.
Configure API Key
API keys can be set through environment variables when RustPBX starts, or in the TranscriptionOption.
Configure in environment variables:
- Tencent Cloud:
  - TENCENT_APPID: Tencent Cloud appId
  - TENCENT_SECRET_ID: Tencent Cloud secretId
  - TENCENT_SECRET_KEY: Tencent Cloud secretKey
- Alibaba Cloud:
  - DASHSCOPE_API_KEY: Alibaba Cloud Model Studio (Bailian) API key
- Deepgram:
  - DEEPGRAM_API_KEY: Deepgram API key
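A startup script can check that the variables listed above are present before launching RustPBX. This is a minimal sketch with a hypothetical helper name. The "deepgram" provider value is an assumption, since this page does not list it among the provider values.

```python
import os
from typing import List, Mapping, Optional

# Environment variables listed on this page, keyed by provider value.
REQUIRED_ENV = {
    "tencent": ["TENCENT_APPID", "TENCENT_SECRET_ID", "TENCENT_SECRET_KEY"],
    "aliyun": ["DASHSCOPE_API_KEY"],
    # ASSUMPTION: "deepgram" as the provider value; not listed on this page.
    "deepgram": ["DEEPGRAM_API_KEY"],
}

def missing_env_vars(provider: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the listed variables that are unset or empty for a provider."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV.get(provider, []) if not env.get(name)]

print(missing_env_vars("tencent"))
```

Passing an explicit mapping instead of reading os.environ makes the check easy to run against a deployment's .env contents.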
Configure in TranscriptionOption:
- Tencent Cloud:
  - appId: Tencent Cloud appId
  - secretId: Tencent Cloud secretId
  - secretKey: Tencent Cloud secretKey
- Other providers:
  - secretKey: provider API key
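The credential fields differ only for Tencent Cloud, so a client can validate them in one place. This is a hypothetical helper built from the field names above. This page does not say which source wins when both environment variables and TranscriptionOption credentials are set.

```python
from typing import Dict

def credential_fields(provider: str, **creds: str) -> Dict[str, str]:
    """Return the TranscriptionOption credential fields for a provider.

    Tencent Cloud needs appId, secretId and secretKey; other providers only
    need secretKey (the provider API key). Unrelated keyword arguments are dropped.
    """
    required = ("appId", "secretId", "secretKey") if provider == "tencent" else ("secretKey",)
    missing = [name for name in required if not creds.get(name)]
    if missing:
        raise ValueError(f"{provider}: missing {', '.join(missing)}")
    return {"provider": provider, **{name: creds[name] for name in required}}
```

The returned dictionary can be merged into the asr option alongside modelType, samplerate, and the other parameters.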
ASR Events
RustPBX pushes two types of ASR events:
- AsrDelta: Intermediate recognition result (may change)
- AsrFinal: Final recognition result (stable)
Both events carry the same fields:
- trackId: ID of the call track (transfer scenarios involve two tracks)
- index: sequence number of the recognition result (only meaningful for Tencent Cloud ASR)
- text: Recognized text result
- timestamp: Event timestamp
- startTime: Recognition start time
- endTime: Recognition end time
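A client typically shows AsrDelta text as a provisional line and commits only AsrFinal text. The sketch below keeps one transcript per trackId, so both tracks in a transfer stay separate. Two details are assumptions: that the event type arrives in an "event" key (matched case-insensitively), and that an AsrFinal replaces the pending delta for its track.

```python
import json
from collections import defaultdict
from typing import Dict, List

class TranscriptCollector:
    """Collects AsrFinal text per track and keeps the latest AsrDelta per track."""

    def __init__(self) -> None:
        self.finals: Dict[str, List[str]] = defaultdict(list)
        self.pending: Dict[str, str] = {}

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        # ASSUMPTION: the event type sits in an "event" key; matching is
        # case-insensitive so both "AsrFinal" and "asrFinal" are accepted.
        kind = str(msg.get("event", "")).lower()
        if kind not in ("asrdelta", "asrfinal"):
            return  # not an ASR event
        track = msg["trackId"]
        if kind == "asrdelta":
            self.pending[track] = msg["text"]  # may still change
        else:
            self.finals[track].append(msg["text"])  # stable result
            self.pending.pop(track, None)

    def transcript(self, track_id: str, sep: str = " ") -> str:
        """Join the final results for one track (use sep="" for Chinese text)."""
        return sep.join(self.finals.get(track_id, []))
```

Keying by trackId rather than by call means a transfer's two tracks never interleave in one transcript.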