Automatic Speech Recognition (ASR)

ASR converts call audio to text in real time and notifies clients through events.

Configure ASR in the asr field of the CallOption passed to the Invite, Accept, or Refer command, which correspond to placing, answering, and transferring a call respectively.

The asr field has the same format in all three commands; see TranscriptionOption for details.
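
As a minimal sketch, the same ASR settings can be attached to any of the three commands. Only the asr field and its parameters (provider, language, samplerate) come from this page. The envelope keys (command, option), the caller and callee fields, and the example values are assumptions for illustration and should be checked against the command reference.

```python
import json


def build_command(command: str, asr: dict, **call_fields) -> str:
    """Serialize a command whose CallOption carries an `asr` block.

    The {"command": ..., "option": ...} envelope is an assumed shape,
    not something this page confirms.
    """
    if command not in ("invite", "accept", "refer"):
        raise ValueError(f"ASR is configured on invite/accept/refer, got {command!r}")
    option = dict(call_fields)
    option["asr"] = asr  # the asr block has the same format in all three commands
    return json.dumps({"command": command, "option": option})


# Example values are hypothetical.
asr = {"provider": "aliyun", "language": "en", "samplerate": 16000}
invite = build_command(
    "invite", asr, caller="sip:alice@example.com", callee="sip:bob@example.com"
)
accept = build_command("accept", asr)
```

Building the asr block once and reusing it keeps calling and answering scenarios consistent.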

[Figure: sequence diagram with participants Client, RustPBX, Phone, and ASR Provider within a Session. Messages shown: Hello (Audio) from the Phone to RustPBX, Hello (Audio) from RustPBX to the ASR Provider, Hello (Text) back to RustPBX, and an AsrFinal event with text: Hello delivered to the Client.]

Supported Providers

Four providers are currently supported: Tencent Cloud, Alibaba Cloud, Deepgram, and VoiceApi. Select one with the provider field.

Available models and supported languages vary by provider; refer to each provider's documentation.

Parameters

provider

Select ASR provider.

  • tencent: Tencent Cloud
  • aliyun: Alibaba Cloud
  • voiceapi: VoiceApi

language

Recognition language. Used by Alibaba Cloud to indicate the language of the speech; corresponds to the language_hints parameter of the Alibaba Cloud API.

modelType

  • Tencent Cloud: Defaults to 16k_zh_en, which supports Chinese and English.
  • Alibaba Cloud: Defaults to paraformer-realtime-v2, which supports Chinese, English, Japanese, Korean, German, French, and Russian.

samplerate

Sample rate. Unit: Hz.

  • Tencent Cloud: Defaults to 16000. Must match the model setting; for example, 16k_zh_en corresponds to a sample rate of 16000.
  • Alibaba Cloud: Defaults to 16000. Only paraformer-realtime-v2 supports a custom sample rate; other models use fixed values.
Warning

The ASR sample rate must match the Track sample rate:

  • SIP calls default to 16000 Hz
  • WebRTC calls depend on the codec:
    • G722: 16000 Hz
    • Opus: 48000 Hz
    • Others: 8000 Hz
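
The mapping above can be sketched as a small lookup that picks the samplerate value to send. The function name and the "sip" and "webrtc" labels are hypothetical; only the rates themselves come from the list above.

```python
def track_samplerate(call_type: str, codec: str = "") -> int:
    """Return the Track sample rate in Hz that the ASR samplerate should match."""
    if call_type == "sip":
        return 16000  # SIP calls default to 16000 Hz
    if call_type == "webrtc":
        # WebRTC depends on the negotiated codec; anything else falls back to 8000 Hz.
        rates = {"g722": 16000, "opus": 48000}
        return rates.get(codec.lower(), 8000)
    raise ValueError(f"unknown call type: {call_type!r}")


# Hypothetical usage: derive the ASR samplerate for a WebRTC call using Opus.
asr_option = {"provider": "aliyun", "samplerate": track_samplerate("webrtc", "opus")}
```

Note that a 48000 Hz Track also has to be a rate the chosen model accepts, since most models use fixed sample rates.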

endpoint

Custom service endpoint URL.

  • Alibaba Cloud: Default wss://dashscope.aliyuncs.com/api-ws/v1/inference.
  • Tencent Cloud: Default wss://asr.cloud.tencent.com/asr/v2/.
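
For reference, the documented defaults for modelType, samplerate, and endpoint can be collected in one table. The sketch below mirrors them on the client side purely for illustration; the helper name is hypothetical, and no defaults are listed on this page for the other providers, so they are left untouched.

```python
# Defaults as documented on this page for the two providers that list them.
DEFAULTS = {
    "tencent": {
        "modelType": "16k_zh_en",
        "samplerate": 16000,
        "endpoint": "wss://asr.cloud.tencent.com/asr/v2/",
    },
    "aliyun": {
        "modelType": "paraformer-realtime-v2",
        "samplerate": 16000,
        "endpoint": "wss://dashscope.aliyuncs.com/api-ws/v1/inference",
    },
}


def with_defaults(option: dict) -> dict:
    """Return a copy of `option` with documented defaults filled in.

    Explicitly set fields always win over the defaults.
    """
    merged = dict(DEFAULTS.get(option.get("provider"), {}))
    merged.update(option)
    return merged


effective = with_defaults({"provider": "tencent", "samplerate": 8000})
```

Seeing the effective values side by side makes it easier to spot a samplerate that no longer matches the model, as in the example above.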

extra

Provider-specific parameters. Inside extra, use the field names exactly as they appear in the provider's documentation; these parameters are passed to the provider unchanged.

startWhenAnswer

Whether to start the ASR request only after the call is connected. Defaults to false. When enabled, ringback tones are ignored.

Configure API Key

The API key can be supplied through environment variables when starting RustPBX, or set in TranscriptionOption.

Configure in environment variables:

  • Tencent Cloud:

    • TENCENT_APPID: Tencent Cloud appId
    • TENCENT_SECRET_ID: Tencent Cloud secretId
    • TENCENT_SECRET_KEY: Tencent Cloud secretKey
  • Alibaba Cloud:

    • DASHSCOPE_API_KEY: Alibaba Cloud Model Studio Bailian API Key
  • Deepgram:

    • DEEPGRAM_API_KEY: Deepgram API Key

Configure in TranscriptionOption:

Tencent Cloud:

  • appId: Tencent Cloud appId
  • secretId: Tencent Cloud secretId
  • secretKey: Tencent Cloud secretKey

Other providers:

  • secretKey: Provider API Key
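
The two configuration routes use parallel names, so one can be derived from the other. The following sketch reads the environment variables listed above and produces the matching TranscriptionOption fields. The helper name is hypothetical, and the provider identifier "deepgram" is an assumption, since the provider parameter list on this page does not spell it out.

```python
import os

# Environment variable -> TranscriptionOption field, per provider.
ENV_TO_FIELD = {
    "tencent": {
        "TENCENT_APPID": "appId",
        "TENCENT_SECRET_ID": "secretId",
        "TENCENT_SECRET_KEY": "secretKey",
    },
    "aliyun": {"DASHSCOPE_API_KEY": "secretKey"},
    "deepgram": {"DEEPGRAM_API_KEY": "secretKey"},  # provider id is assumed
}


def credentials_from_env(provider: str, env=None) -> dict:
    """Build the credential fields of a TranscriptionOption from the environment."""
    env = os.environ if env is None else env
    mapping = ENV_TO_FIELD.get(provider)
    if mapping is None:
        raise ValueError(f"no documented environment variables for {provider!r}")
    missing = [name for name in mapping if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {field: env[name] for name, field in mapping.items()}
```

Passing a plain dict as env, instead of relying on os.environ, keeps the helper easy to test.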

ASR Events

RustPBX pushes two types of ASR events:

  • AsrDelta: Intermediate recognition result (may change)
  • AsrFinal: Final recognition result (stable)

Both events have the same fields:

  • trackId: Call track ID (a transfer scenario involves two Tracks)
  • index: Recognition result sequence number (only valid with Tencent Cloud ASR)
  • text: Recognized text result
  • timestamp: Event timestamp
  • startTime: Recognition start time
  • endTime: Recognition end time
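
Because AsrDelta results may change while AsrFinal results are stable, a client usually keeps the latest delta as a provisional tail and commits text only on a final. A minimal sketch follows, assuming events arrive as JSON objects with an "event" key whose value is "asrDelta" or "asrFinal". That key and its spelling are assumptions; the trackId and text fields are the ones listed above.

```python
import json
from collections import defaultdict


class TranscriptBuffer:
    """Accumulate ASR text per track, treating deltas as provisional."""

    def __init__(self):
        self.final = defaultdict(list)  # trackId -> committed final texts
        self.partial = {}               # trackId -> latest unconfirmed delta

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        kind = msg.get("event")  # assumed discriminator key
        track = msg.get("trackId")
        if kind == "asrDelta":
            # A newer delta replaces the previous one for this track.
            self.partial[track] = msg.get("text", "")
        elif kind == "asrFinal":
            # A final result commits the text and clears the pending delta.
            self.final[track].append(msg.get("text", ""))
            self.partial.pop(track, None)

    def text(self, track_id: str) -> str:
        parts = list(self.final.get(track_id, []))
        if track_id in self.partial:
            parts.append(self.partial[track_id])
        return " ".join(parts)


buf = TranscriptBuffer()
buf.handle('{"event": "asrDelta", "trackId": "t1", "text": "Hel"}')
buf.handle('{"event": "asrFinal", "trackId": "t1", "text": "Hello"}')
```

Keying the buffer by trackId keeps the two Tracks of a transfer scenario separate.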