Real-Time Latency
Transcription: Real-Time
Deployments: All
When transcribing in real-time, you can control the maximum time to wait for the Final transcript using the max_delay and max_delay_mode transcription config options. You can also use enable_partials to receive Partial transcripts in just a few hundred milliseconds.
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "max_delay": 2.0,
    "max_delay_mode": "fixed",
    "enable_partials": true
  }
}
The max_delay parameter controls the maximum latency of Finals in the real-time transcription engine. Finals latency is the delay in seconds between receiving input audio and returning Final transcription results. The default value of max_delay is 10 seconds, and the allowed range is 0.7 to 20 seconds. Note that max_delay has no impact on how Partials are returned.
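The range check described above can be sketched client-side before a config is sent. This is a minimal illustration, not part of any official SDK: the helper name build_transcription_config and the constant names are hypothetical, and only the bounds (0.7 to 20) and default (10) come from the text.

```python
import json

# Bounds and default as stated in the text above (seconds).
MIN_MAX_DELAY = 0.7
MAX_MAX_DELAY = 20.0
DEFAULT_MAX_DELAY = 10.0


def build_transcription_config(language="en", max_delay=DEFAULT_MAX_DELAY):
    """Hypothetical helper: return a transcription_config dict after a range check."""
    if not MIN_MAX_DELAY <= max_delay <= MAX_MAX_DELAY:
        raise ValueError(
            f"max_delay must be between {MIN_MAX_DELAY} and {MAX_MAX_DELAY} "
            f"seconds, got {max_delay}"
        )
    return {"language": language, "max_delay": max_delay}


if __name__ == "__main__":
    # Print the fragment that would sit under "transcription_config".
    print(json.dumps(build_transcription_config(max_delay=2.0), indent=2))
```

Failing early on the client keeps an out-of-range value from reaching the service at all. How the service itself responds to such a value is not covered here.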
Max Delay Mode
Using a fixed value of max_delay can increase the potential for inaccuracies in the transcript, especially around entities such as numerals, currencies, and dates.
Setting max_delay_mode to flexible lets the latency exceed max_delay, but only when a potential entity has been detected. Entities are common concepts such as numbers, currencies and dates, and are covered in more detail in the entities documentation.
There are two options for max_delay_mode: fixed and flexible. The default is flexible.
- flexible improves accuracy in entity recognition by allowing the latency to exceed the max_delay threshold when a potential entity is detected.
- fixed ensures that processing of Final transcripts is constrained by the max_delay threshold, even if this results in less accurate transcription of entities.
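As a rough sketch of choosing between the two modes, the snippet below sets max_delay_mode on an existing config dict and rejects any other value. The helper name with_max_delay_mode is hypothetical. The two accepted values and the flexible default are taken from the text above.

```python
# The two modes named in the text above; "flexible" is the stated default.
VALID_MAX_DELAY_MODES = ("fixed", "flexible")


def with_max_delay_mode(transcription_config, mode="flexible"):
    """Hypothetical helper: return a copy of the config with max_delay_mode set."""
    if mode not in VALID_MAX_DELAY_MODES:
        raise ValueError(
            f"max_delay_mode must be one of {VALID_MAX_DELAY_MODES}, got {mode!r}"
        )
    updated = dict(transcription_config)  # leave the caller's dict untouched
    updated["max_delay_mode"] = mode
    return updated


# A strict latency budget: Finals are held to max_delay, even around entities.
strict = with_max_delay_mode({"language": "en", "max_delay": 2.0}, "fixed")

# Entity accuracy first: latency may exceed max_delay when an entity is detected.
relaxed = with_max_delay_mode({"language": "en", "max_delay": 2.0})
```

A use case with a hard latency ceiling (live captions, for instance) might prefer fixed, while one that depends on correctly formatted numbers or dates might keep the flexible default.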
Partial Transcripts
Partial transcripts are enabled using the enable_partials config option. Partials allow users to receive transcription output before higher-accuracy Finals are returned. Typically, Partials are returned in 500-800 milliseconds.
When Partial transcripts are enabled, Final transcripts are still returned. Partials are updated as more audio is received and further context is understood, so their accuracy improves until a Final transcript is generated for that section of audio. Once a Final is received, the Partials are reset to empty.
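The update-and-reset behaviour described above could be mirrored on the client roughly as follows. This is a sketch under assumptions: the class TranscriptView and its method names are hypothetical, and it takes already-extracted text strings rather than parsing any real message format.

```python
class TranscriptView:
    """Hypothetical client-side view: committed Finals plus the latest Partial."""

    def __init__(self):
        self.finals = []   # Final text segments, in arrival order
        self.partial = ""  # most recent Partial text, not yet confirmed

    def on_partial(self, text):
        # A newer Partial supersedes the previous one rather than appending to it.
        self.partial = text

    def on_final(self, text):
        # A Final commits that section of audio and resets the Partial to empty.
        self.finals.append(text)
        self.partial = ""

    def render(self):
        # Show confirmed text followed by any still-provisional Partial.
        parts = self.finals + [self.partial]
        return " ".join(p.strip() for p in parts if p.strip())


view = TranscriptView()
view.on_partial("hello")
view.on_partial("hello wor")
provisional = view.render()   # "hello wor"
view.on_final("Hello world.")
confirmed = view.render()     # "Hello world."
```

Keeping Finals and the current Partial in separate fields makes it easy to style provisional text differently in a UI, and to discard it when the Final arrives.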
Note that Partial transcripts have some limitations:
- Accuracy is usually 10-25% lower than the Final transcript. This includes lower accuracy of punctuation and capitalisation of words.
- Numeral formatting is not returned in Partial transcripts.
- Diarization is not returned in Partial transcripts.
- The confidence field for Partial transcripts has no meaning and should not be relied on.
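One way to avoid accidentally relying on the confidence field of Partials is to drop it before the data reaches downstream code. The sketch below assumes a simplified, hypothetical word shape (a flat dict with content and confidence keys). The real result payload may be structured differently.

```python
def strip_partial_confidence(words):
    """Hypothetical helper: return copies of word dicts without 'confidence'."""
    return [
        {key: value for key, value in word.items() if key != "confidence"}
        for word in words
    ]


# Hypothetical Partial words; the confidence values here carry no meaning.
partial_words = [
    {"content": "hello", "confidence": 1.0},
    {"content": "wor", "confidence": 1.0},
]
safe_words = strip_partial_confidence(partial_words)
```

The original list is left unmodified, so the raw payload can still be logged for debugging while application logic only sees the cleaned copy.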