Transcribe in Real-Time
The quickest way to try transcribing for free is by creating a Speechmatics account and using our Real-Time Demo in your browser.
This page will show you how to use the Speechmatics Real-Time SaaS WebSocket API to transcribe your voice in real-time by speaking into your microphone.
You can also learn about On-Prem deployments by following our guides.
Set Up
- Create an account on the Speechmatics On-Demand Portal.
- Navigate to the Manage > API Keys page in the Speechmatics On-Demand Portal.
- Enter a name for your API key, then store the key somewhere safe.
Enterprise customers should speak to Support to get their API keys.
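One way to keep the key out of source code is to read it from an environment variable. The following is a minimal sketch, assuming a variable named SPEECHMATICS_API_KEY. That name and the load_api_key helper are chosen here for illustration and are not something the Speechmatics tooling is known to read.

```python
import os


def load_api_key(env_var="SPEECHMATICS_API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    # Allow a mapping to be passed in so the function is easy to test
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

In the Python examples on this page, the hard-coded value could then be replaced with API_KEY = load_api_key().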
Real-Time Transcription Examples
The examples below will help you get started using the official Speechmatics CLI, Python library and JavaScript library. You can also integrate using the programming language of your choice by referring to the Real-Time API Reference.
- CLI
- Python - File
- Python - URL
- NodeJS - File
CLI
The Speechmatics Python library and CLI can be found on GitHub and installed using pip:
pip3 install speechmatics-python
Transcribe a file in real-time using the Speechmatics CLI. Just copy in your API key and file name to get started!
speechmatics config set --auth-token $API_KEY
speechmatics rt transcribe example.wav
Python - File
The Speechmatics Python library and CLI can be found on GitHub and installed using pip:
pip3 install speechmatics-python
Transcribe a file in real-time using the Speechmatics Python library. Just copy in your API key and file name to get started!
import speechmatics
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"

# Create a transcription client
ws = speechmatics.client.WebsocketClient(
    speechmatics.models.ConnectionSettings(
        url=CONNECTION_URL,
        auth_token=API_KEY,
    )
)

# Define an event handler to print the partial transcript
def print_partial_transcript(msg):
    print(f"[partial] {msg['metadata']['transcript']}")

# Define an event handler to print the full transcript
def print_transcript(msg):
    print(f"[   FULL] {msg['metadata']['transcript']}")

# Register the event handler for partial transcript
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
    event_handler=print_partial_transcript,
)

# Register the event handler for full transcript
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranscript,
    event_handler=print_transcript,
)

settings = speechmatics.models.AudioSettings()

# Define transcription parameters
# Full list of parameters described here: https://speechmatics.github.io/speechmatics-python/models
conf = speechmatics.models.TranscriptionConfig(
    language=LANGUAGE,
    enable_partials=True,
    max_delay=5,
)

print("Starting transcription (type Ctrl-C to stop):")
with open(PATH_TO_FILE, 'rb') as fd:
    try:
        ws.run_synchronously(fd, conf, settings)
    except KeyboardInterrupt:
        print("\nTranscription stopped.")
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print('Invalid API key - Check your API_KEY at the top of the code!')
        else:
            raise
Python - URL
The Speechmatics Python library and CLI can be found on GitHub and installed using pip:
pip3 install speechmatics-python
Transcribe an audio stream in real-time using the Speechmatics Python library. Just copy in your API key to get started!
import speechmatics
from httpx import HTTPStatusError
from urllib.request import urlopen

API_KEY = "YOUR_API_KEY"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"

# The raw audio stream will be a few seconds ahead of the radio
AUDIO_STREAM_URL = "https://media-ice.musicradio.com/LBCUKMP3"  # LBC Radio stream

audio_stream = urlopen(AUDIO_STREAM_URL)

# Create a transcription client
ws = speechmatics.client.WebsocketClient(
    speechmatics.models.ConnectionSettings(
        url=CONNECTION_URL,
        auth_token=API_KEY,
    )
)

# Define an event handler to print the partial transcript
def print_partial_transcript(msg):
    print(f"[partial] {msg['metadata']['transcript']}")

# Define an event handler to print the full transcript
def print_transcript(msg):
    print(f"[   FULL] {msg['metadata']['transcript']}")

# Register the event handler for partial transcript
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
    event_handler=print_partial_transcript,
)

# Register the event handler for full transcript
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranscript,
    event_handler=print_transcript,
)

settings = speechmatics.models.AudioSettings()

# Define transcription parameters
# Full list of parameters described here: https://speechmatics.github.io/speechmatics-python/models
conf = speechmatics.models.TranscriptionConfig(
    language=LANGUAGE,
    enable_partials=True,
    max_delay=5,
)

print("Starting transcription (type Ctrl-C to stop):")
try:
    ws.run_synchronously(audio_stream, conf, settings)
except KeyboardInterrupt:
    print("\nTranscription stopped.")
except HTTPStatusError as e:
    if e.response.status_code == 401:
        print('Invalid API key - Check your API_KEY at the top of the code!')
    else:
        raise
NodeJS - File
The Speechmatics JavaScript library can be found on GitHub and installed using npm:
npm install speechmatics
Transcribe a file in real-time using the Speechmatics JavaScript library. Just copy in your API key and file name to get started!
const fs = require('fs');
const { RealtimeSession } = require('speechmatics');

const API_KEY = 'YOUR_API_KEY';
const PATH_TO_FILE = 'example.wav';

const session = new RealtimeSession({ apiKey: API_KEY });

// Log any error reported for the session
session.addListener('Error', (error) => {
  console.log('session error', error);
});

// Write each final transcript segment as it arrives
session.addListener('AddTranscript', (message) => {
  process.stdout.write(message.metadata.transcript);
});

session.addListener('EndOfTranscript', () => {
  process.stdout.write('\n');
});

session
  .start({
    transcription_config: {
      language: 'en',
      operating_point: 'enhanced',
      enable_partials: true,
      max_delay: 2,
    },
    audio_format: { type: 'file' },
  })
  .then(() => {
    // Prepare the file stream
    const fileStream = fs.createReadStream(PATH_TO_FILE);

    // Send each chunk of audio as it is read
    fileStream.on('data', (sample) => {
      session.sendAudio(sample);
    });

    // End the session once the whole file has been sent
    fileStream.on('end', () => {
      session.stop();
    });
  })
  .catch((error) => {
    console.log('error', error.message);
  });
Transcript Outputs
The output format from the Speech API is JSON. Two types of transcript are provided: Final transcripts and Partial transcripts. Which one you consume will depend on your use case and on your latency and accuracy requirements.
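As an illustration of consuming that JSON, the sketch below sorts incoming messages into Final and Partial transcripts. The metadata.transcript field and the AddTranscript and AddPartialTranscript names come from the examples on this page. The top-level "message" field used to tell them apart, the classify_message helper and the sample payloads are assumptions made for this sketch, so check the Real-Time API Reference for the exact schema.

```python
import json


def classify_message(raw):
    """Parse one JSON message and return a (kind, text) pair."""
    msg = json.loads(raw)
    kind = msg.get("message")  # assumed field naming the message type
    if kind == "AddTranscript":
        return "final", msg["metadata"]["transcript"]
    if kind == "AddPartialTranscript":
        return "partial", msg["metadata"]["transcript"]
    # Any other message type is ignored by this sketch
    return "other", ""


# Illustrative payloads, not captured from a real session
samples = [
    '{"message": "AddPartialTranscript", "metadata": {"transcript": "hello wor"}}',
    '{"message": "AddTranscript", "metadata": {"transcript": "Hello world. "}}',
]
for raw in samples:
    print(classify_message(raw))
```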
Final Transcripts
Final transcripts are sentences or phrases that are provided at irregular intervals. Once output, these transcripts are considered final and will not be updated afterwards. The timing of the output is determined automatically by the Speechmatics ASR engine. It is affected by pauses in speech and other parameters, which results in a latency between audio input and output. The default latency can be adjusted using the max_delay property in transcription_config when starting the recognition session. Final transcripts are more accurate than Partial transcripts, and larger values of max_delay increase the accuracy.
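To make the trade-off concrete, here is a small sketch that builds session settings with different max_delay values. The dictionary layout mirrors the NodeJS example on this page, and the values 2 and 5 are the ones used in the examples. The build_session_config helper is hypothetical and not part of any Speechmatics library.

```python
def build_session_config(language="en", max_delay=2, enable_partials=False):
    """Build the settings passed when starting a recognition session.

    Hypothetical helper; the layout follows the NodeJS example on this page.
    """
    if max_delay <= 0:
        raise ValueError("max_delay must be positive")
    return {
        "transcription_config": {
            "language": language,
            "enable_partials": enable_partials,
            "max_delay": max_delay,
        },
        "audio_format": {"type": "file"},
    }


# Smaller max_delay: Finals arrive sooner, at some cost in accuracy
low_latency = build_session_config(max_delay=2)
# Larger max_delay: Finals arrive later but are more accurate
high_accuracy = build_session_config(max_delay=5)

print(low_latency["transcription_config"])
print(high_accuracy["transcription_config"])
```

With the Python library, the same values would go into speechmatics.models.TranscriptionConfig, as in the Python examples on this page.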
Partial Transcripts
A Partial transcript, or Partial, is a transcript that can be updated at a later point in time. By default, only Final transcripts are produced. Partials must be explicitly enabled using the enable_partials property in transcription_config for the session. After a Partial transcript is first output, the Speechmatics ASR engine can use additional audio data and context to update the Partial. Partials are therefore available at very low latency but with lower initial accuracy. Partials typically provide a latency (the time between audio input and initial output) of less than 1 second. Partials can be used in conjunction with Final transcripts to provide low-latency transcripts which are adjusted over time.
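One way to combine the two is to keep the Final text as a stable prefix and show the latest Partial after it. The sketch below assumes that each Partial replaces the previous one and that a Final supersedes the Partial currently on screen. That is a simplification made for illustration, and the LiveTranscript class is hypothetical. The messages are read the same way as in the event handlers above (msg['metadata']['transcript']).

```python
class LiveTranscript:
    """Combine Final and Partial transcripts into one display string."""

    def __init__(self):
        self.final_text = ""    # text that will no longer change
        self.partial_text = ""  # latest provisional text

    def on_partial(self, msg):
        # Assumption: each Partial replaces the previous Partial
        self.partial_text = msg["metadata"]["transcript"]

    def on_final(self, msg):
        # Assumption: a Final supersedes the Partial currently shown
        self.final_text += msg["metadata"]["transcript"]
        self.partial_text = ""

    def render(self):
        return self.final_text + self.partial_text


# Illustrative messages, not captured from a real session
live = LiveTranscript()
live.on_partial({"metadata": {"transcript": "hello wor"}})
print(live.render())
live.on_final({"metadata": {"transcript": "Hello world. "}})
live.on_partial({"metadata": {"transcript": "how are"}})
print(live.render())
```

The on_partial and on_final methods could be registered with ws.add_event_handler for AddPartialTranscript and AddTranscript in place of the print handlers used in the Python examples.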