Streaming transcription: turning speech into text in real time

Streaming transcription converts speech to text continuously as it is spoken. It emits words within moments of their being said, instead of waiting until the recording is finished to produce a transcript.

Streaming vs batch transcription

Batch transcription processes a complete recording after the fact; streaming transcription produces text live, second by second. Only the streaming kind can feed a summary that updates during a meeting.
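The difference is mostly about when text becomes available. The minimal sketch below contrasts the two modes under loud assumptions. Audio "chunks" are stand-in strings rather than real audio frames, and `recognize_chunk` is a hypothetical placeholder for a speech recognizer, not any real library's API. It also skips the interim, revisable hypotheses that many real streaming recognizers emit before finalizing a word.

```python
from typing import Iterable, Iterator


def recognize_chunk(chunk: str) -> str:
    # Hypothetical recognizer: each "chunk" is a string standing in for what a
    # short slice of audio would decode to. A real system would run an ASR model.
    return chunk.strip()


def batch_transcribe(chunks: Iterable[str]) -> str:
    # Batch: collect the entire recording first, then transcribe it in one go.
    # Nothing is returned until the last chunk has arrived.
    recording = list(chunks)
    return " ".join(recognize_chunk(c) for c in recording)


def stream_transcribe(chunks: Iterable[str]) -> Iterator[str]:
    # Streaming: yield text for each chunk as soon as that chunk arrives, so a
    # consumer (such as a live summary) can act before the recording ends.
    for chunk in chunks:
        yield recognize_chunk(chunk)


if __name__ == "__main__":
    audio = ["let's start ", " with the roadmap", " then budgets "]
    for partial in stream_transcribe(audio):
        print("live:", partial)
    print("batch:", batch_transcribe(audio))
```

Because `stream_transcribe` is a generator, it can hand back the first words while later audio is still being captured. `batch_transcribe` has to drain the whole input before it returns anything.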

How it fits together

Streaming transcription usually sits between voice activity detection, which finds the speech, and speaker diarization, which labels who said it. The live text it produces is then condensed into a rolling summary.
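The ordering of stages can be sketched as a chain of Python generators, where each stage consumes the previous one's output as it arrives. This is a toy under stated assumptions. The `Chunk` type and every stage function are hypothetical. Silence detection, recognition, and speaker labelling are faked: each chunk already carries its words and a speaker hint, whereas real diarization infers speakers from voice characteristics. The "summary" only concatenates recent segments instead of calling a summarization model.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Chunk:
    # Hypothetical audio chunk: a real one would hold audio samples.
    words: str          # what a recognizer would decode ("" means silence)
    speaker_hint: str   # stand-in for the voice features diarization would use


def voice_activity_detection(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    # Stage 1 (toy VAD): pass along only chunks that contain speech.
    for chunk in chunks:
        if chunk.words.strip():
            yield chunk


def streaming_transcription(chunks: Iterable[Chunk]) -> Iterator[tuple[Chunk, str]]:
    # Stage 2: emit text for each speech chunk as soon as it arrives.
    for chunk in chunks:
        yield chunk, chunk.words.strip()


def speaker_diarization(items: Iterable[tuple[Chunk, str]]) -> Iterator[tuple[str, str]]:
    # Stage 3 (toy diarization): attach a speaker label to each piece of text.
    for chunk, text in items:
        yield chunk.speaker_hint, text


def rolling_summary(segments: Iterable[tuple[str, str]], window: int = 3) -> Iterator[str]:
    # Stage 4: after every new segment, re-emit a "summary" of the latest
    # `window` segments. A real system would condense these with a model.
    recent: deque[tuple[str, str]] = deque(maxlen=window)
    for speaker, text in segments:
        recent.append((speaker, text))
        yield " | ".join(f"{s}: {t}" for s, t in recent)


if __name__ == "__main__":
    meeting = [
        Chunk("let's review the roadmap", "A"),
        Chunk("", "A"),  # silence, dropped by the VAD stage
        Chunk("Q3 looks tight", "B"),
        Chunk("agreed, let's cut scope", "A"),
    ]
    pipeline = rolling_summary(
        speaker_diarization(streaming_transcription(voice_activity_detection(meeting)))
    )
    for summary in pipeline:
        print(summary)
```

Chaining generators keeps every stage incremental. The summary updates after each spoken segment rather than once at the end, which is the property that makes a live summary possible.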

Why it matters

Without streaming transcription there is no real-time summary, because the words have to exist as text before they can be condensed. It’s the engine underneath real-time meeting summarization and the reason Canary can show you what was just said during a live Zoom call.
