Speaker diarization: figuring out who said what

Speaker diarization is the process of partitioning a recording by speaker — determining who spoke and when — so a transcript reads as a labeled back-and-forth rather than one undifferentiated block of text.

How it works

Diarization analyzes voice characteristics in the audio to group segments by who is speaking, then assigns labels like Speaker 1 and Speaker 2 (or real names, if known). It usually runs alongside streaming transcription and builds on voice activity detection, which first finds where speech occurs at all.
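
The grouping step can be illustrated with a minimal sketch. It assumes voice activity detection has already produced speech segments and that each segment has a numeric voice embedding; the three-dimensional vectors, the diarize function, and the 0.8 similarity threshold below are hypothetical, chosen only for illustration. Production systems typically use learned embeddings and more robust clustering than this greedy pass.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def diarize(segments, threshold=0.8):
    """Greedily assign each (start, end, embedding) segment to a speaker.

    A segment joins the most similar existing speaker if the similarity
    reaches `threshold` (an illustrative value); otherwise it starts a
    new speaker. Returns a list of (start, end, label) tuples.
    """
    centroids = []  # running sum of embeddings, one entry per speaker
    labeled = []
    for start, end, emb in segments:
        best, best_sim = None, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            # No existing speaker is close enough: open a new one.
            centroids.append(list(emb))
            best = len(centroids) - 1
        else:
            # Fold this segment into the matched speaker's running sum.
            centroids[best] = [x + y for x, y in zip(centroids[best], emb)]
        labeled.append((start, end, f"Speaker {best + 1}"))
    return labeled


# Toy input: made-up embeddings standing in for real voice features.
segments = [
    (0.0, 2.1, [0.9, 0.1, 0.0]),
    (2.4, 5.0, [0.1, 0.9, 0.1]),
    (5.2, 6.8, [0.8, 0.2, 0.1]),
    (7.0, 9.5, [0.0, 1.0, 0.2]),
]

for start, end, label in diarize(segments):
    print(f"{start:>4.1f}s to {end:>4.1f}s  {label}")
```

On this toy input the first and third segments share one label and the second and fourth share another, which is the alternating back-and-forth a diarized transcript is meant to show.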

Why it matters

A summary or transcript is far more useful when it attributes statements to people — “Priya committed to the deadline” is more actionable than “someone said the deadline works.” Diarization is what makes attribution possible, and it improves the quality of any summary built on top, including the live views in Canary’s resolution matrix.
