How do bot-free meeting notes capture audio without a bot?

They tap the operating system's audio output locally. macOS provides Core Audio taps (macOS 14.4+); Windows offers WASAPI loopback capture of the render endpoint; Linux exposes a monitor source through PulseAudio or PipeWire. The app reads the audio you're already hearing, so no bot joins the call and no virtual audio device is installed.

Is bot-free meeting capture more private?

It's less intrusive: nothing shows up in the participant list, and audio can be captured entirely on-device. But being invisible in the call isn't a license for secrecy: you should still tell participants you're using a notetaker. Privacy is about consent and where the data goes, not just whether a bot is visible.

The complete guide to bot-free meeting notes

Bot-free meeting notes work by capturing your computer’s own audio output locally, instead of sending a bot to join the call. That one design choice changes everything downstream: nothing appears in the participant list, no plugin or browser extension is needed, no virtual audio device gets installed, and the audio can be processed on your machine. Here’s how it actually works under the hood.

What “bot-free” really means

Most meeting assistants — Otter, Fireflies, Fathom, tl;dv — work by dialing a bot into the call as a participant. The bot sits in the meeting like any attendee and records the mixed audio stream. It’s reliable, but it has obvious downsides: it shows up in the participant list, it can feel intrusive on client and sales calls, and it depends on the meeting platform letting bots in.

A bot-free tool skips all of that. It reads the audio your computer is already playing — the exact mix of everyone’s voices coming out of your speakers or into your headphones — directly from the operating system. No bot, no plugin, no virtual audio cable.

How system audio capture works, by platform

Reading “the sound the computer is playing” is a solved problem at the OS level, but each platform exposes it differently.

macOS — CoreAudio process taps

macOS Sonoma 14.4 and later exposes Core Audio taps: a supported API for tapping the audio of a single process or the whole system mixdown. An app creates a tap on the audio output and receives the rendered stream, the same audio routed to your output device, without installing a kernel extension (the old Soundflower approach) or a virtual driver such as BlackHole. The user grants audio-capture permission, and the tap then reads the live output buffer.

Windows — WASAPI loopback

Windows exposes the Windows Audio Session API (WASAPI) with a loopback capture mode. Instead of opening a microphone (a capture endpoint), the app opens the render endpoint in loopback mode and receives the audio being sent to the speakers. It’s the standard, documented way to record system output on Windows — no virtual audio device, no kernel driver.
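As a rough illustration of the loopback idea, here is a minimal Python sketch. It assumes the third-party soundcard package is installed; that library is our choice for the example, not something this guide's tools are known to use. The helper names frames_for and record_loopback are hypothetical. The key step is asking for the default speaker as if it were a microphone, with loopback enabled.

```python
def frames_for(seconds: float, samplerate: int) -> int:
    """Number of audio frames covering `seconds` at `samplerate` Hz."""
    return int(round(seconds * samplerate))


def record_loopback(seconds: float = 5.0, samplerate: int = 48000):
    """Record what the default output device is playing.

    Assumption: the third-party `soundcard` package is installed
    (pip install soundcard). It is imported lazily so this module
    still loads on machines without it.
    """
    import soundcard as sc

    speaker = sc.default_speaker()
    # Open the render device as a capture source in loopback mode.
    loopback = sc.get_microphone(id=str(speaker.name), include_loopback=True)
    with loopback.recorder(samplerate=samplerate) as recorder:
        # Returns an array of shape (frames, channels).
        return recorder.record(numframes=frames_for(seconds, samplerate))
```

Calling record_loopback(5.0) while a meeting is playing should return about five seconds of the mixed output. A production app would more likely call WASAPI directly and stream buffers continuously rather than record a fixed chunk.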

Linux — PulseAudio / PipeWire monitor sources

On Linux, the sound server makes this easy: every output device (a “sink”) automatically has a corresponding monitor source. PulseAudio exposes a .monitor source for each sink; PipeWire offers the equivalent. Recording that monitor source captures exactly what’s being played, again with no extra virtual device.
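The naming rule is simple enough to sketch in Python. The example below assumes the tab-separated output of `pactl list short sinks` (index, sink name, driver, sample spec, state) and builds a `parec` command line for the matching monitor source. The function names and the sample sink name are illustrative, not taken from any particular tool.

```python
def monitor_sources(pactl_short_sinks: str) -> list[str]:
    """Map each sink in `pactl list short sinks` output to its monitor source.

    Each line is tab-separated: index, sink name, driver, sample spec, state.
    The monitor source is the sink name with ".monitor" appended.
    """
    names = []
    for line in pactl_short_sinks.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1]:
            names.append(fields[1] + ".monitor")
    return names


def parec_command(source: str, rate: int = 16000, channels: int = 1) -> list[str]:
    """Build a `parec` invocation that records raw PCM from `source`."""
    return [
        "parec",
        f"--device={source}",
        "--format=s16le",
        f"--rate={rate}",
        f"--channels={channels}",
    ]


# Illustrative input; real sink names depend on the hardware.
SAMPLE = "0\talsa_output.pci-0000_00_1f.3.analog-stereo\tPipeWire\ts16le 2ch 48000Hz\tRUNNING\n"
print(parec_command(monitor_sources(SAMPLE)[0]))
```

The resulting list can be passed to subprocess.Popen to stream raw samples from stdout. The 16 kHz mono default is an assumption chosen because it is a common input format for speech recognition, not a requirement of the monitor source.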

In all three cases the principle is identical: read the output the user is already hearing, locally, through a supported OS API.

Why the architecture matters

The capture method isn’t just a technical footnote — it drives the user-facing benefits:

Nothing in the participant list. Because no bot joins, hosts and guests don’t see an extra attendee. This is the difference that matters on client calls.
No plugin or virtual device. Supported OS APIs mean no fragile audio cables or browser extensions to install and maintain.
Platform-agnostic. It captures any audio your computer plays — Zoom, Meet, Teams, a browser tab, a phone call routed through your computer — because it taps the OS, not a specific app’s API.
On-device potential. Because the audio is read locally, capture doesn’t require shipping a bot through a third-party cloud to get into the room.

Bot-free is less visible in the call, but it is not a tool for secret recording and shouldn’t be framed that way. The right posture is consent: tell participants you use a notetaker, the same way you would with any tool. Bot-free’s real privacy advantage is architectural (capturing on-device and keeping audio off third-party servers where the design allows), not “nobody can tell.” Treat transparency as a feature, not something to route around.

Where Canary fits

Canary uses exactly this bot-free, local system-audio approach across macOS, Windows, and Linux — and then does something most bot-free tools don’t: it summarizes in real time. Instead of producing a document after the call, it shows a live, multi-resolution summary (now / last 2 min / last 5 min / full call) while the meeting is still happening. For the broader picture of why timing matters, see real-time vs post-meeting AI notes, and for the head-to-head with bot-based tools, Canary vs Fireflies.
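To make "multi-resolution" concrete, here is a hypothetical Python sketch of how timestamped transcript segments could be grouped into those four windows before summarizing. It is not Canary's implementation. The Segment type, the window_texts function, and the 30-second length of the "now" window are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    end_s: float  # when the segment finished, in seconds since call start
    text: str


# Window lengths in seconds; None means "the whole call".
# The 30 s "now" window is an assumed value.
WINDOWS = {"now": 30.0, "last 2 min": 120.0, "last 5 min": 300.0, "full call": None}


def window_texts(segments: list[Segment], now_s: float) -> dict[str, str]:
    """Collect the transcript text that falls inside each window."""
    result = {}
    for label, span in WINDOWS.items():
        chosen = [
            seg.text
            for seg in segments
            if span is None or seg.end_s > now_s - span
        ]
        result[label] = " ".join(chosen)
    return result


segments = [
    Segment(10, "intro"),
    Segment(200, "budget"),
    Segment(390, "timeline"),
    Segment(410, "actions"),
]
print(window_texts(segments, now_s=420))
```

Each window's text would then be handed to a summarizer. Recomputing the short windows often and the full-call window less often is one way to keep such a view live without reprocessing the entire transcript on every update.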