Transcribe audio in 4 steps:
- Install the free OpenVINO AI plugin for Audacity
- Import your recording and select the track or region
- Go to Analyze → OpenVINO Whisper Transcription
- Pick a Whisper model and language, then click Apply
What Is AI Transcription?
AI transcription turns spoken audio into time-stamped written text. Audacity's OpenVINO Whisper Transcription effect runs OpenAI's Whisper speech-recognition model entirely on your own computer — no uploads, no minute limits, no subscription. Feed it an interview, podcast, lecture, or voice memo and it writes every word into a label track you can edit, search, and export as SRT subtitles, VTT, or plain text. Because it runs offline, your audio never leaves your machine, making it a privacy-friendly alternative to cloud transcription services.
How to Transcribe Audio in Audacity
Step 1: Install the OpenVINO AI Plugin
Download and install the free OpenVINO AI plugin for Audacity from the official Audacity plugins page. The plugin adds a set of AI-powered effects and analysis tools, including Whisper Transcription.
Step 2: Import and Select Your Recording
Open your audio file with File → Open. Click and drag to select the region you want to transcribe, or press Ctrl+A / Cmd+A to select the whole track.
Step 3: Open Analyze → OpenVINO Whisper Transcription
Go to Analyze → OpenVINO Whisper Transcription. The dialog opens with model selection, language, and inference device options.
Step 4: Choose Model, Language and Apply
Select a Whisper model size (base, small, medium, or large), set the source language or leave it on auto-detect, then click Apply. The transcription is written to a new label track below your audio, with each phrase as a time-stamped label.
Transcription Settings Explained
Whisper Model (base / small / medium / large)
Model size trades speed against accuracy: larger models are slower but more accurate. Base is fastest and works well for clean English. Small and medium handle accents, noisy audio, and most non-English languages. Large (v1/v2/v3) is the most accurate and the slowest. A special small.en-tdrz model adds experimental speaker diarization for English audio.
Mode (Transcribe vs Translate)
Transcribe keeps the spoken language in the output. Translate converts any of Whisper's 99 supported source languages into English text automatically — useful for subtitling foreign-language clips without a separate translation step.
Source Language
Defaults to auto-detect, which samples the first seconds of audio to guess the language. Pick a language manually for short clips, code-switching, or when auto-detect lands on the wrong one. Whisper supports 99 languages.
Inference Device (CPU / GPU / NPU)
Picks which chip runs the model. CPU works everywhere. GPU is faster on discrete or integrated graphics. NPU uses the neural accelerator on modern Intel Core Ultra laptops. Click Device Details to see what Audacity detected on your system.
Advanced Options (Initial Prompt, Max Segment Length, Beam Size)
Use Initial Prompt to steer spelling of names, jargon, or acronyms. Max Segment Length controls how long each label can be — shorter values help word-level editing and subtitle formatting. Beam Size improves accuracy at the cost of processing time.
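To see why shorter segments suit subtitle formatting, here is a minimal Python sketch that breaks one long label into shorter ones after the fact. It is not how the plugin implements Max Segment Length: the function name split_label is hypothetical, the limit is assumed to be a character count, and each piece's timing is guessed in proportion to its text length.

```python
def split_label(start: float, end: float, text: str, max_chars: int):
    """Split one label into pieces of at most max_chars characters.

    Breaks only at word boundaries, so a single word longer than
    max_chars is kept whole. Timing is an assumption: each piece gets
    a share of the duration proportional to its character count.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # close the current piece
            current = word           # start the next piece with this word
        else:
            current = candidate
    if current:
        chunks.append(current)

    total_chars = sum(len(chunk) for chunk in chunks)
    duration = end - start
    pieces = []
    cursor = start
    for chunk in chunks:
        span = duration * len(chunk) / total_chars
        pieces.append((cursor, cursor + span, chunk))
        cursor += span
    return pieces


if __name__ == "__main__":
    label = "Today we talk about running speech recognition fully offline"
    for piece_start, piece_end, piece_text in split_label(12.0, 18.0, label, 24):
        print(f"{piece_start:6.2f} {piece_end:6.2f} {piece_text}")
```

Setting Max Segment Length in the dialog is the better route, since Whisper supplies real timestamps for each segment; a post-hoc split like this one only approximates them.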
Whisper Model Reference
| Model | Best For | Accuracy | Typical Processing Time (CPU) |
|---|---|---|---|
| base | Clean English, quick drafts | Good | ~0.3× audio length |
| small | Accents, noisier audio, many languages | Better | ~0.7× audio length |
| small.en-tdrz | English + speaker diarization | Better + speaker change | ~0.7× audio length |
| medium | Reliable multi-language transcription | High | ~1.5× audio length |
| large-v3 | Final subtitles, tough recordings | Highest | ~3–5× audio length |
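As a rough planning aid, the figures in the table can be turned into a time estimate. The Python sketch below assumes the last column means processing time as a multiple of the audio length; the function name estimate_minutes is hypothetical, and real timings will vary with your hardware.

```python
# Factors copied from the table above: CPU processing time / audio length.
# large-v3 is given as a range, so every entry is stored as (low, high).
SPEED_FACTORS = {
    "base": (0.3, 0.3),
    "small": (0.7, 0.7),
    "small.en-tdrz": (0.7, 0.7),
    "medium": (1.5, 1.5),
    "large-v3": (3.0, 5.0),
}


def estimate_minutes(model: str, audio_minutes: float) -> tuple:
    """Return a (low, high) estimate of CPU processing time in minutes."""
    low, high = SPEED_FACTORS[model]
    return (audio_minutes * low, audio_minutes * high)


if __name__ == "__main__":
    # Example: a one-hour recording.
    for model in SPEED_FACTORS:
        low, high = estimate_minutes(model, 60)
        if low == high:
            print(f"{model}: about {low:.0f} min")
        else:
            print(f"{model}: about {low:.0f} to {high:.0f} min")
```

Under these assumptions, a one-hour interview takes well under an hour on base but several hours on large-v3, which is why a quick draft pass with a small model is worth doing first.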
Common Use Cases
- Podcast show notes — Generate an accurate episode transcript for blog posts and SEO
- Subtitles & captions — Export the label track as SRT or VTT for YouTube and video editors
- Interview notes — Search spoken quotes by text instead of scrubbing the waveform
- Lectures & meetings — Turn recordings into searchable, shareable notes
- Accessibility — Provide text alternatives for deaf and hard-of-hearing listeners
- Translation & localization — Translate foreign-language clips straight into English text
- Journalism & research — Keep sensitive recordings offline while still producing a text record
Tips for Best Results
- Start with base or small to preview speed, then re-run the final pass with a larger model
- Clean up the recording first with AI Noise Suppression — Whisper transcribes denoised speech far more accurately
- Select only the region you want to transcribe; Whisper processes whatever is selected
- Set the source language manually for short clips; auto-detect needs a few seconds of audio
- Use Initial Prompt to lock in proper spelling of names, brands, or technical jargon
- On Intel Core Ultra laptops, try the NPU device for the larger models; it can be faster than the CPU and keeps the machine cooler
- Export the label track via File → Export Other → Export Labels (SRT, VTT, or TXT)
Frequently Asked Questions
Is AI transcription in Audacity really free?
Yes. Audacity is free and open source, and the OpenVINO AI plugin that powers Whisper Transcription is also free. There are no minute caps, no subscriptions, and no watermarks; everything runs locally on your own PC.
Does Audacity audio transcription work offline?
Yes. Audacity runs Whisper entirely on your local machine. Your audio and transcript never leave your computer, which is ideal for interviews, legal recordings, and anything you'd rather not upload to the cloud.
How accurate is Whisper transcription in Audacity?
Accuracy is generally high for clean English on the medium and large models, often comparable to paid cloud tools. Noisy audio, strong accents, or overlapping speakers reduce accuracy; denoise first and use a larger model for best results.
How many languages does Audacity transcription support?
Whisper supports 99 languages for transcription. In Translate mode, audio in any of those languages can be converted directly into English text.
How do I export a transcript from Audacity as SRT or text?
Go to File → Export Other → Export Labels and pick SRT, WebVTT, or plain text. SRT and VTT are ready to drop into YouTube, Premiere, or DaVinci Resolve as subtitles.
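The plain-text export is also easy to process with a script, for example to find a quote by its wording. The Python sketch below assumes the usual Audacity label layout of one label per line, with start time, end time (both in seconds), and text separated by tabs; the names Label, parse_labels, and find_quotes are hypothetical, and the sample text is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Label:
    start: float  # seconds
    end: float    # seconds
    text: str


def parse_labels(raw: str) -> list:
    """Parse tab-separated label text into Label records."""
    labels = []
    for line in raw.splitlines():
        # Skip blank lines and the backslash-prefixed lines that
        # Audacity writes for labels carrying a frequency range.
        if not line.strip() or line.startswith("\\"):
            continue
        parts = line.split("\t", 2)
        if len(parts) < 2:
            continue  # not a label line
        text = parts[2].strip() if len(parts) > 2 else ""
        labels.append(Label(float(parts[0]), float(parts[1]), text))
    return labels


def find_quotes(labels: list, phrase: str) -> list:
    """Return the labels whose text contains phrase, ignoring case."""
    needle = phrase.lower()
    return [label for label in labels if needle in label.text.lower()]


# Made-up sample in the assumed export layout.
SAMPLE = (
    "0.000000\t4.200000\tWelcome back to the show.\n"
    "4.200000\t9.750000\tToday we talk about offline transcription.\n"
)

if __name__ == "__main__":
    for hit in find_quotes(parse_labels(SAMPLE), "offline"):
        print(f"{hit.start:.2f}s to {hit.end:.2f}s: {hit.text}")
```

To use it on a real export, read the file with open(path, encoding="utf-8").read() and pass the result to parse_labels.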
Can Audacity identify different speakers in a recording?
Yes, experimentally. Pick the small.en-tdrz model to enable speaker diarization. Audacity creates two label tracks and alternates labels between them when it detects a speaker change.
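The alternating behaviour can be pictured with a short Python sketch. This is an illustration of the idea only, not the plugin's code: the Segment type, its turn_after flag, and the split_by_speaker function are hypothetical names, and the sample segments are invented.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float      # seconds
    end: float        # seconds
    text: str
    turn_after: bool  # hypothetical flag: a speaker change follows this segment


def split_by_speaker(segments):
    """Distribute segments over two tracks, switching track at each speaker change."""
    tracks = ([], [])
    current = 0
    for segment in segments:
        tracks[current].append(segment)
        if segment.turn_after:
            current = 1 - current  # flip between track 0 and track 1
    return tracks


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "How did the project start?", True),
        Segment(2.5, 6.0, "It began as a weekend experiment.", False),
        Segment(6.0, 9.0, "Then it grew quickly.", True),
        Segment(9.0, 11.0, "And what came next?", False),
    ]
    for index, track in enumerate(split_by_speaker(demo), start=1):
        print(f"Track {index}: {[segment.text for segment in track]}")
```

Because the two tracks only alternate, a conversation with three or more speakers will still be spread over two tracks, so treat the split as a starting point for manual cleanup.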
Download Audacity Free
Ready to transcribe your audio offline? Download Audacity for free on Windows, macOS, or Linux.