Free AI Audio Transcription – Speech to Text

Transcribe audio in 4 steps:

Install the free OpenVINO AI plugin for Audacity
Import your recording and select the track or region
Go to Analyze → OpenVINO Whisper Transcription
Pick a Whisper model and language, then click Apply

What Is AI Transcription?

AI transcription turns spoken audio into time-stamped written text. Audacity's OpenVINO Whisper Transcription effect runs OpenAI's Whisper speech-recognition model entirely on your own computer — no uploads, no minute limits, no subscription. Feed it an interview, podcast, lecture, or voice memo and it writes every word into a label track you can edit, search, and export as SRT subtitles, VTT, or plain text. Because it runs offline, your audio never leaves your machine, making it a privacy-friendly alternative to cloud transcription services.

How to Transcribe Audio in Audacity

Step 1: Install the OpenVINO AI Plugin

Download and install the free OpenVINO AI plugin for Audacity from the official Audacity plugins page. The plugin adds a set of AI-powered effects and analysis tools, including Whisper Transcription.

Step 2: Import and Select Your Recording

Open your audio file with File → Open. Click and drag to select the region you want to transcribe, or press Ctrl+A (⌘A on macOS) to select the whole track.

Step 3: Open Analyze → OpenVINO Whisper Transcription

Go to Analyze → OpenVINO Whisper Transcription. The dialog opens with model selection, language, and inference device options.

Step 4: Choose Model, Language and Apply

Select a Whisper model size (base, small, medium, or large), set the source language or leave it on auto-detect, then click Apply. The transcription is written to a new label track below your audio, with each phrase as a time-stamped label.

Transcription Settings Explained

Whisper Model (base / small / medium / large)

Model size trades speed for accuracy: larger models are slower but more accurate. Base is fastest and works well for clean English. Small and medium handle accents, noisy audio, and most non-English languages. Large (v1/v2/v3) is the most accurate. A special small.en-tdrz model adds experimental speaker diarization.

Mode (Transcribe vs Translate)

Transcribe keeps the spoken language in the output. Translate converts any of Whisper's 99 supported source languages into English text automatically — useful for subtitling foreign-language clips without a separate translation step.

Source Language

Defaults to auto-detect, which samples the first seconds of audio to guess the language. Pick a language manually for short clips, code-switching, or when auto-detect lands on the wrong one. Whisper supports 99 languages.

Inference Device (CPU / GPU / NPU)

Selects which processor runs the model. CPU works everywhere. GPU is usually faster and works with both discrete and integrated graphics. NPU uses the neural accelerator on modern Intel Core Ultra laptops. Click Device Details to see what Audacity detected on your system.

Advanced Options (Initial Prompt, Max Segment Length, Beam Size)

Use Initial Prompt to steer spelling of names, jargon, or acronyms. Max Segment Length controls how long each label can be — shorter values help word-level editing and subtitle formatting. Beam Size improves accuracy at the cost of processing time.

Whisper Model Reference

Model	Best For	Accuracy	Typical Processing Time (CPU)
base	Clean English, quick drafts	Good	~0.3× audio length
small	Accents, noisier audio, many languages	Better	~0.7× audio length
small.en-tdrz	English + speaker diarization	Better + speaker change	~0.7× audio length
medium	Reliable multi-language transcription	High	~1.5× audio length
large-v3	Final subtitles, tough recordings	Highest	~3–5× audio length
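
As a rough planning aid, the multipliers in the table above can be turned into a time estimate. The sketch below assumes the last column means processing time as a multiple of the recording's length (so 0.3× means a 60-minute file takes about 18 minutes); the function name and dictionary are illustrative only, and real timings will vary with your hardware and inference device.

```python
# Approximate CPU multipliers copied from the table above.
# Stored as (low, high); single values repeat the same number.
# Assumption: a multiplier is processing time divided by audio length.
SPEED_FACTORS = {
    "base": (0.3, 0.3),
    "small": (0.7, 0.7),
    "small.en-tdrz": (0.7, 0.7),
    "medium": (1.5, 1.5),
    "large-v3": (3.0, 5.0),
}


def estimate_minutes(model: str, audio_minutes: float) -> tuple[float, float]:
    """Return a (low, high) estimate of CPU processing time in minutes."""
    low, high = SPEED_FACTORS[model]
    return (audio_minutes * low, audio_minutes * high)


# Example: compare every model for a 45-minute podcast episode.
for name in SPEED_FACTORS:
    low, high = estimate_minutes(name, 45)
    print(f"{name}: about {low:.0f} to {high:.0f} minutes")
```

Treat the result as an order-of-magnitude guide for choosing between a quick draft pass and a final pass, not as a benchmark.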

Common Use Cases

Podcast show notes — Generate an accurate episode transcript for blog posts and SEO
Subtitles & captions — Export the label track as SRT or VTT for YouTube and video editors
Interview notes — Search spoken quotes by text instead of scrubbing the waveform
Lectures & meetings — Turn recordings into searchable, shareable notes
Accessibility — Provide text alternatives for deaf and hard-of-hearing listeners
Translation & localization — Translate foreign-language clips straight into English text
Journalism & research — Keep sensitive recordings offline while still producing a text record
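
Searching a transcript by text can also be done outside Audacity once the label track is exported as plain text. The sketch below assumes the export is a tab-separated file with one label per line (start time in seconds, end time in seconds, label text); the sample transcript and the function names are hypothetical.

```python
from typing import NamedTuple


class Label(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def parse_labels(raw: str) -> list[Label]:
    """Parse tab-separated label text into Label records."""
    labels = []
    for line in raw.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        try:
            start, end = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # skip lines whose first two fields are not times
        labels.append(Label(start, end, parts[2].strip()))
    return labels


def find_quotes(labels: list[Label], phrase: str) -> list[Label]:
    """Return labels whose text contains the phrase, ignoring case."""
    needle = phrase.casefold()
    return [label for label in labels if needle in label.text.casefold()]


# Hypothetical exported transcript, for illustration only.
SAMPLE = (
    "0.000000\t4.200000\tWelcome back to the show.\n"
    "4.200000\t9.800000\tToday we talk about open source audio tools.\n"
)

for hit in find_quotes(parse_labels(SAMPLE), "open source"):
    minutes, seconds = divmod(int(hit.start), 60)
    print(f"{minutes:02d}:{seconds:02d}  {hit.text}")
```

The printed position tells you where to jump in the waveform instead of scrubbing for the quote.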

Tips for Best Results

Start with base or small for a quick preview, then re-run the final pass with a larger model
Clean up the recording first with AI Noise Suppression — Whisper transcribes denoised speech far more accurately
Select only the region you want to transcribe; Whisper processes whatever is selected
Set the source language manually for short clips; auto-detect needs a few seconds of audio
Use Initial Prompt to lock in proper spelling of names, brands, or technical jargon
On Intel Core Ultra laptops, select the NPU device; it typically runs the larger models faster and with less heat than the CPU
Export the label track via File → Export Other → Export Labels (SRT, VTT, or TXT)

Frequently Asked Questions

Is AI transcription in Audacity really free?
Yes. Audacity is free and open source, and the OpenVINO AI plugin that powers Whisper Transcription is also free. There are no minute caps, no subscriptions, and no watermarks — everything runs locally on your own PC.

Does Audacity audio transcription work offline?
Yes. Audacity runs Whisper entirely on your local machine. Your audio and transcript never leave your computer — ideal for interviews, legal recordings, and anything you'd rather not upload to the cloud.

How accurate is Whisper transcription in Audacity?
Very accurate for clean English on the medium and large models — comparable to paid cloud tools. Noisy audio, strong accents, or overlapping speakers reduce accuracy; denoise first and use a larger model for best results.

How many languages does Audacity transcription support?
Whisper supports 99 languages for transcription. In Translate mode, audio in any of those languages can be converted directly into English text.

How do I export a transcript from Audacity as SRT or text?
Go to File → Export Other → Export Labels and pick SRT, WebVTT, or plain text. SRT and VTT are ready to drop into YouTube, Premiere, or DaVinci Resolve as subtitles.
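
If you ever need to post-process a plain-text label export yourself, the SRT layout is simple to produce: a running index, a line with start and end times as HH:MM:SS,mmm, the caption text, then a blank line. The sketch below is not how Audacity implements its export; it assumes a tab-separated label file (start seconds, end seconds, text), and the function names are illustrative.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def labels_to_srt(raw: str) -> str:
    """Convert tab-separated label text into SRT subtitle text."""
    blocks = []
    for line in raw.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        try:
            start, end = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # skip lines whose first two fields are not times
        index = len(blocks) + 1  # SRT cues are numbered from 1
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{parts[2].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


# Hypothetical two-label transcript, for illustration only.
print(labels_to_srt("0.0\t2.5\tHello there\n2.5\t4.0\tGeneral Kenobi\n"))
```

For most projects the built-in export is the simpler route; a script like this is mainly useful when you want to merge, filter, or re-time labels before producing subtitles.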

Can Audacity identify different speakers in a recording?
Yes — experimentally. Pick the small.en-tdrz model to enable speaker diarization. Audacity creates two label tracks and alternates labels when it detects a speaker change.

Download Audacity Free

Ready to transcribe your audio offline? Download Audacity for free on Windows, macOS, or Linux.

Download Audacity 3.7.8
