Sanitizing speech recordings made with portable audio recorders

Processing speech recordings in Audacity
I have just started toying with mobile voice recordings: personal musings, presentations at work or at events, a course I am taking, and generally speech recordings of any kind. The key aspects of this recorded speech are little control over the recording environment, and the use of a variety of recording tools, many of which are likely to be of less than ideal quality for clean speech recording.

For the personal notes, I have control over the environment and the signal-to-noise ratio, and can use Audacity on the PC with a fairly good microphone to record the data.

However, in the other environments, where the use of portable recording devices is required, I am limited in this regard. I cannot pre-arrange decent recording equipment most of the time, and standing next to the presenter/lecturer with my Zen in order to get a better quality voice recording is not an option. :)

Generally low audio volume, a poor signal-to-noise ratio, and a distracting high-pitched 'hiss' in the background are just some of the problems I am experiencing.

Today I began looking for good audio processing tools for the first time. After playing with a few, I am beginning to settle on Audacity.

However, I am at a severe disadvantage in having no idea of the best practices for voice processing in this fashion.

I will attempt to outline what I have 'learned' so far, and where I think more information would be useful for anyone recording speech sound bites for redistribution, private records, etc. (speech/podcasts)

My Process to Date

1) My portable recorder (like most portables, I imagine) has limited mic quality and gain. It generates WAV files, which I import into Audacity.

2) This is a stereo, 16 kHz file, which Audacity imports by default in 32-bit float sample format (shown in the information box to the left of the track in the Audacity window).

3) I then immediately save the import as an Audacity project, copying the source WAV into the project. I then remove the source WAV file, which is now redundant (though I keep a copy of the original on a backup medium until I am happy with the processing of the data).

4) I select the envelope tool (the 'pinchy' button) and 'widen' the envelope in preparation for normalization (to allow a better normalization, i.e. a greater increase in volume).

5) Next I select all the data and apply normalization. The audio level of a presenter a few metres away is soft, so this raises the overall level.
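For anyone curious what the normalization step is actually doing: it finds the loudest sample and scales the whole selection so that peak hits a target level. A minimal Python sketch (the function name, the 0.95 target, and the -1.0..1.0 float convention are my own illustration, not Audacity's internals — Audacity's Normalize also offers extras like DC-offset removal):

```python
def normalize(samples, peak=0.95):
    """Scale samples so the loudest one reaches `peak`.

    Assumes samples are floats in the range -1.0..1.0."""
    loudest = max(abs(s) for s in samples)
    if loudest == 0:
        return list(samples)          # pure silence: nothing to scale
    gain = peak / loudest
    return [s * gain for s in samples]
```

Note that this raises everything by the same gain, which is exactly why the background noise gets louder along with the speech.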

6) Unfortunately, the background noise level was close to the level of the speaker most of the time, so the noise has now also increased in volume and is still prevalent. I select 'quiet' stretches between words/sentences and use them as 'noise training data' with the Get Noise Profile option of the noise removal effect (Effect|Noise Removal...|Get Noise Profile).

7) Armed with this profile, I select a range of data on either side of the training section and apply the noise filter. I repeat this for the entire recording.
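Audacity's noise removal works in the frequency domain, but the 'learn a profile from silence' idea can be illustrated with a much cruder time-domain noise gate: measure the level of a known-silent slice, then damp any frame that is not clearly louder than that floor. All names and threshold values here are illustrative assumptions, not what Audacity does:

```python
def rms(frame):
    """Root-mean-square level of a frame of float samples."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def gate(samples, noise_region, frame_len=256, margin=2.0, attenuation=0.1):
    """Crude noise gate: learn a noise floor from a 'silent' slice,
    then scale down frames whose level is below margin * floor."""
    floor = rms(noise_region)
    out = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        scale = 1.0 if rms(frame) > margin * floor else attenuation
        out.extend(s * scale for s in frame)
    return out
```

A gate like this only mutes the gaps between words; the profile-based spectral approach also reduces noise *underneath* the speech, which is why it works so much better.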

8) I revisit the file/project, searching for gaps in the speech, noises that are not speech, etc., and silence or cut them.

9) I have also applied the 'Click Removal' effect, but I am not sure it has done much.

10) My audio recorder (a Zen Vision:M portable) has a hard drive in it which spins up at various intervals; then there is a 'click' and the spin 'whine' stops. I presume this is just the buffer being written to disk, but hearing the disk in the recording is annoying, especially with quiet speakers. I am manually looking for these points (which, like people coughing, are given away by a quick 'spike' in the waveform representation) and silencing or removing them. Is there a better way to deal with these?

11) There is some 'hiss' and other high-pitched noise in the background; how can I get rid of this?

Areas requiring further attention (?)

1) Some more information specific to speech processing, especially when the recording is -not- made with Audacity, but with average or poor quality portable recorders.

2) I have read articles hinting at unneeded frequencies or frequency bands, mostly bass. I need to look deeper into this. What are they? How does one remove them in Audacity? Is there a danger of losing intelligibility?

3) How does one use 'filters' to remove data outside the range of the human voice? Can a specific speaker be profiled via a short speech segment, and that profile then be used as a mask to remove everything else?
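As a rough illustration of filtering to the voice range, here is a toy cascade of a one-pole high-pass and a one-pole low-pass keeping roughly the classic telephone voice band. The 300 Hz and 3400 Hz cutoffs are assumed example values, and real per-speaker 'masking' would need far more sophistication than this sketch:

```python
import math

def band_pass(samples, rate, low=300.0, high=3400.0):
    """Crude voice-band filter: one-pole high-pass at `low` Hz
    followed by a one-pole low-pass at `high` Hz."""
    if not samples:
        return []
    dt = 1.0 / rate
    # high-pass stage: damps rumble below `low`
    rc = 1.0 / (2 * math.pi * low)
    a = rc / (rc + dt)
    hp, prev_x, prev_y = [], samples[0], 0.0
    for x in samples:
        prev_y = a * (prev_y + x - prev_x)
        prev_x = x
        hp.append(prev_y)
    # low-pass stage: damps hiss above `high`
    rc = 1.0 / (2 * math.pi * high)
    b = dt / (rc + dt)
    out, y = [], 0.0
    for x in hp:
        y += b * (x - y)
        out.append(y)
    return out
```

First-order stages like these roll off gently (6 dB per octave), so Audacity's own filter effects, which offer steeper slopes, will separate voice from rumble and hiss much more cleanly.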

4) What are good audio file settings for speech? I am referring specifically to sample format and rate, stereo/mono, etc. How can one easily resample an audio file containing only speech in order to make it a smaller file, with the intelligibility still intact?
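On file size at least, the arithmetic for uncompressed PCM is simple: bytes = seconds x sample rate x (bits / 8) x channels. A quick sketch of why mono 16 kHz is attractive for speech compared to CD-style stereo 44.1 kHz (the function name is just for illustration):

```python
def wav_bytes(seconds, rate, bits, channels):
    """Uncompressed PCM size: seconds * rate * (bits / 8) * channels."""
    return seconds * rate * (bits // 8) * channels

hour = 3600
print(wav_bytes(hour, 44100, 16, 2))  # CD-style stereo: 635040000 bytes (~605 MB/hour)
print(wav_bytes(hour, 16000, 16, 1))  # mono 16 kHz: 115200000 bytes (~110 MB/hour)
```

So converting a speech track to mono and a lower sample rate already shrinks it more than fivefold before any lossy compression such as MP3 is applied.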

I am sure I am missing loads more good tricks for processing speech data, so please chip in. This is a cool application, and good information on accomplishing various tasks with Audacity would be a great help to me and, judging by what I am reading online, to lots of others.

With the advent of the multitude of portable digital audio players and recorders, and thus the desire to produce general SpeechCasts, we have entered a new era in which the general public, not knowing the signal processing tricks required to sanitise speech well, will want to do just that.

Any tutorials, automation, etc would be very very helpful in my opinion.

I will report back if I find anything else of interest. :)

A simple two-step process taking a minute
Contributed by User:Dustspeckle

I just got a bunch of iRiver T60 1 GB MP3 players for my students to record interviews. The objective was to get the simplest and cheapest possible gear that works for this purpose. At the $30 they cost, I did not expect wonders. I decided on this one hoping it would resemble the performance of my old personal iRiver iFP-799. The T60 disappointed quite a lot in this respect: there seems to be no AGC, and the microphone appears to be very sensitive to low-frequency sounds like heavy trucks passing outside, etc.

At the "high" voice recording setting it records with 128 kbps, giving 20 - 15000 Hz with the internal mic. That gives quite a lot to work with!

Entering Audacity for post-processing.

First, open the file with the interview and make a selection of the part of the file you would like to process (the entire file, for instance).
 * Apply the "high pass filter" in the effects menu. Use a cutoff frequency of about 150 Hz to do away with the bass that tends to saturate the recording.
 * Then use the "normalizer" in the effects menu. If there are very faint segments in the recording, use the "compressor" instead to enhance those weak parts; normalization is included in the compression if that box is checked in the compressor dialog window.
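For those wondering what the compressor is doing conceptually: it squeezes loud peaks down so the quiet speech can then be raised further. A toy per-sample illustration in Python (Audacity's compressor actually works on a smoothed level envelope rather than sample by sample, and these threshold/ratio numbers are just example values):

```python
def compress(samples, threshold=0.3, ratio=4.0):
    """Toy peak compressor: any amplitude above `threshold` is scaled
    down by `ratio`, squeezing spikes toward the level of quiet speech."""
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(mag if s >= 0 else -mag)
    return out
```

After compression the peaks no longer dominate, so a follow-up normalization can lift the whole recording, faint segments included, much further than it could before.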

Voilà! You get a very good recording that brings out the spoken information. The results are now almost comparable to the original recordings from my beloved iRiver iFP-799, which is to say very good.

Then just export the result to an MP3 file, and listen to it using your favourite MP3 player's equalizer to fine-tune the spectrum for maximum clarity to your ears.

The suppression of spikes and low-frequency noise in the process

The inexpensive iRiver T60 luckily has no problems with hiss, partly because it is flash-based. If it did have such noise problems, the ideas about using bandpass/notch filters seem very good (see the discussion page).

With the T60, the main problem is that extremely low frequencies are picked up and reinforced by the internal microphone, and tend to leave rather little dynamic headroom for the useful parts of the spectrum in voice recordings. Applying the high-pass filter very effectively dampens this kind of noise and prepares the recording for further processing.

Like you, I had no success with the "spike killer" effect; maybe I did not understand how to use the parameters. In any case, using the compressor effect rather than the plain normalizer will reduce the amplitude of the spikes and enhance the information-carrying parts of the voice recording at the same time. Before doing that, removing the bass as we did above will greatly enhance the effectiveness of the compression, since spikes often have significant low-frequency content that tends to saturate the recording. Mark a spike and check Analyze > Plot Spectrum and you will see what I mean. Make sure to normalize the recording level after the compressor stage; this is normally included in the compressor dialog via a checked box.
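Plot Spectrum is essentially showing the magnitude of each frequency component of the selection, which is how the low-frequency content of a spike becomes visible. For the curious, a naive discrete Fourier transform makes the idea concrete (Audacity uses a windowed FFT, which is far faster; this brute-force version is only for illustration on short frames):

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive DFT: magnitude per frequency bin, normalized so a pure
    full-scale sine shows up as 0.5 in its bin. O(n^2), so only
    suitable for short frames."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) / n
            for k in range(n // 2)]
```

Feeding it a spike-shaped frame versus a clean vowel frame shows exactly the spread of low-frequency energy the paragraph above describes.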