Hello all, this is Edward Hui from Hong Kong. I have a strong interest in audio/signal processing and neuroscience, and I have been selected for the project “spectral editing tool” this summer, mentored by Paul Licameli. Here are links to my GitHub and LinkedIn profiles; please feel free to connect with me.
Background of spectral editing
Consider one of the most popular songs in history, Hey Jude by The Beatles, as an example. I fetched the song from YouTube as a WAV file and imported it into Audacity (no CD or vinyl magic happening); the snippet is attached here.
This is the original spectrogram, using logarithmic scaling, a window size of 4096, and a band limited to 100–5000 Hz.
Two “ding” sounds were then added as unwanted noise at around 2900 Hz; the modified snippet is attached here.
In the spectrogram view, the noises above are visualized and easily spotted by users: they do not blend much into the original mix, and their spectral energy is usually high. Common spectral editing tasks include removing an unwanted doorbell from a voice recording or eliminating coughing from a concert recording; these are typical noise-removal use cases for ordinary users.
In fact, there is a built-in function for handling simple spectral editing, but it is strictly limited to straight lines, which makes the editing not flexible enough to accommodate slightly more complicated noises with pitch variation, say a cat’s meow during a voice recording, as in the following graph.
The basic deliverable of the project
A brush tool will be introduced as the basic deliverable of this project, making spectral editing more user-friendly and effective. Users can simply drag through the desired area, and regions with high spectral energy will be approximated and selected, as in the following graph.
There are a few challenges involved in this project:
- The UI design of the tool and where it should be positioned in the existing toolbar, for a better editing experience
- The data structure representing the brush and the selected area, and which algorithm to use to estimate the bounded points from continuous mouse positions in real time (most likely Bresenham’s line algorithm or the midpoint circle algorithm, combined with a flood-fill algorithm)
- The method of transforming the calculated area into the corresponding frequency components
- The combination of parameters for performing the short-time Fourier transform and its inverse after the editing, i.e. window type, FFT size, overlap ratio, etc.
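To make the rasterization challenge above concrete, here is a minimal sketch of how a brush stroke might be turned into selected spectrogram cells with Bresenham’s line algorithm, and how a frequency bin maps back to Hz. The function names, the `Cell` type, and the flat cell grid are my own illustrative assumptions, not Audacity’s actual internals.

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <utility>

// Hypothetical grid cell: (time column, frequency bin) in spectrogram space.
using Cell = std::pair<int, int>;

// Rasterize a straight segment between two sampled mouse positions using
// Bresenham's line algorithm, collecting every cell the stroke passes through.
void BresenhamLine(int x0, int y0, int x1, int y1, std::set<Cell> &selected)
{
    const int dx = std::abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    const int dy = -std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    while (true) {
        selected.insert({x0, y0});
        if (x0 == x1 && y0 == y1)
            break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}

// Map a frequency-bin index back to Hz for an FFT of the given size,
// i.e. frequency = bin * sampleRate / fftSize.
double BinToFrequency(int bin, double sampleRate, int fftSize)
{
    return bin * sampleRate / fftSize;
}
```

With a 4096-point FFT at 44.1 kHz, for example, bin 269 corresponds to roughly 2896 Hz, close to the “ding” noises added earlier. Connecting consecutive mouse samples this way keeps the selection gap-free even when the cursor moves quickly.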
The brush tool is expected to be completed and delivered before the first evaluation on 12 July; one of the following features will then be selected and developed according to the schedule.
1. Overtone selection
The aforementioned real-life noises are similar to other audio signals in that they consist of both a fundamental frequency (F0) and overtone resonances; to effectively eliminate the unwanted noise, all of these components should be selected and removed.
It would be nice to approximate the overtones automatically from the F0, without manual selection by the user, and the threshold decision for such an approximation is important.
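A sketch of the idea: given an estimated F0, the overtones sit near integer multiples of it, so the selection can be extended to every harmonic below the Nyquist frequency and each one mapped to a spectrogram bin. These helper names are hypothetical, and real overtones drift slightly from exact multiples, which is where the threshold decision mentioned above comes in.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// List the harmonic series (F0, 2*F0, 3*F0, ...) up to the Nyquist
// frequency, so the brush selection can cover each overtone automatically.
std::vector<double> HarmonicFrequencies(double f0, double sampleRate)
{
    std::vector<double> harmonics;
    const double nyquist = sampleRate / 2.0;
    for (double f = f0; f < nyquist; f += f0)
        harmonics.push_back(f);
    return harmonics;
}

// Nearest FFT bin for a frequency, for translating an overtone into the
// spectrogram row that the extended selection should include.
int FrequencyToBin(double freq, double sampleRate, int fftSize)
{
    return static_cast<int>(std::lround(freq * fftSize / sampleRate));
}
```

For an A4 at 440 Hz sampled at 44.1 kHz this yields 50 harmonics (440 Hz up to 22 kHz); in practice each predicted bin would be widened into a small band to tolerate the drift.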
2. Area re-selection
The area selected by the new tool can be adjusted using UI components such as sliders that control the spectral energy threshold, improving the editing experience.
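One plausible way to wire such a slider up (my assumption, not a confirmed design): interpret the slider value in [0, 1] as a fraction of the brushed region’s peak magnitude, and re-select only the cells at or above that threshold, so dragging the slider grows or shrinks the selection live.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Re-select cells inside a brushed region by spectral energy. The flat
// `magnitudes` grid is a hypothetical stand-in for the region's STFT
// magnitudes; `sliderValue` in [0, 1] is a fraction of the peak magnitude.
std::vector<bool> ApplyEnergyThreshold(const std::vector<double> &magnitudes,
                                       double sliderValue)
{
    double peak = 0.0;
    for (double m : magnitudes)
        peak = std::max(peak, m);
    const double threshold = sliderValue * peak;

    std::vector<bool> selected(magnitudes.size());
    for (size_t i = 0; i < magnitudes.size(); ++i)
        selected[i] = magnitudes[i] >= threshold;
    return selected;
}
```

Normalizing against the peak keeps the slider meaningful across quiet and loud regions, since an absolute dB cutoff would behave very differently from one recording to the next.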
This project aims to make spectral editing widely accessible to all users regardless of their editing experience. The features above will hopefully complete Audacity’s spectral editing functionality and empower more creative editing ideas.
Thanks again to the Audacity team for accepting my proposal; I am looking forward to the coding stage! I will be writing weekly blog posts during development, and the links will be updated here as well.