2024. 5. 18. 13:14ㆍAudio Signal Processing for ML
Time-domain features
First, an ADC (analog-to-digital conversion) step converts the analog signal into a digital one through sampling and quantization.
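As a minimal sketch of the two halves of ADC (using NumPy; the signal, sample rate, and bit depth here are illustrative, not prescribed by the post), sampling evaluates the signal at discrete time steps and quantization rounds each sample to a finite set of levels:

```python
import numpy as np

def quantize(signal, num_bits=16):
    """Uniform quantizer: round each sample in [-1, 1] to num_bits resolution."""
    levels = 2 ** (num_bits - 1)
    return np.round(signal * levels) / levels

# "Sampling": evaluate a 5 Hz analog sine at 100 discrete points per second.
sr = 100
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 5 * t)

x_q = quantize(x, num_bits=8)  # quantization error is at most 1 / 2**8
```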
Next comes a framing procedure, which groups samples into perceivable audio chunks (note that the ear's time resolution is about 10 ms).
Note that consecutive frames overlap with each other; the reason will become clear later.
After framing, feature computation and its aggregation are performed.
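A rough sketch of framing plus per-frame feature computation (NumPy; the frame and hop sizes are illustrative, and RMS energy stands in for whatever feature is being extracted):

```python
import numpy as np

def frame_signal(signal, frame_size, hop_size):
    """Split a 1-D signal into overlapping frames (hop_size < frame_size)."""
    num_frames = 1 + (len(signal) - frame_size) // hop_size
    return np.stack([signal[i * hop_size : i * hop_size + frame_size]
                     for i in range(num_frames)])

def rms_per_frame(frames):
    """Compute the root-mean-square energy of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

# Example: 1 second of a 440 Hz sine at a 22050 Hz sampling rate.
sr = 22050
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)

frames = frame_signal(x, frame_size=1024, hop_size=512)  # 50% overlap
rms = rms_per_frame(frames)  # one feature value per frame
```

Aggregation would then collapse the per-frame values, e.g. `rms.mean()`, into a single descriptor for the whole clip.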
Frequency-domain features
The main pipeline is similar to that of time-domain features.
The difference is that there is a windowing process.
When we perform the short-time Fourier transform (STFT), a problem called "spectral leakage" occurs.
Spectral leakage occurs when the signal inside a frame is not periodic, so discontinuities appear at the frame boundaries.
When we compute the STFT of such a signal, the resulting spectrum contains relatively high-frequency components that are not present in the original signal.
To mitigate this problem, we use something called "windowing": a window function is multiplied with each frame so that its edges taper smoothly toward zero. The most common window function is the Hann window.
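A small numerical illustration of this (NumPy; the tone frequency, sample rate, and frame size are arbitrary choices for the demo): a sine that does not complete a whole number of cycles in the frame leaks energy far from its true frequency, and multiplying the frame by a Hann window suppresses that leaked energy:

```python
import numpy as np

# A frame whose sine does not complete an integer number of cycles: its
# implicit periodic extension is discontinuous, which causes leakage.
n = 1024
sr = 8000
freq = 443.0  # deliberately not aligned with the FFT bin grid
frame = np.sin(2 * np.pi * freq * np.arange(n) / sr)

hann = 0.5 * (1 - np.cos(2 * np.pi * np.arange(n) / n))  # periodic Hann window
spec_raw = np.abs(np.fft.rfft(frame))
spec_win = np.abs(np.fft.rfft(frame * hann))

# Energy in the top half of the spectrum, far from the 443 Hz tone, is
# pure leakage; windowing reduces it by orders of magnitude.
leak_raw = spec_raw[n // 4:].max()
leak_win = spec_win[n // 4:].max()
```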
But this solution causes another problem: samples near the frame edges are attenuated by the window, losing information from the original signal. This is why we overlap the frames, so that every sample is preserved at (near) full weight in some frame.
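The combination of a periodic Hann window with a 50% hop is a common choice precisely because the shifted windows sum to a constant, so after overlapping, no sample is systematically attenuated. A quick check of this property (NumPy; the sizes are illustrative):

```python
import numpy as np

frame_size = 1024
hop = frame_size // 2  # 50% overlap
hann = 0.5 * (1 - np.cos(2 * np.pi * np.arange(frame_size) / frame_size))

# Sum shifted copies of the window over a stretch of signal.
length = frame_size * 8
coverage = np.zeros(length)
for start in range(0, length - frame_size + 1, hop):
    coverage[start:start + frame_size] += hann

# Away from the edges, the overlapping Hann windows sum to exactly 1.0,
# so every sample receives the same total weight.
interior = coverage[frame_size:-frame_size]
```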