Normalizing a mel spectrogram to unit peak amplitude?

Time: 2019-01-30 02:19:40

Tags: python signal-processing spectrogram mfcc librosa

I'm new to Python and librosa. I'm trying to use this method for a speech recognizer: acoustic front end

My code:

import librosa
import librosa.display
import numpy as np

y, sr = librosa.load('test.wav', sr = None)
normalizedy = librosa.util.normalize(y)

stft = librosa.core.stft(normalizedy, n_fft = 256, hop_length=16)
mel = librosa.feature.melspectrogram(S=stft, n_mels=32)
melnormalized = librosa.util.normalize(mel)
mellog = np.log(melnormalized) - np.log(10**-5)

The problem is that when I apply librosa.util.normalize to the variable mel, I expect the values to lie between -1 and 1, but they don't. What am I missing?

1 answer:

Answer 0: (score: 1)

If you want the output to be log-scaled and normalized to lie between -1 and 1, you should take the log first and then normalize:

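import librosa
import librosa.display
import numpy as np

y, sr = librosa.load('test.wav', sr=None)
normalizedy = librosa.util.normalize(y)

stft = librosa.core.stft(normalizedy, n_fft=256, hop_length=16)
# melspectrogram expects a real-valued power spectrogram in S, not the complex STFT;
# pass sr so the mel filter bank matches the file's native sampling rate
mel = librosa.feature.melspectrogram(S=np.abs(stft)**2, sr=sr, n_mels=32)
mellog = np.log(mel + 1e-9)  # small offset avoids log(0)
# note: normalize works along axis=0 by default, so each frame is scaled separately
melnormalized = librosa.util.normalize(mellog)
# use melnormalized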
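
To see why the order matters, here is a minimal sketch that uses only NumPy and a random array as a stand-in for the mel spectrogram, so it runs without an audio file. The helper peak_normalize and the toy data are assumptions made for illustration; they are not part of librosa. Unlike librosa.util.normalize, which by default scales along axis=0 (each frame separately), this helper scales the whole array by a single global peak, which may be closer to what "unit peak amplitude" means here.

```python
import numpy as np

# Hypothetical stand-in for a mel power spectrogram with shape (n_mels, n_frames).
# Real data would come from librosa.feature.melspectrogram.
rng = np.random.default_rng(0)
mel = rng.random((32, 100)) ** 2  # non-negative, like a power spectrogram


def peak_normalize(x, eps=1e-12):
    """Scale the whole array so its largest absolute value is 1 (illustrative helper)."""
    return x / max(np.max(np.abs(x)), eps)


# Order used in the question: normalize, then log.
# A 1e-5 floor is added here to avoid log(0); the log stretches the range again.
log_after = np.log(peak_normalize(mel) + 1e-5) - np.log(1e-5)

# Order used in the answer: log, then normalize.
log_first = peak_normalize(np.log(mel + 1e-9))

print("normalize then log:", log_after.min(), log_after.max())
print("log then normalize:", log_first.min(), log_first.max())
```

In the first variant the peak ends up far above 1, because the log is applied after the scaling. In the second variant every value stays within [-1, 1], since normalization is the last step.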