Computing spectrograms of a wav file & recorded sound (normalizing for volume)

Time: 2013-08-31 21:06:15

Tags: python audio signal-processing fft

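I'd like to compare recorded audio with audio read from disk in a consistent way, but I'm running into problems with volume normalization (without it, the amplitudes of the spectrograms differ).

I've never worked with signals, FFTs, or the WAV format before, so this is new, uncharted territory for me. I retrieve the channels as lists of signed 16-bit integers sampled at 44100 Hz from both:

  1. a .wav file on disk
  2. recorded music playing from my laptop

I then step through each of them with a window of size 2^k and a certain amount of overlap. For each window, I do the following: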

    # calculate window variables
    window_step_size = int(self.window_size * (1.0 - self.window_overlap_ratio)) + 1
    last_frame = nframes - window_step_size # nframes is total number of frames from audio source
    num_windows, i = 0, 0 # calculate number of windows
    while i <= last_frame: 
        num_windows += 1
        i += window_step_size
    
    # allocate memory and initialize counter
    wi = 0 # index
    nfft = 2 ** self.nextpowof2(self.window_size) # size of FFT in 2^k
    fft2D = np.zeros((nfft/2 + 1, num_windows), dtype='c16') # 2d array for storing results
    
    # for each window
    count = 0
    times = np.zeros((1, num_windows)) # num_windows was calculated
    
    while wi <= last_frame:
    
        # channel_samples is simply list of signed ints
        window_samples = channel_samples[ wi : (wi + self.window_size)]
        window_samples = np.hamming(len(window_samples)) * window_samples 
    
        # calculate and reformat [[[[ THIS IS WHERE I'M UNSURE ]]]]
        fft = 2 * np.fft.rfft(window_samples, n=nfft) / nfft
        fft[0] = 0 # apparently these are completely real and should not be used
        fft[nfft/2] = 0 
        fft = np.sqrt(np.square(fft) / np.mean(fft)) # use RMS of data
        fft2D[:, count] = 10 * np.log10(np.absolute(fft))
    
        # sec / frame * frames = secs
        # get midpt
        times[0, count] = self.dt * wi
    
        wi += window_step_size
        count += 1
    
    # remove NaNs, infs
    whereAreNaNs = np.isnan(fft2D);
    fft2D[whereAreNaNs] = 0;
    whereAreInfs = np.isinf(fft2D);
    fft2D[whereAreInfs] = 0;
    
    # find the spectrogram peaks
    fft2D = fft2D.astype(np.float32)
    
    # the get_2D_peaks() method discretizes the fft2D periodogram array and then
    # finds peaks and filters out those peaks below the threshold supplied
    # 
    # the `amp_xxxx` variables are used for discretizing amplitude and the 
    # times array above is used to discretize the time into buckets
    local_maxima = self.get_2D_peaks(fft2D, self.amp_threshold, self.amp_max, self.amp_min, self.amp_step_size, times, self.dt)
    

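In particular, the funky stuff (at least to me) happens on the line with my comment [[[[ THIS IS WHERE I'M UNSURE ]]]].

Can anyone point me in the right direction, or help me generate this audio spectrogram while correctly adjusting for volume?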

1 answer:

Answer 0 (score: 1)

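A quick look tells me that you forgot to apply a window function, which is necessary for computing your spectrogram.

You need to apply a window (Hamming, Hann) to "window_samples":

    np.hamming(len(window_samples)) * window_samples

Then you can compute the rfft.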
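As a minimal sketch of that step (not the asker's exact pipeline), the function below windows one frame, takes the rfft, and scales the result so that the peak height does not depend on the window length. The helper name `windowed_spectrum` and the test tone are assumptions made for illustration. Dividing by the sum of the window compensates for the attenuation the window introduces.

```python
import numpy as np

def windowed_spectrum(frame, nfft=None):
    """Single-sided amplitude spectrum of one Hamming-windowed frame."""
    frame = np.asarray(frame, dtype=np.float64)
    if nfft is None:
        nfft = len(frame)
    window = np.hamming(len(frame))
    spectrum = np.fft.rfft(frame * window, n=nfft)
    # Dividing by sum(window) undoes the window's gain, so a sine of
    # amplitude A produces a peak close to A once the bins are doubled.
    amp = np.abs(spectrum) / np.sum(window)
    # Double every bin except DC (and Nyquist, which exists only for even nfft)
    # to account for the discarded negative-frequency half.
    last = -1 if nfft % 2 == 0 else None
    amp[1:last] *= 2.0
    return amp

# Hypothetical test signal: a 1 kHz sine of amplitude 0.5 sampled at 44100 Hz.
fs = 44100
n = 4096
t = np.arange(n) / fs
amplitude = windowed_spectrum(0.5 * np.sin(2 * np.pi * 1000.0 * t))
peak_bin = int(np.argmax(amplitude))
peak_freq = peak_bin * fs / n  # frequency of the strongest bin in Hz
```

The peak lands within one bin width (`fs / n`) of 1 kHz, and its height is slightly below 0.5 because the tone does not fall exactly on a bin centre.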

Edit:

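    import numpy as np
    import matplotlib.pyplot as plt

    # calculate the magnitude from the FFT (windowed is one windowed frame)
    fftData = np.fft.fft(windowed)
    # magnitude (linear scale) of the first half of the values
    Chunk = len(windowed)
    Mag = np.abs(fftData[:Chunk // 2])
    # if you want a log scale: R = 20 * np.log10(Mag)
    plt.plot(Mag)
    plt.show()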
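One practical detail of the log scale: `np.log10(0)` is `-inf`, which is one source of the infinities the question later zeroes out. A small sketch of an alternative is to clamp the magnitude to a floor before taking the logarithm. The helper name `magnitude_to_db` and the -120 dB default are assumptions, not part of the original code.

```python
import numpy as np

def magnitude_to_db(mag, floor_db=-120.0):
    """Convert linear magnitudes to dB, clamping values below floor_db."""
    mag = np.asarray(mag, dtype=np.float64)
    floor = 10.0 ** (floor_db / 20.0)  # linear magnitude matching floor_db
    return 20.0 * np.log10(np.maximum(mag, floor))

# Hypothetical input: full scale, one tenth of full scale, and an empty bin.
db = magnitude_to_db([1.0, 0.1, 0.0])
```

Empty bins then take the floor value instead of `-inf`, so no cleanup pass over NaNs and infinities is needed afterwards.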
  

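    # calculate the RMS from the FFT (data must not be windowed)
    # Parseval: sum(|X|^2) / N == sum(x^2); dividing by N once more gives the mean square
    RMS = np.sqrt(np.sum(np.abs(np.fft.fft(data)) ** 2) / len(data) ** 2)

    RMStoDb = 20 * np.log10(RMS)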
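For the full (two-sided) FFT of an unwindowed frame, Parseval's theorem says that the sum of `|X|^2` divided by `N` equals the sum of `x^2`. The sketch below checks that the RMS computed in the frequency domain matches the RMS computed directly from the samples. The random test frame is an assumption used only for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(1024)  # hypothetical unwindowed test frame

# RMS computed directly in the time domain.
rms_time = np.sqrt(np.mean(data ** 2))

# RMS computed from the two-sided FFT via Parseval's theorem.
spectrum = np.fft.fft(data)
rms_freq = np.sqrt(np.sum(np.abs(spectrum) ** 2)) / len(data)

rms_db = 20 * np.log10(rms_freq)  # RMS level in dB
```

The two values agree to floating-point precision. If the frame were multiplied by a window first, both would describe the windowed signal rather than the original one.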

PS: If you want to calculate the RMS from the FFT, you cannot apply a window (Hann, Hamming) first. Also, this line makes no sense:

    fft = np.sqrt(np.square(fft) / np.mean(fft)) # use RMS of data

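A simple normalization of the data can be done for each window:

    window_samples = np.asarray(channel_samples[wi : (wi + self.window_size)], dtype=np.float64)

    # frameMax = np.max(np.abs(window_samples))
    # use the mean of |x|: the plain mean of signed audio samples is usually close to 0
    frameMean = np.mean(np.abs(window_samples))

    Normalized = window_samples / frameMean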
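Per-window scaling removes loudness changes within a track. For comparing a recording against a file, an alternative is to scale each whole signal to a common RMS level once, before the spectrogram is computed. This is a sketch of that swapped-in approach, not the method given above. The helper name `normalize_rms` and the target level of 0.1 are assumptions.

```python
import numpy as np

def normalize_rms(samples, target_rms=0.1):
    """Scale a whole signal to a fixed RMS level after removing its DC offset."""
    x = np.asarray(samples, dtype=np.float64)
    x = x - np.mean(x)              # remove any DC offset
    rms = np.sqrt(np.mean(x ** 2))
    if rms == 0.0:
        return x                    # silent input: nothing to scale
    return x * (target_rms / rms)

# Hypothetical example: the same 440 Hz tone at two different volumes.
t = np.arange(44100) / 44100.0
loud = 20000.0 * np.sin(2 * np.pi * 440.0 * t)
quiet = 0.1 * loud

loud_norm = normalize_rms(loud)
quiet_norm = normalize_rms(quiet)
```

After normalization the two signals are numerically the same, so their spectrograms have matching amplitudes. Real recordings also differ in noise and frequency response, which a single gain factor cannot correct.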