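I am processing wav files for amplitude and frequency analysis with an FFT, but I am having trouble getting the data out to csv in a time-series format.
Leaning heavily on @Beginner's answer in this post: How to convert a .wav file to a spectrogram in python3, I am able to get the spectrogram output as an image. I am trying to simplify that somewhat to get text output in csv format, but I don't see how to do it. The result I am hoping for would look something like this: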
time_in_ms, amplitude_in_dB, freq_in_kHz
.001, -115, 1
.002, -110, 2
.003, 20, 200
...
19000, 20, 200
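For testing I have been using http://soundbible.com/2123-40-Smith-Wesson-8x.html. (Note: I reduced the wav to a single channel and stripped the metadata with Audacity to get it to work.)
Heavy props to @Beginner for 99.9% of the following; anything nonsensical is surely mine.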
import numpy as np
from matplotlib import pyplot as plt
import scipy.io.wavfile as wav
from numpy.lib import stride_tricks
filepath = "40sw3.wav"
""" short time fourier transform of audio signal """
def stft(sig, frameSize, overlapFac=0.5, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))
    # zeros at beginning (thus center of 1st window should be for sample nr. 0)
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)
    # cols for windowing
    cols = np.ceil( (len(samples) - frameSize) / float(hopSize)) + 1
    # zeros at end (thus samples can be fully covered by frames)
    samples = np.append(samples, np.zeros(frameSize))
    frames = stride_tricks.as_strided(samples, shape=(int(cols), frameSize), strides=(samples.strides[0]*hopSize, samples.strides[0])).copy()
    frames *= win
    return np.fft.rfft(frames)
""" scale frequency axis logarithmically """
def logscale_spec(spec, sr=44100, factor=20.):
    timebins, freqbins = np.shape(spec)
    scale = np.linspace(0, 1, freqbins) ** factor
    scale *= (freqbins-1)/max(scale)
    scale = np.unique(np.round(scale))
    # create spectrogram with new freq bins
    newspec = np.complex128(np.zeros([timebins, len(scale)]))
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            newspec[:,i] = np.sum(spec[:,int(scale[i]):], axis=1)
        else:
            newspec[:,i] = np.sum(spec[:,int(scale[i]):int(scale[i+1])], axis=1)
    # list center freq of bins
    allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1])
    freqs = []
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            freqs += [np.mean(allfreqs[int(scale[i]):])]
        else:
            freqs += [np.mean(allfreqs[int(scale[i]):int(scale[i+1])])]
    return newspec, freqs
""" compute spectrogram """
def compute_stft(audiopath, binsize=2**10):
    samplerate, samples = wav.read(audiopath)
    s = stft(samples, binsize)
    sshow, freq = logscale_spec(s, factor=1.0, sr=samplerate)
    ims = 20.*np.log10(np.abs(sshow)/10e-6) # amplitude to decibel
    return ims, samples, samplerate, freq
""" plot spectrogram """
def plot_stft(ims, samples, samplerate, freq, binsize=2**10, plotpath=None, colormap="jet"):
    timebins, freqbins = np.shape(ims)
    plt.figure(figsize=(15, 7.5))
    plt.imshow(np.transpose(ims), origin="lower", aspect="auto", cmap=colormap, interpolation="none")
    plt.colorbar()
    plt.xlabel("time (s)")
    plt.ylabel("frequency (hz)")
    plt.xlim([0, timebins-1])
    plt.ylim([0, freqbins])
    xlocs = np.float32(np.linspace(0, timebins-1, 5))
    plt.xticks(xlocs, ["%.02f" % l for l in ((xlocs*len(samples)/timebins)+(0.5*binsize))/samplerate])
    ylocs = np.int16(np.round(np.linspace(0, freqbins-1, 10)))
    plt.yticks(ylocs, ["%.02f" % freq[i] for i in ylocs])
    if plotpath:
        plt.savefig(plotpath, bbox_inches="tight")
    else:
        plt.show()
    plt.clf()
""" HERE IS WHERE I'm ATTEMPTING TO GET IT OUT TO TXT """
ims, samples, samplerate, freq = compute_stft(filepath)
""" Print lengths """
print('ims len:', len(ims))
print('samples len:', len(samples))
print('samplerate:', samplerate)
print('freq len:', len(freq))
""" Write values to files """
np.savetxt(filepath + '-ims.txt', ims, delimiter=', ', newline='\n', header='ims')
np.savetxt(filepath + '-samples.txt', samples, delimiter=', ', newline='\n', header='samples')
np.savetxt(filepath + '-frequencies.txt', freq, delimiter=', ', newline='\n', header='frequencies')
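As for the values to output: the file I am analyzing is about 19.1 seconds long and the sample rate is 44100, so I expected roughly 842k values for any given variable. That is not what I get. Instead, this is what I see:
freqs comes out with only a handful of values, 512. They appear to be in the correct range for the expected frequencies, but they are ordered from least to greatest rather than in time order as I expected. I assume the 512 values are the "fast" in FFT, basically down-sampled...
ims appears to be the amplitude, but the values seem too high, although the sample size is correct. I should be seeing -50 up to ~240 dB.
samples . . . not sure.
In short, can someone advise how I would get the FFT out to a text file with time, amplitude, and frequency values for the entire sample set? Is savetxt the right route, or is there a better way? This code can certainly be used to make a great spectrogram, but how can I just get the data out?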
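For reference, the array sizes follow from the STFT framing rather than from the raw sample count. Below is a rough sketch of that arithmetic, assuming 19.1 s of mono audio at 44100 Hz and the defaults used above (frameSize of 1024, 50% overlap). `expected_shapes` is a hypothetical helper written only for this illustration; it is not part of @Beginner's code.

```python
import numpy as np

def expected_shapes(n_samples, frame_size=1024, overlap=0.5):
    """Hypothetical helper: mirror the framing arithmetic of stft() above."""
    hop = int(frame_size - np.floor(overlap * frame_size))
    # stft() prepends frame_size // 2 zeros before counting frames
    padded = n_samples + frame_size // 2
    n_frames = int(np.ceil((padded - frame_size) / float(hop)) + 1)
    # np.fft.rfft of a real frame returns frame_size // 2 + 1 bins
    n_bins = frame_size // 2 + 1
    return n_frames, n_bins, hop

# Assumption: 19.1 s at 44100 Hz, i.e. about 842k samples
n_samples = round(19.1 * 44100)
n_frames, n_bins, hop = expected_shapes(n_samples)
print(n_frames, n_bins, hop)
```

Under those assumptions ims has one row per frame and one column per frequency bin, while freq only lists the bin centre frequencies once, which would explain why it is short and sorted by frequency rather than by time.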
Answer 0 (score: 0)
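Your output format is too limiting, because the audio spectrum over any interval of time usually contains a range of frequencies. For example, the FFT of 1024 samples contains 512 frequency bins for a single time window (time step), each with its own magnitude. If you want a time step of one millisecond, you have to offset the window of samples fed to each STFT so that the window is centered on that point in your sample vector. With an FFT about 23 milliseconds long (1024 samples at 44100 Hz), that results in a high degree of overlap between windows. You could use a shorter window, but the time-frequency trade-off means the frequency resolution drops proportionally.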
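One way to put this into practice is a "long" csv with one row per (time step, frequency bin) pair instead of one row per millisecond. The following is a minimal sketch using only NumPy and the standard csv module, not the code from the question. The names `stft_to_rows` and `write_csv` are hypothetical, the dB values are relative to a magnitude of 1.0 (an assumption, not a calibrated reference), and a trailing partial frame is simply dropped.

```python
import csv
import numpy as np

def stft_to_rows(samples, samplerate, frame_size=1024, hop=512):
    """Yield (time_ms, amp_db, freq_khz) for every bin of every frame."""
    window = np.hanning(frame_size)
    # Centre frequency of each rfft bin, in kHz
    freqs_khz = np.fft.rfftfreq(frame_size, d=1.0 / samplerate) / 1000.0
    samples = np.asarray(samples, dtype=np.float64)
    # A trailing partial frame is dropped (assumption made for brevity)
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size] * window
        mag = np.abs(np.fft.rfft(frame))
        # Floor the magnitude so log10 never sees zero
        amp_db = 20.0 * np.log10(np.maximum(mag, 1e-12))
        # Time stamp taken at the centre of the frame
        time_ms = 1000.0 * (start + frame_size / 2.0) / samplerate
        for f, a in zip(freqs_khz, amp_db):
            yield time_ms, a, f

def write_csv(path, samples, samplerate, **kwargs):
    """Write the rows from stft_to_rows() to a csv file at path."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["time_in_ms", "amplitude_in_dB", "freq_in_kHz"])
        for t, a, f in stft_to_rows(samples, samplerate, **kwargs):
            writer.writerow(["%.3f" % t, "%.2f" % a, "%.4f" % f])
```

For a time step of roughly one millisecond, something like `write_csv(filepath + '-stft.csv', samples, samplerate, hop=samplerate // 1000)` would advance the window by about 44 samples per step at 44100 Hz. Each 1024-sample window then overlaps its neighbour almost entirely, and the file grows to several hundred rows per millisecond of audio.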