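I am trying to find the mean frequency of a sound clip, along with its Q25 and Q75 values, but I keep running into problems (mainly because of my lack of math and DSP knowledge).
I have been working from this answer, but I am having trouble combining the code from that answer with reading a .wav file.
Here is the code I use for recording...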
import wave

import pyaudio


def record_sample(file):
    # Audio recording parameters.
    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 44100
    RECORD_SECONDS = 5
    pa = pyaudio.PyAudio()
    # Record sample.
    stream = pa.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK)
    frames = []
    for _ in range(int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    stream.stop_stream()
    stream.close()
    # Save to wave file.
    wf = wave.open(file, "wb")
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(pa.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
That all works fine. Here is the code I use to compute the mean frequency, Q25, and Q75...
import numpy as np
from scipy.io import wavfile


def spectral_properties(file):
    # Note: scipy.io.wavfile.read
    fs, data = wavfile.read(file)
    spec = np.abs(np.fft.rfft(data))
    freq = np.fft.rfftfreq(len(data), d=1 / fs)
    spec = np.abs(spec)
    amp = spec / spec.sum()
    amp_cumsum = np.cumsum(amp)
    Q25 = freq[len(amp_cumsum[amp_cumsum <= 0.25]) + 1]
    Q75 = freq[len(amp_cumsum[amp_cumsum <= 0.75]) + 1]
    print((freq * amp).sum(), Q25, Q75)
And here is the error it produces...
  File "/home/horner/workspace/school/ML/machine-learning-project-mdx97/program/audio.py", line 65, in spectral_properties
    Q75 = freq[len(amp_cumsum[amp_cumsum <= 0.75]) + 1]
IndexError: index 298981 is out of bounds for axis 0 with size 110081
Answer 0 (score: 0)
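Note that you are recording 2 channels, which means wavfile.read gives you a two-dimensional data array of shape (frames, channels). Your current version flattens that array during some of the operations (np.cumsum without an axis argument, for example), which is why amp_cumsum ends up with far more elements than freq.
There are two ways to fix this. The first is to use only one of the channels: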
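The shape mismatch can be reproduced without a recording. The following is a minimal sketch that assumes a tiny all-synthetic stereo array (8 frames, made-up content) standing in for the data returned by wavfile.read; only the shapes matter here.

```python
import numpy as np

fs = 44100
n_frames = 8                       # tiny stand-in for the recorded frames
data = np.zeros((n_frames, 2))     # stereo layout: (frames, channels)
data[:, 0] = np.arange(n_frames)   # arbitrary non-zero content

# With no axis argument, rfft works along the last axis, i.e. across the 2 channels.
spec = np.abs(np.fft.rfft(data))
freq = np.fft.rfftfreq(len(data), d=1 / fs)
# With no axis argument, cumsum flattens its input first.
amp_cumsum = np.cumsum(spec / spec.sum())

print(spec.shape)        # (8, 2)
print(freq.shape)        # (5,)
print(amp_cumsum.shape)  # (16,)
```

With 16 cumulative values but only 5 frequency bins, any index derived from amp_cumsum can run past the end of freq, which is the same pattern as the IndexError in the question.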
import numpy as np
from scipy.io import wavfile


def spectral_properties(filename):
    fs, data = wavfile.read(filename)
    # use the first channel only
    if data.ndim > 1:
        data = data[:, 0]
    spec = np.abs(np.fft.rfft(data))
    freq = np.fft.rfftfreq(len(data), d=1/fs)
    assert len(spec) == len(freq)
    amp = spec / spec.sum()
    amp_cumsum = amp.cumsum()
    assert len(amp_cumsum) == len(freq)
    # number of bins whose cumulative amplitude is below the threshold
    q25 = freq[len(amp_cumsum[amp_cumsum < 0.25])]
    q75 = freq[len(amp_cumsum[amp_cumsum < 0.75])]
    return (freq * amp).sum(), q25, q75


avg, q25, q75 = spectral_properties('foobar.wav')
print(avg, q25, q75)
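The second is to keep both channels and tell the numpy functions which axis to operate along. This also means that computing the quartiles becomes slightly less trivial, because you need to find them separately for each channel, but thanks to Python's list comprehensions it looks almost as simple as before: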
import numpy as np
from scipy.io import wavfile


def spectral_properties(filename):
    fs, data = wavfile.read(filename)
    # determine number of channels
    num_channels = data.shape[1]
    # transform along the time axis, one spectrum per channel
    spec = np.abs(np.fft.rfft(data, axis=0))
    freq = np.fft.rfftfreq(len(data), d=1/fs)
    assert len(spec) == len(freq)
    amp = spec / spec.sum(axis=0)
    amp_cumsum = amp.cumsum(axis=0)
    assert len(amp_cumsum) == len(freq)
    q25 = [freq[len(amp_cumsum[:,j][amp_cumsum[:,j] < 0.25])] for j in range(num_channels)]
    q75 = [freq[len(amp_cumsum[:,j][amp_cumsum[:,j] < 0.75])] for j in range(num_channels)]
    return (freq[:,np.newaxis] * amp).sum(axis=0), q25, q75


avg, q25, q75 = spectral_properties('foobar.wav')
print(avg, q25, q75)
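One caveat with the per-channel version: scipy.io.wavfile.read returns a one-dimensional array for a single-channel file, so data.shape[1] would fail on a mono recording. A possible guard, sketched here with a hypothetical helper name (as_frames_by_channels is not part of numpy or scipy), is to add a channel axis before the FFT:

```python
import numpy as np


def as_frames_by_channels(data):
    """Return data with shape (frames, channels); mono input gets a channel axis."""
    data = np.asarray(data)
    if data.ndim == 1:
        data = data[:, np.newaxis]
    return data


mono = np.zeros(8)           # what a 1-channel file would look like
stereo = np.zeros((8, 2))    # what a 2-channel file would look like
print(as_frames_by_channels(mono).shape)    # (8, 1)
print(as_frames_by_channels(stereo).shape)  # (8, 2)
```

Calling such a helper right after wavfile.read would let the axis=0 code path handle both mono and stereo files.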
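Note that the + 1 in your original quartile expressions is problematic. Consider the case where every value except the last one is smaller than 0.25. The inequality then holds for n - 1 elements. You add 1, so you get n. But for a freq array of length n, an index of n is too large.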
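That edge case can be made concrete with a toy array. The values below are invented purely for illustration; they are chosen so that all but the last cumulative value fall below 0.25.

```python
import numpy as np

freq = np.array([0.0, 10.0, 20.0, 30.0])          # n = 4 bins
amp_cumsum = np.array([0.10, 0.20, 0.24, 1.00])   # all but the last are below 0.25

count = len(amp_cumsum[amp_cumsum <= 0.25])       # n - 1 = 3
try:
    freq[count + 1]                               # index 4 on an array of length 4
    overflowed = False
except IndexError:
    overflowed = True

# Without the + 1, the count itself is the index of the first bin at or above 0.25.
q25 = freq[len(amp_cumsum[amp_cumsum < 0.25])]
print(count, overflowed, q25)  # 3 True 30.0
```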
Also, I suspect you may want to square spec rather than use it as is.
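Whether to weight by amplitude or by power depends on which definition of mean frequency you are after, so treat the following as a sketch of the squared variant rather than the one correct answer. The function name power_weighted_mean and the two-tone test signal are assumptions made for illustration.

```python
import numpy as np


def power_weighted_mean(data, fs):
    """Mean frequency of a 1-D signal, weighting each bin by its power."""
    power = np.abs(np.fft.rfft(data)) ** 2
    freq = np.fft.rfftfreq(len(data), d=1 / fs)
    weights = power / power.sum()
    return (freq * weights).sum()


fs = 8000
t = np.arange(fs) / fs  # one second of samples
# A 500 Hz tone plus a 1500 Hz tone at half the amplitude.
signal = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
print(power_weighted_mean(signal, fs))
```

For this signal the power weights are 0.8 and 0.2, giving a mean of about 700 Hz, whereas amplitude weights of 2/3 and 1/3 would give about 833 Hz. Squaring pulls the mean towards the louder component.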
Update:
You may also want to use np.searchsorted to find the quartiles, which should be faster and easier to read:
q25 = freq[np.searchsorted(amp_cumsum, 0.25)]
q75 = freq[np.searchsorted(amp_cumsum, 0.75)]
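The two lines above assume a one-dimensional amp_cumsum, as in the single-channel version. np.searchsorted expects a one-dimensional sorted array, so for the version that keeps both channels one option is to apply it column by column. This is a sketch with a hypothetical helper name (channel_quartiles) and made-up toy data:

```python
import numpy as np


def channel_quartiles(amp_cumsum, freq):
    """Q25 and Q75 per channel, for amp_cumsum of shape (bins, channels)."""
    num_channels = amp_cumsum.shape[1]
    q25 = [freq[np.searchsorted(amp_cumsum[:, j], 0.25)] for j in range(num_channels)]
    q75 = [freq[np.searchsorted(amp_cumsum[:, j], 0.75)] for j in range(num_channels)]
    return q25, q75


freq = np.array([0.0, 10.0, 20.0, 30.0])
# Normalised amplitudes: each column (channel) sums to 1.
amp = np.array([[0.1, 0.7],
                [0.2, 0.1],
                [0.3, 0.1],
                [0.4, 0.1]])
q25, q75 = channel_quartiles(amp.cumsum(axis=0), freq)
print(q25, q75)
```

With the default side='left', np.searchsorted returns the number of elements strictly below the threshold, so it selects the same bin as the len(amp_cumsum[amp_cumsum < 0.25]) expression.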