我知道有关Python中.wav文件的问题已被殴打致死,但由于没有人的答案似乎对我有用,我感到非常沮丧。我想做的事情对我来说似乎相对简单:我想确切地知道给定时间.wav文件中的频率。我想知道,例如,“从时间 n 毫秒到 n + 10 毫秒,声音的平均频率为 x 赫兹”。我见过一些人在谈论傅立叶变换和Goertzel算法以及各种模块,但我似乎无法弄清楚如何去完成我所描述的事情。我尝试查找诸如“在python中查找wav文件的频率”之类的内容大约二十次,但无济于事。有人可以帮我吗?
我正在寻找的是一种类似于此伪代码的解决方案,或者至少是一种可以实现类似伪代码的解决方案:
import some_module_that_can_help_me_do_this as freq
file = 'output.wav'
start_time = 1000 # Start 1000 milliseconds into the file
end_time = 1010 # End 10 milliseconds thereafter
print("Average frequency = " + str(freq.average(start_time, end_time)) + " hz")
请确保(我确定可以告诉我)我是数学的白痴。这是我的第一个问题,请保持柔和
答案 0 :(得分:1)
如果您想检测pitch的声音(似乎可以),那么就Python库而言,最好的选择是aubio。请参阅此example进行实施。
import sys
from aubio import source, pitch
win_s = 4096
hop_s = 512
s = source(your_file, samplerate, hop_s)
samplerate = s.samplerate
tolerance = 0.8
pitch_o = pitch("yin", win_s, hop_s, samplerate)
pitch_o.set_unit("midi")
pitch_o.set_tolerance(tolerance)
pitches = []
confidences = []
total_frames = 0
while True:
samples, read = s()
pitch = pitch_o(samples)[0]
pitches += [pitch]
confidence = pitch_o.get_confidence()
confidences += [confidence]
total_frames += read
if read < hop_s: break
print("Average frequency = " + str(np.array(pitches).mean()) + " hz")
请务必检查docs的音高检测方法。
我还认为您可能对平均频率和一些其他音频参数的估计感兴趣,而无需使用任何特殊的库。让我们只使用numpy!这将使您更好地了解如何计算此类音频功能。它基于specprop软件包中的seewave。检查文档中所计算功能的含义。
import numpy as np
def spectral_properties(y: np.ndarray, fs: int) -> dict:
spec = np.abs(np.fft.rfft(y))
freq = np.fft.rfftfreq(len(y), d=1 / fs)
spec = np.abs(spec)
amp = spec / spec.sum()
mean = (freq * amp).sum()
sd = np.sqrt(np.sum(amp * ((freq - mean) ** 2)))
amp_cumsum = np.cumsum(amp)
median = freq[len(amp_cumsum[amp_cumsum <= 0.5]) + 1]
mode = freq[amp.argmax()]
Q25 = freq[len(amp_cumsum[amp_cumsum <= 0.25]) + 1]
Q75 = freq[len(amp_cumsum[amp_cumsum <= 0.75]) + 1]
IQR = Q75 - Q25
z = amp - amp.mean()
w = amp.std()
skew = ((z ** 3).sum() / (len(spec) - 1)) / w ** 3
kurt = ((z ** 4).sum() / (len(spec) - 1)) / w ** 4
result_d = {
'mean': mean,
'sd': sd,
'median': median,
'mode': mode,
'Q25': Q25,
'Q75': Q75,
'IQR': IQR,
'skew': skew,
'kurt': kurt
}
return result_d
答案 1 :(得分:1)
我感到OP感到沮丧-如果有人需要,那么就很难找到如何获得频谱图的值,而不是看到频谱图的图像,
#!/usr/bin/env python
import librosa
import sys
import numpy as np
import matplotlib.pyplot as plt
import librosa.display
np.set_printoptions(threshold=sys.maxsize)
filename = 'filename.wav'
Fs = 44100
clip, sample_rate = librosa.load(filename, sr=Fs)
n_fft = 1024 # frame length
start = 0
hop_length=512
#commented out code to display Spectrogram
X = librosa.stft(clip, n_fft=n_fft, hop_length=hop_length)
#Xdb = librosa.amplitude_to_db(abs(X))
#plt.figure(figsize=(14, 5))
#librosa.display.specshow(Xdb, sr=Fs, x_axis='time', y_axis='hz')
#If to pring log of frequencies
#librosa.display.specshow(Xdb, sr=Fs, x_axis='time', y_axis='log')
#plt.colorbar()
#librosa.display.waveplot(clip, sr=Fs)
#plt.show()
#now print all values
t_samples = np.arange(clip.shape[0]) / Fs
t_frames = np.arange(X.shape[1]) * hop_length / Fs
#f_hertz = np.arange(N / 2 + 1) * Fs / N # Works only when N is even
f_hertz = np.fft.rfftfreq(n_fft, 1 / Fs) # Works also when N is odd
#example
print('Time (seconds) of last sample:', t_samples[-1])
print('Time (seconds) of last frame: ', t_frames[-1])
print('Frequency (Hz) of last bin: ', f_hertz[-1])
print('Time (seconds) :', len(t_samples))
#prints array of time frames
print('Time of frames (seconds) : ', t_frames)
#prints array of frequency bins
print('Frequency (Hz) : ', f_hertz)
print('Number of frames : ', len(t_frames))
print('Number of bins : ', len(f_hertz))
#This code is working to printout frame by frame intensity of each frequency
#on top line gives freq bins
curLine = 'Bins,'
for b in range(1, len(f_hertz)):
curLine += str(f_hertz[b]) + ','
print(curLine)
curLine = ''
for f in range(1, len(t_frames)):
curLine = str(t_frames[f]) + ','
for b in range(1, len(f_hertz)): #for each frame, we get list of bin values printed
curLine += str("%.02f" % np.abs(X[b, f])) + ','
#remove format of the float for full details if needed
#curLine += str(np.abs(X[b, f])) + ','
#print other useful info like phase of frequency bin b at frame f.
#curLine += str("%.02f" % np.angle(X[b, f])) + ','
print(curLine)
答案 2 :(得分:0)
尝试以下方法,它对我生成的频率为1234的正弦波文件有效 from this page.
from scipy.io import wavfile
def freq(file, start_time, end_time):
sample_rate, data = wavfile.read(file)
start_point = int(sample_rate * start_time / 1000)
end_point = int(sample_rate * end_time / 1000)
length = (end_time - start_time) / 1000
counter = 0
for i in range(start_point, end_point):
if data[i] < 0 and data[i+1] > 0:
counter += 1
return counter/length
freq("sin.wav", 1000 ,2100)
1231.8181818181818
已编辑:进行了一些循环清理
答案 3 :(得分:0)
这个答案已经很晚了,但你可以试试这个:
(注意:我不值得称赞,因为我从其他 SO 帖子和这篇关于使用 Python 的 FFT 的精彩文章中获得了大部分内容:https://realpython.com/python-scipy-fft/)
import numpy as np
from scipy.fft import *
from scipy.io import wavfile
def freq(file, start_time, end_time):
# Open the file and convert to mono
sr, data = wavfile.read(file)
if data.ndim > 1:
data = data[:, 0]
else:
pass
# Return a slice of the data from start_time to end_time
dataToRead = data[int(start_time * sr / 1000) : int(end_time * sr / 1000) + 1]
# Fourier Transform
N = len(dataToRead)
yf = rfft(dataToRead)
xf = rfftfreq(N, 1 / sr)
# Uncomment these to see the frequency spectrum as a plot
# plt.plot(xf, np.abs(yf))
# plt.show()
# Get the most dominant frequency and return it
idx = np.argmax(np.abs(yf))
freq = xf[idx]
return freq
此代码适用于任何 .wav
文件,但它可能会稍微偏离,因为它只返回最主要的频率,而且它只使用音频的第一个通道(如果不是单声道)。< /p>
如果您想了解有关傅立叶变换工作原理的更多信息,请观看 3blue1brown 制作的带有直观说明的视频:https://www.youtube.com/watch?v=spUNpyF58BY