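I am trying to use webrtcvad to detect speech in a WAV file, which requires feeding it 30 ms chunks of 16-bit PCM at 32 kHz. My plan is to cut the WAV into 30 ms packets, but taking this file as an example (https://dl.dropboxusercontent.com/u/91396766/recording000001.wav):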
Audio software reports:
5.568 seconds, 16 bit, mono, 32000 Hz
Python wave module (https://docs.python.org/3/library/wave.html) reports:
w.getnframes() = 178176
w.getframerate() = 32000
w.getnchannels() = 1 = mono
w.getsampwidth() = 2 byte = 16 bits
len(w.readframes(w.getnframes())) = 356352, shouldn't it be 178176?
len(w.readframes(0)) = 1
Why is len(w.readframes(w.getnframes())) = 356352? I expected 178176, since 1/32000 = 0.00003125 seconds per frame and 0.00003125 * 178176 = 5.568 seconds.
Test script:
import wave
infile = 'recording000001.wav'
w = wave.open(infile, 'rb')
data = w.readframes(w.getnframes())
frequency = w.getframerate()
number_of_channels = w.getnchannels()
sample_width_in_bytes = w.getsampwidth()
print "{} is sampled at {}Hz, it has {} channel(s) and a sample width of {} bytes".format(infile, frequency, number_of_channels, sample_width_in_bytes)
print "it contains {} data".format(len(data))
print "for {} frames".format(w.getnframes())
print "one data length is {}".format(len(data[0]))
w.close()
Output:
recording000001.wav is sampled at 32000Hz, it has 1 channel(s) and a sample width of 2 bytes
it contains 356352 data
for 178176 frames
one data length is 1
Answer 0 (score: 1)
After
w.rewind()
I tried
len(w.readframes(0))
0
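This differs from your result of 1. (Note that the 1 printed by your script comes from len(data[0]), the length of a single element of the data, not from readframes(0).)
Interestingly,
len(w.readframes(1))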
2
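But that makes sense, because there really are 2 bytes: with 16-bit mono audio, each frame is 2 bytes. Since you are calling len on a binary object, I believe it returns the actual number of bytes rather than the number of frames.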
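To make the byte-versus-frame relationship concrete, here is a minimal sketch. It does not use the recording from the question; it writes a synthetic 16-bit mono WAV of silence into memory, with the 32000 Hz rate and 178176-frame length borrowed from the question purely for illustration, and then compares frame count, byte count, and duration.

```python
import io
import wave

# Synthetic stand-in for the question's file (an assumption, not the real
# recording): 16-bit mono silence at 32 kHz, 178176 frames long.
FRAME_RATE = 32000
N_FRAMES = 178176

buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)       # mono
    out.setsampwidth(2)       # 2 bytes = 16 bits per sample
    out.setframerate(FRAME_RATE)
    out.writeframes(b"\x00\x00" * N_FRAMES)

buf.seek(0)
with wave.open(buf, "rb") as w:
    # One frame holds one sample per channel.
    bytes_per_frame = w.getsampwidth() * w.getnchannels()
    n_frames = w.getnframes()
    data = w.readframes(n_frames)
    byte_count = len(data)                   # bytes, not frames
    duration = n_frames / w.getframerate()   # seconds

    w.rewind()
    empty = w.readframes(0)   # zero frames requested
    one = w.readframes(1)     # a single 16-bit mono frame

print(n_frames, byte_count, bytes_per_frame, duration)
print(len(empty), len(one))
```

So the duration follows from the frame count (178176 / 32000 = 5.568 s), while len() on the returned data counts bytes, which is the frame count times 2 here.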
If you want to process your audio data, consider looking into a library such as numpy for further analysis or processing.
import numpy as np
c = np.frombuffer(w.readframes(w.getnframes()), dtype="int16")
c.shape
(178175,)
c[0] = 100
c[1] = 122
c[100] = -132
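This is the raw waveform data. As int16, its values lie in the range [-2^15, 2^15 - 1], i.e. -32768 to 32767. Since the beginning of the audio file is quiet, small values in the first few hundred frames make sense.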
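Coming back to the original goal of cutting the audio into 30 ms packets, the sketch below shows one possible way to slice the raw bytes. The helper name frame_generator and its parameters are hypothetical, the input is synthetic silence rather than the real recording, and a trailing partial chunk is simply dropped, which is an assumption about what the caller wants.

```python
def frame_generator(pcm_bytes, sample_rate, frame_ms=30, sample_width=2, channels=1):
    """Yield consecutive chunks of raw PCM, each spanning frame_ms milliseconds.

    Hypothetical helper for illustration; a trailing partial chunk is dropped.
    """
    samples_per_frame = sample_rate * frame_ms // 1000
    bytes_per_frame = samples_per_frame * sample_width * channels
    last_start = len(pcm_bytes) - bytes_per_frame
    for start in range(0, last_start + 1, bytes_per_frame):
        yield pcm_bytes[start:start + bytes_per_frame]


# Synthetic 16-bit mono data with the same length as in the question.
pcm = b"\x00\x00" * 178176
frames = list(frame_generator(pcm, 32000))

# At 32 kHz, 30 ms is 960 samples, i.e. 1920 bytes of 16-bit mono PCM.
print(len(frames), len(frames[0]))

# Each chunk could then be handed to the VAD, for example (not run here,
# assuming the webrtcvad package is installed):
#   vad = webrtcvad.Vad()
#   vad.is_speech(frame, 32000)
```

With these numbers the 356352 bytes split into 185 full chunks of 1920 bytes, and the remaining 1152 bytes (18 ms) are left over.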