I am trying to split the data read from a wav file into 10 ms segments for dynamic time warping.
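import contextlib
import wave

import numpy as np
from scipy.io import wavfile

data = np.zeros((1, 7000))
# file_path is the path to the wav file (defined elsewhere)
rate, wav_data = wavfile.read(file_path)
with contextlib.closing(wave.open(file_path, 'r')) as f:
    frames = f.getnframes()
    rate = f.getframerate()
    duration = frames / float(rate)  # length of the recording in seconds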
Is there an existing library that can do this?
Answer 0 (score: 0)
If you want to post-process the data, you will probably want to work with it as a numpy array.
>>> import wave
>>> import numpy as np
>>> f = wave.open('911.wav', 'r')
>>> data = f.readframes(f.getnframes())
>>> data[:10] # just to show it is a string of bytes
'"5AMj\x88\x97\xa6\xc0\xc9'
>>> numeric_data = np.frombuffer(data, dtype=np.uint8) # uint8 matches this file's 8-bit samples
>>> numeric_data
array([ 34, 53, 65, ..., 128, 128, 128], dtype=uint8)
>>> 10e-3*f.getframerate() # how many frames per 10ms?
110.25
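This is not an integer, so unless you want to interpolate the data, you need to pad it with zeros to get segments that are 110 frames long (about 10 ms at this frame rate).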
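To make the "about 10 ms" figure concrete, here is a small sketch of the arithmetic. The frame rate of 11025 is not stated anywhere above; it is inferred from the 110.25 frames per 10 ms reported by the session.

```python
# Assumption: a frame rate of 11025 Hz, inferred from 110.25 frames per 10 ms.
rate = 11025

# Whole number of frames that fits into 10 ms.
frames_per_segment = int(10e-3 * rate)

# Actual duration in milliseconds of a segment with that many frames.
segment_ms = 1000.0 * frames_per_segment / rate

print(frames_per_segment, round(segment_ms, 3))
```

Each 110-frame row is therefore slightly shorter than 10 ms, which only matters if you need the segment boundaries to line up with exact timestamps.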
>>> numeric_data.shape, f.getnframes() # there are just as many samples in the numpy array as there were frames
((186816,), 186816)
>>> padding_length = (-numeric_data.shape[0]) % 110 # 0 when the length is already a multiple of 110
>>> padded = np.hstack((numeric_data, np.zeros(padding_length)))
>>> segments = padded.reshape(-1, 110)
>>> segments
array([[ 34., 53., 65., ..., 216., 222., 228.],
[ 230., 227., 224., ..., 72., 61., 45.],
[ 34., 33., 32., ..., 147., 158., 176.],
...,
[ 128., 128., 128., ..., 128., 128., 128.],
[ 127., 128., 128., ..., 128., 129., 129.],
[ 129., 129., 128., ..., 0., 0., 0.]])
>>> segments.shape
(1699, 110)
Now each row of the segments array is about 10 ms long.
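The steps above can be wrapped into a reusable helper. This is only a minimal sketch: the name `split_into_segments` and its parameters `segment_ms` and `pad_value` are hypothetical, and it assumes a one-dimensional (mono) sample array.

```python
import numpy as np


def split_into_segments(samples, rate, segment_ms=10.0, pad_value=0):
    """Split a 1-D sample array into rows of roughly segment_ms milliseconds.

    Hypothetical helper for illustration; assumes mono audio.
    """
    samples = np.asarray(samples)
    # Frames per segment, rounded down to a whole number (at least 1).
    seg_len = max(1, int(rate * segment_ms / 1000.0))
    # Padding needed to reach a multiple of seg_len; 0 if already a multiple.
    pad = (-len(samples)) % seg_len
    # Pad with the same dtype so the samples are not converted to floats.
    tail = np.full(pad, pad_value, dtype=samples.dtype)
    return np.concatenate([samples, tail]).reshape(-1, seg_len)


# Synthetic stand-in for the 186816 samples used above (not the real recording).
demo = (np.arange(186816) % 256).astype(np.uint8)
segments = split_into_segments(demo, 11025)
print(segments.shape)
```

For 8-bit unsigned audio such as the file above, silence sits at 128 rather than 0, so passing `pad_value=128` may give a cleaner final segment.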