Question

I am trying to split the data read from a WAV file into 10 ms segments for dynamic time warping.

    import contextlib
    import wave

    import numpy as np
    from scipy.io import wavfile

    data = np.zeros((1, 7000))
    rate, wav_data = wavfile.read(file_path)
    with contextlib.closing(wave.open(file_path, 'r')) as f:
        frames = f.getnframes()
        rate = f.getframerate()
        duration = frames / float(rate)

Is there an existing library that can do this?


Answer 1

If you want to post-process the data, you will probably want to work with it as a numpy array.

>>> import wave
>>> import numpy as np
>>> f = wave.open('911.wav', 'r')
>>> data = f.readframes(f.getnframes())
>>> data[:10]  # just to show it is a bytes object
b'"5AMj\x88\x97\xa6\xc0\xc9'
>>> numeric_data = np.frombuffer(data, dtype=np.uint8)  # np.fromstring is deprecated for binary data
>>> numeric_data
array([ 34,  53,  65, ..., 128, 128, 128], dtype=uint8)
>>> 10e-3*f.getframerate()  # how many frames per 10ms?
110.25

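This is not an integer, so unless you are going to interpolate the data, you will need to pad it with zeros to get samples that are 110 frames long (about 10 ms at this frame rate).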

>>> numeric_data.shape, f.getnframes()  # there are just as many samples in the numpy array as there were frames
((186816,), 186816)
>>> padding_length = (-numeric_data.shape[0]) % 110  # 0 when the length is already a multiple of 110
>>> padded = np.hstack((numeric_data, np.zeros(padding_length)))
>>> segments = padded.reshape(-1, 110)
>>> segments
array([[  34.,   53.,   65., ...,  216.,  222.,  228.],
       [ 230.,  227.,  224., ...,   72.,   61.,   45.],
       [  34.,   33.,   32., ...,  147.,  158.,  176.],
       ..., 
       [ 128.,  128.,  128., ...,  128.,  128.,  128.],
       [ 127.,  128.,  128., ...,  128.,  129.,  129.],
       [ 129.,  129.,  128., ...,    0.,    0.,    0.]])
>>> segments.shape
(1699, 110)

Each row of the segments array now holds about 10 ms of audio.
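
The steps above can be collected into a single function. This is only a minimal sketch, not an existing library routine: the name `split_wav_into_segments` and its `segment_ms` parameter are made up for illustration, it assumes 8-bit or 16-bit PCM data, and it keeps only the first channel of a multi-channel file.

```python
import wave

import numpy as np


def split_wav_into_segments(path, segment_ms=10):
    """Return (segments, frame_rate); each row covers roughly segment_ms."""
    with wave.open(path, "rb") as f:
        frame_rate = f.getframerate()
        n_channels = f.getnchannels()
        sample_width = f.getsampwidth()  # bytes per sample
        raw = f.readframes(f.getnframes())

    # Assumption: 8-bit WAV data is unsigned, 16-bit is signed little-endian.
    dtypes = {1: np.dtype(np.uint8), 2: np.dtype("<i2")}
    if sample_width not in dtypes:
        raise ValueError("unsupported sample width: %d bytes" % sample_width)
    samples = np.frombuffer(raw, dtype=dtypes[sample_width])

    # Channels are interleaved frame by frame; keep the first channel only.
    samples = samples.reshape(-1, n_channels)[:, 0]

    # Whole frames per segment, rounded down (11025 Hz and 10 ms give 110).
    frames_per_segment = int(frame_rate * segment_ms / 1000)
    if frames_per_segment < 1:
        raise ValueError("segment is shorter than one frame")

    # Zero-pad the tail so the length is a multiple of the segment size.
    # Note that for unsigned 8-bit audio the silent level is 128, not 0.
    pad = (-len(samples)) % frames_per_segment
    padded = np.concatenate([samples, np.zeros(pad, dtype=samples.dtype)])
    return padded.reshape(-1, frames_per_segment), frame_rate
```

Calling `segments, rate = split_wav_into_segments('911.wav')` on the file used above should give the same `(1699, 110)` shape, with the difference that the original integer dtype is kept instead of being promoted to float.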
