Question

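I'm trying to extract MFCC features from .wav files. When I try to convert my list of MFCCs into a numpy array, I get an error. I'm fairly sure the error happens because the list contains MFCC arrays with different shapes, but I'm not sure how to fix that.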

I've looked at two other Stack Overflow posts, but they don't solve my problem because they are too specific to a particular task:

ValueError: could not broadcast input array from shape (128,128,3) into shape (128,128)

Value Error: could not broadcast input array from shape (857,3) into shape (857)

Full error message:

Traceback (most recent call last):
  File "/..../.../...../Batch_MFCC_Data.py", line 68
    X = np.array(MFCCs)
ValueError: could not broadcast input array from shape (20,590) into shape (20)

Code sample:

import glob

import numpy as np
from sklearn.preprocessing import OneHotEncoder

all_wav_paths = glob.glob('directory_of_wav_files/**/*.wav', recursive=True)
np.random.shuffle(all_wav_paths)

MFCCs = []  # list holding one MFCC matrix per .wav file
labels = []  # list holding one label per .wav file

for i, wav_path in enumerate(all_wav_paths):

    individual_MFCC = MFCC_from_wav(wav_path)
    #MFCC_from_wav() -> returns the MFCC coefficients 

    label = get_class(wav_path)
    #get_class() -> returns the label of the wav file either 0 or 1

    #add features and label to the array
    MFCCs.append(individual_MFCC)
    labels.append(label)

#Must convert the training data to a Numpy Array for 
#train_test_split and saving to local drive

X = np.array(MFCCs) #THIS LINE CRASHES WITH ABOVE ERROR

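# one-hot encode the binary labels; OneHotEncoder expects 2-D input, hence the reshape
onehot_encoder = OneHotEncoder(sparse_output=False)  # this argument was named `sparse` before scikit-learn 1.2
Y = onehot_encoder.fit_transform(np.array(labels).reshape(-1, 1))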

# create train/test data (sklearn.cross_validation has been removed; train_test_split lives in sklearn.model_selection)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

#saving data to local drive
np.save("LABEL_SAVE_PATH", Y)
np.save("TRAINING_DATA_SAVE_PATH", X)

Below is a snapshot of the shapes of the MFCCs (one per .wav file) stored in the MFCCs list:

...More above...
(20, 423) #shape of returned MFCC from one of the .wav files
(20, 457)
(20, 1757)
(20, 345)
(20, 835)
(20, 345)
(20, 687)
(20, 774)
(20, 597)
(20, 719)
(20, 1195)
(20, 433)
(20, 728)
(20, 939)
(20, 345)
(20, 1112)
(20, 345)
(20, 591)
(20, 936)
(20, 1161)
....More below....

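As you can see, the MFCCs in the list do not all have the same shape, because the recordings are not all the same length. Is this why I can't convert the list to a numpy array? If so, how can I fix it so that every MFCC in the list has the same shape?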
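A minimal reproduction suggests that the varying number of frames is indeed the cause. It uses random arrays with a few of the shapes listed above as stand-ins for real MFCCs. The exact error text depends on the NumPy version (recent releases report an "inhomogeneous shape" rather than a broadcast error), but in both cases it is a ValueError:

```python
import numpy as np

# Random stand-ins for MFCC matrices: same number of coefficients (20),
# different numbers of frames, like the shapes printed above.
ragged = [np.random.randn(20, n) for n in (423, 457, 345)]

try:
    np.array(ragged)
except ValueError as err:
    print("ragged list fails:", err)

# Once every matrix has the same number of frames, stacking works.
uniform = [m[:, :345] for m in ragged]
batch = np.array(uniform)
print(batch.shape)  # (3, 20, 345)
```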

Any code snippets or suggestions for doing this would be greatly appreciated!

Thanks!

Answer 1

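Use the following logic to cut every array down to min_shape, i.e. truncate the larger arrays to min_shape (this crops frames from the end rather than resampling):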

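import numpy as np

# MFCCs = [arr1, arr2, arr3, ...]  (each of shape (20, n_frames))
# Smallest shape in the list; (20, 345) for the snapshot shown in the question.
min_shape = (20, min(arr.shape[1] for arr in MFCCs))

for idx, arr in enumerate(MFCCs):
    MFCCs[idx] = arr[:, :min_shape[1]]

batch_arr = np.array(MFCCs)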

You can then stack these arrays into a single batch array, as in the following minimal example:

In [33]: a1 = np.random.randn(2, 3)    
In [34]: a2 = np.random.randn(2, 5)    
In [35]: a3 = np.random.randn(2, 10)

In [36]: MFCCs = [a1, a2, a3]

In [37]: min_shape = (2, 2)

In [38]: for idx, arr in enumerate(MFCCs):
    ...:     MFCCs[idx] = arr[:, :min_shape[1]]
    ...:     

In [42]: batch_arr = np.array(MFCCs)

In [43]: batch_arr.shape
Out[43]: (3, 2, 2)

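For the second strategy, pad the smaller arrays up to max_shape instead. Follow similar logic, but fill the missing values with zeros or NaN, whichever you prefer.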

Then you can stack the arrays into a batch array of shape (num_arrays, dim1, dim2); in your case the shape should be (num_wav_files, 20, max_column).
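A minimal sketch of the padding strategy, assuming each MFCC matrix has shape (20, n_frames) as in the question. `pad_to_max` is a hypothetical helper name; it right-pads along the time axis with `np.pad` (`mode="constant"`, `constant_values` set to 0 or `np.nan`) and stacks the result with `np.stack`:

```python
import numpy as np

def pad_to_max(mfccs, fill_value=0.0):
    """Right-pad every (n_mfcc, n_frames) matrix along the time axis to the
    longest n_frames, then stack into (num_files, n_mfcc, max_frames)."""
    max_frames = max(m.shape[1] for m in mfccs)
    padded = [
        np.pad(m.astype(float),                        # float so NaN padding works
               ((0, 0), (0, max_frames - m.shape[1])),  # pad only the frame axis, at the end
               mode="constant", constant_values=fill_value)
        for m in mfccs
    ]
    return np.stack(padded)

# Random stand-ins with some of the shapes from the question.
mfccs = [np.random.randn(20, n) for n in (423, 1757, 345)]

X = pad_to_max(mfccs)                           # zero padding
print(X.shape)  # (3, 20, 1757)

X_nan = pad_to_max(mfccs, fill_value=np.nan)    # NaN padding
```

Unlike truncation, padding keeps every frame of the longer recordings. Zero padding is usually the easier input for a model; NaN padding makes padded frames easy to find later (e.g. with `np.isnan`), but many estimators reject NaN input.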

