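I downloaded the UrbanSound8K data from the link in [1]. It contains 8,732 sound files divided into 10 folds. The instructions on that page recommend training the model on nine of the predefined folds and testing on the remaining one, then repeating the process 10 times with a different test fold each time.
From each file I extracted 40 MFCC features plus 11 other features (e.g., zero-crossing rate, energy), as many others have done [2] [3]. Here is the code snippet: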
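For illustration, the fold-wise protocol described above could be sketched as follows. This is a minimal sketch rather than the dataset's official tooling: `leave_one_fold_out` and `fold_ids` are hypothetical names, and the toy fold labels stand in for the real per-clip fold numbers.

```python
import numpy as np

def leave_one_fold_out(fold_ids):
    """Yield (test_fold, train_idx, test_idx) once for each distinct fold label."""
    fold_ids = np.asarray(fold_ids)
    for test_fold in np.unique(fold_ids):
        test_mask = fold_ids == test_fold
        # Everything outside the held-out fold is used for training.
        yield int(test_fold), np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Toy fold labels (assumed data): 10 folds with 3 clips each.
fold_ids = np.repeat(np.arange(1, 11), 3)
splits = list(leave_one_fold_out(fold_ids))
for test_fold, train_idx, test_idx in splits:
    print(test_fold, len(train_idx), len(test_idx))
```

Each iteration gives the row indices for one train/test round; the reported score would then be the average over the ten rounds.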
```python
import pandas as pd
from sklearn.model_selection import train_test_split

fold = 1  # example value: the fold (1-10) held out for testing

# Approach 1 of splitting the data
# 'fold' names the fold (1-10) used for testing. All other folds are used for training.
# For example, train_fold1 represents features extracted from nine folds (fold2 to fold10),
# and test_fold1 has features from fold1.
df_train = pd.read_csv('G:\\AudioFiles\\UrbanSound8K\\train_{}.csv'.format(fold))
df_train.fillna(0, inplace=True)    # replace missing values with 0
df_train.dropna(inplace=True)       # has no effect after fillna
df_train = df_train.iloc[:, :-1]    # drop the last column
df_train = pd.get_dummies(df_train) # one-hot-encoding class variables
X_train = df_train.iloc[:, 12:-10]  # taking only 40 MFCC features
y_train = df_train.iloc[:, -10:]    # the 10 one-hot class columns

df_test = pd.read_csv('G:\\AudioFiles\\UrbanSound8K\\test_{}.csv'.format(fold))
df_test = df_test.iloc[:, :-1]      # drop the last column
df_test.dropna(inplace=True)        # drop rows with missing values
df_test = pd.get_dummies(df_test)   # one-hot-encoding class variables
X_test = df_test.iloc[:, 12:-10]    # taking only 40 MFCC features
y_test = df_test.iloc[:, -10:]      # the 10 one-hot class columns

# Approach 2 of splitting the data
# Pool the (already one-hot-encoded) train and test frames, then split at random.
df = pd.concat([df_train, df_test])
df.dropna(inplace=True)
df = pd.get_dummies(df)
X, y = df.iloc[:, 12:-10].values, df.iloc[:, -10:].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=13)
```
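To make the "other features" mentioned above concrete, here is a rough sketch of two of them, zero-crossing rate and energy, computed with NumPy. The function names are hypothetical and the definitions are one common choice, so they may differ from what was actually used to build the CSV files. MFCCs are normally computed with an audio library and are not reproduced here.

```python
import numpy as np

def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(np.asarray(signal, dtype=float))
    return float(np.mean(signs[1:] != signs[:-1]))

def mean_energy(signal):
    """Mean squared amplitude of the signal."""
    x = np.asarray(signal, dtype=float)
    return float(np.mean(x ** 2))

# Toy signals (assumed data), not real audio.
alternating = np.array([1.0, -1.0, 1.0, -1.0])
constant = np.array([0.5, 0.5, 0.5, 0.5])
print(zero_crossing_rate(alternating), mean_energy(alternating))
print(zero_crossing_rate(constant), mean_energy(constant))
```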
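My problem is that the two approaches give very different results. Approach 2 gives 90-95% accuracy on the training set and > 90% accuracy on the test set. With Approach 1, training accuracy tops out at 70-90%, but test accuracy is only 50-60%.
With both ways of splitting the data into train and test sets, I fitted almost all the common classifiers (LR, SVC, KNN) and a simple neural network with two or three dense layers. Under either approach, all the classifiers are surprisingly similar in accuracy.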
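```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from keras.models import Sequential
from keras.layers import Dense, Dropout

# I tried fitting each of these one by one.
clf = LogisticRegression(solver='lbfgs')
#clf = SVC()
#clf = KNeighborsClassifier()
#clf = LinearDiscriminantAnalysis()
#clf = GaussianNB()

# Most scikit-learn classifiers expect 1-D class labels,
# so convert the one-hot targets back to class indices.
y_train_labels = np.asarray(y_train).argmax(axis=1)
y_test_labels = np.asarray(y_test).argmax(axis=1)
clf.fit(X_train, y_train_labels)
print(clf.score(X_train, y_train_labels))  # training accuracy
print(clf.score(X_test, y_test_labels))    # test accuracy

# code for neural network (uses the one-hot targets directly)
model = Sequential()
model.add(Dense(256, activation='relu', input_dim=40))
model.add(Dropout(0.2))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
num_epochs = 50
num_batch_size = 32
model.fit(X_train, y_train, batch_size=num_batch_size, epochs=num_epochs,
          validation_data=(X_test, y_test), verbose=1)
```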
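I don't understand why I am getting such different results. I can't find any error in the model fitting, since my fitting procedure is very simple and straightforward, and I mostly use default settings. If I made some other mistake, it would be the same in both approaches.
So it must have something to do with the train-test split. Random shuffling gives much better results than the predefined fold split. I tried test shares of 10-20% with different seed values, but the results stayed the same.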
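One way to see how differently the two test sets are composed is to count which folds a shuffled test split draws from. The sketch below is only an illustration under assumptions: `shuffled_test_composition` is a hypothetical helper, the fold labels are toy data, and the shuffle uses NumPy's generator, so it does not reproduce the exact rows that `train_test_split` would select.

```python
import numpy as np

def shuffled_test_composition(fold_ids, test_size=0.1, seed=13):
    """Count how many samples of a randomly shuffled test split come from each fold."""
    fold_ids = np.asarray(fold_ids)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(fold_ids))
    n_test = int(round(test_size * len(fold_ids)))
    test_idx = order[:n_test]
    folds, counts = np.unique(fold_ids[test_idx], return_counts=True)
    return dict(zip(folds.tolist(), counts.tolist()))

# Toy fold labels (assumed data): 10 folds with 100 clips each.
fold_ids = np.repeat(np.arange(1, 11), 100)
composition = shuffled_test_composition(fold_ids)
print(composition)
```

With the predefined split, every test sample comes from the single held-out fold. With a shuffled split, the test samples are spread over several folds, all of which also contribute to the training set.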
References
[1] https://urbansounddataset.weebly.com/urbansound8k.html
[3] https://medium.com/@mikesmales/sound-classification-using-deep-learning-8bc2aa1990b7