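I am new to machine learning and am learning to classify arrays of numbers with a binary classifier. My input is a list of lists, where each inner list represents an array of 400 integers, and my labels are 0 or 1. So, for example, the model should read an array of length 400 and output the label 0 or 1. testX and testY are the 20% of my data that I held out; they are used only at the end, in the prediction step. They have the same structure as master_ecog_label and master_ph_label.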
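For reference, the data layout and the 20% hold-out described above could look roughly like the following. This is only a minimal sketch under stated assumptions: `hold_out_split`, `demo_X`, and `demo_y` are hypothetical names, the data is random stand-in data, and the actual split used for testX and testY may have been done differently.

```python
import numpy as np

def hold_out_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the samples once and reserve `test_fraction` of them for final testing."""
    X = np.asarray(X)
    y = np.asarray(y)
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))            # random order of sample indices
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Synthetic stand-in data (assumption): 50 arrays of 400 integers, one 0/1 label each.
rng = np.random.default_rng(1)
demo_X = rng.integers(0, 1136, size=(50, 400))
demo_y = rng.integers(0, 2, size=50)

train_X, train_y, held_X, held_y = hold_out_split(demo_X, demo_y)
print(train_X.shape, held_X.shape)  # (40, 400) (10, 400)
```

The held-out arrays play the role of testX and testY: they are never passed to `fit` and are only used for the final prediction.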
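I trained the model with an LSTM (as was recommended to me) and get good validation accuracy (0.99), but when I use model.predict_classes on the held-out data, my accuracy drops to 50%. I used the IMDB sentiment classifier as inspiration and adapted the code to my use case.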
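One way to look at a 50% result in more detail is to break it down into a confusion count, which shows whether the model is predicting a single class for everything. The sketch below is only an illustration: `summarize_predictions` is a hypothetical helper, and the labels and predictions are made-up values, not results from the model in this question.

```python
import numpy as np

def summarize_predictions(y_true, y_pred):
    """Return accuracy plus confusion counts for binary 0/1 labels."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()   # also flattens (n, 1) prediction arrays
    if y_true.shape != y_pred.shape:
        raise ValueError("labels and predictions must have the same length")
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "true_positives": int(np.sum((y_true == 1) & (y_pred == 1))),
        "true_negatives": int(np.sum((y_true == 0) & (y_pred == 0))),
        "false_positives": int(np.sum((y_true == 0) & (y_pred == 1))),
        "false_negatives": int(np.sum((y_true == 1) & (y_pred == 0))),
        "predicted_positive_rate": float(np.mean(y_pred == 1)),
        "actual_positive_rate": float(np.mean(y_true == 1)),
    }

# Made-up example: balanced labels, and a model that always predicts 0.
labels = [0, 1, 0, 1]
always_zero = [[0], [0], [0], [0]]
report = summarize_predictions(labels, always_zero)
print(report["accuracy"], report["predicted_positive_rate"])  # 0.5 0.0
```

If the predicted positive rate is near 0 or 1 while the actual rate is near 0.5, the 50% accuracy comes from a constant prediction rather than from random-looking errors.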
Here is my code (https://pastebin.com/CDqpr5Xm):
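import numpy as np
from sklearn.model_selection import KFold
from keras.models import Sequential, load_model
from keras.layers import Embedding, Conv1D, MaxPooling1D, LSTM, Dropout, Dense

master_ecog_label = np.array(master_ecog_label)  # [ [0.02,0.03,0.01..],[..],[..]]
master_ph_label = np.array(master_ph_label)  # [0,1,0,0,1,1,1,0,...]
testX = np.array(t_ecog_label)  # same structure as master_ecog_label; used in predict
testY = np.array(t_ph_label)  # same structure as master_ph_label; used in predict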
kf = KFold(n_splits=5, shuffle=True, random_state=None)
for train_index, test_index in kf.split(master_ecog_label):
    # print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = master_ecog_label[train_index], master_ecog_label[test_index]
    Y_train, Y_test = master_ph_label[train_index], master_ph_label[test_index]
max_ecog_samples = 400
ecog_number = 1136 # 5684
embedding_vecor_length = 32
model = Sequential()
model.add(Embedding(ecog_number, embedding_vecor_length, input_length=max_ecog_samples))
model.add(Conv1D(filters=250, kernel_size=3, padding='same', activation='relu'))
#model.add(Conv1D(filters=100, kernel_size=3, padding='same',kernel_initializer=RN(mean=0.0, stddev=0.02)))
#model.add(PReLU(alpha_initializer=constant(value=0.25)))
model.add(MaxPooling1D(pool_size=2))
model.add(LSTM(20, return_sequences=True))
model.add(Dropout(0.25))
model.add(LSTM(20, return_sequences=True))
model.add(Dropout(0.25))
model.add(LSTM(20, return_sequences=False))
model.add(Dropout(0.25))
#model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, Y_train, batch_size=128, validation_data=(X_test, Y_test), epochs=1)
# Final evaluation of the model
scores = model.evaluate(X_test, Y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
print("Accuracy: %.2f%%" % (scores[1]*100))
model.save('lstm_model.h5')
model = load_model('lstm_model.h5')
trainPredict = model.predict_classes(testX)
flat_list = [item for sublist in trainPredict for item in sublist]
flat_list = np.array(flat_list)
# Count how many predictions match the held-out labels
matches = 0
for predicted, expected in zip(flat_list, testY):
    if predicted == expected:
        matches += 1
print("length of array = {}, number of correct = {}".format(len(flat_list), matches))
accuracy = ((matches)/len(flat_list)) * 100
print("ACC = " + str(accuracy))
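A side note on `predict_classes`: it was removed from `Sequential` in TensorFlow 2.6, so on newer versions the same labels have to be derived from `model.predict`. For a single sigmoid output this amounts to thresholding the probabilities at 0.5. The sketch below shows that step with NumPy only; `probabilities_to_labels` is a hypothetical helper and `fake_probabilities` is made-up data standing in for the output of `model.predict(testX)`.

```python
import numpy as np

def probabilities_to_labels(probabilities, threshold=0.5):
    """Turn sigmoid outputs of shape (n, 1) or (n,) into a flat array of 0/1 labels."""
    probabilities = np.asarray(probabilities, dtype=float)
    return (probabilities > threshold).astype("int32").ravel()

# Made-up values standing in for model.predict(testX) (assumption).
fake_probabilities = np.array([[0.91], [0.12], [0.50], [0.73]])
predicted = probabilities_to_labels(fake_probabilities)
print(predicted.tolist())  # [1, 0, 0, 1]
```

Because the result is already flat, it can be compared to testY directly, for example with `np.mean(predicted == testY)`, without the list-flattening step.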
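I would be very grateful if someone could guide me and let me know where I am going wrong and how to correct it. I have really enjoyed this project so far and am eager to learn and finish it well, since I have invested a lot of time and effort and would like to see some good results. Thank you very much :)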