Trained an LSTM model and got good evaluation scores, but class predictions are poor

Time: 2020-07-03 19:37:52

Tags: python machine-learning deep-learning conv-neural-network lstm

I am new to machine learning and am learning to classify arrays of numbers with a binary classifier. My input is a list of lists, where each inner list is an array of 400 integers, and my labels are 0 or 1. So, for example, the model should read an array of length 400 and output the label 0 or 1. testX and testY are the 20% of the data that I held out and will use at the end in the predict call; they have the same structure as master_ecog_label and master_ph_label.

I have trained the model with an LSTM (as people recommended to me) and it gets good validation accuracy (0.99), but when I use model.predict_classes my accuracy drops to 50%. I used the imdb sentiment classifier as inspiration and adapted that code for my use case.
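In case the Keras version matters: as I understand it, for a single sigmoid output unit predict_classes just thresholds the predicted probabilities at 0.5 (and newer Keras releases drop predict_classes altogether), so the class predictions should amount to roughly this sketch (assuming TensorFlow 2.x Keras and the trained model and hold-out arrays named below):

probs = model.predict(testX)                    # shape (n_samples, 1) from Dense(1, activation='sigmoid')
pred_classes = (probs > 0.5).astype('int32')    # what predict_classes does for a sigmoid output, as I understand it
print(pred_classes[:10].ravel())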

Here is my code (https://pastebin.com/CDqpr5Xm).

master_ecog_label = np.array(master_ecog_label) # [ [0.02,0.03,0.01..],[..],[..]]
master_ph_label = np.array(master_ph_label) # [0,1,0,0,1,1,1,0,...]
testX = np.array(t_ecog_label) #same structure as master_ecog_label. to use in predict
testY = np.array(t_ph_label) #same structure as master_ph_label. to use in predict
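
# A quick sanity check I run on these arrays before training
# (a sketch; assumes the four arrays above are already populated as described):
print(master_ecog_label.shape, master_ecog_label.dtype)    # expect (n_samples, 400)
print(master_ph_label.shape, np.unique(master_ph_label))   # labels should be only 0 and 1
print(testX.shape, testY.shape)                            # the 20% hold-out split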


kf = KFold(n_splits=5, shuffle=True, random_state=None)
for train_index, test_index in kf.split(master_ecog_label):
    #print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = master_ecog_label[train_index], master_ecog_label[test_index]
    Y_train, Y_test = master_ph_label[train_index], master_ph_label[test_index]
 
    max_ecog_samples = 400
    ecog_number = 1136  # 5684
    embedding_vecor_length = 32
    model = Sequential()
    model.add(Embedding(ecog_number, embedding_vecor_length, input_length=max_ecog_samples))
    model.add(Conv1D(filters=250, kernel_size=3, padding='same', activation='relu'))
    #model.add(Conv1D(filters=100, kernel_size=3, padding='same',kernel_initializer=RN(mean=0.0, stddev=0.02)))
    #model.add(PReLU(alpha_initializer=constant(value=0.25)))
    model.add(MaxPooling1D(pool_size=2))
    model.add(LSTM(20, return_sequences=True))
    model.add(Dropout(0.25))
    model.add(LSTM(20, return_sequences=True))
    model.add(Dropout(0.25))
    model.add(LSTM(20, return_sequences=False))
    model.add(Dropout(0.25))
    #model.add(Dropout(0.3))
    model.add(Dense(1, activation='sigmoid'))
 
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    print(model.summary())
    model.fit(X_train, Y_train, batch_size=128, validation_data=(X_test, Y_test), epochs=1)
 
    # Final evaluation of the model
    scores = model.evaluate(X_test, Y_test, verbose=1)
    print('Test loss:', scores[0])
    print('Test accuracy:', scores[1])
    print("Accuracy: %.2f%%" % (scores[1]*100))
 
    model.save('lstm_model.h5')
    model = load_model('lstm_model.h5')
    trainPredict = model.predict_classes(testX)


    flat_list = [item for sublist in trainPredict for item in sublist]
    flat_list = np.array(flat_list)
    # calculating matches
    i=0
    matches = 0
    while(i<len(flat_list)):
       if ( flat_list[i] == testY[i]):
           matches += 1
       i+= 1

    print("length of array = {}, number of correct = 
    {}".format(len(flat_list), matches))
    accuracy = ((matches)/len(flat_list)) * 100
    print("ACC = " + str(accuracy))
    

If anyone could guide me and point out where I am going wrong and how to fix it, I would really appreciate it. I have really enjoyed this project so far and am eager to learn and do it well, since I have put a lot of time and effort into it and would love to see some good results. Thank you very much :)

0 Answers:

No answers yet.