I am working on movie review classification with Keras.
The file is labeled in the form [movie review, sentiment (emotion)].
# MLP for the IMDB problem
import numpy
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Embedding
from keras.preprocessing import sequence
# load the dataset but only keep the top n words, zero the rest
top_words = 5000
# (X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
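# use my own preprocessed data (train_result, train_label, test_result, test_label) instead of the built-in IMDB set
X_train = train_result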
y_train = train_label
X_test = test_result
y_test = test_label
max_words = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_words)
X_test = sequence.pad_sequences(X_test, maxlen=max_words)
# create the model
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_words))
model.add(Flatten())
model.add(Dense(250, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
# Fit the model
hist = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=128, verbose=1)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=1)
print("Accuracy: %.2f%%" % (scores[1]*100))
%matplotlib inline
import matplotlib.pyplot as plt
fig, loss_ax = plt.subplots()
acc_ax = loss_ax.twinx()
loss_ax.plot(hist.history['loss'], 'y', label='train loss')
loss_ax.plot(hist.history['val_loss'], 'r', label='val loss')
acc_ax.plot(hist.history['acc'], 'b', label='train acc')
acc_ax.plot(hist.history['val_acc'], 'g', label='val acc')
loss_ax.set_xlabel('epoch')
loss_ax.set_ylabel('loss')
acc_ax.set_ylabel('accuracy')
loss_ax.legend(loc='upper left')
acc_ax.legend(loc='lower left')
plt.show()
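A side note on the plotting code: the key under which Keras stores accuracy in `hist.history` depends on the version (older releases use `'acc'` / `'val_acc'`, newer ones use `'accuracy'` / `'val_accuracy'`). A minimal sketch of a lookup that tolerates both is below; `get_metric` is a hypothetical helper name, not part of Keras, and the example dict only stands in for a real `hist.history`.

```python
def get_metric(history, *names):
    """Return the first metric series found under any of the given key names.

    `history` is a plain dict such as `hist.history` from model.fit().
    """
    for name in names:
        if name in history:
            return history[name]
    raise KeyError("none of %r found; available keys: %r" % (names, sorted(history)))


# Stand-in for hist.history (made-up numbers, for illustration only).
example_history = {"loss": [0.7, 0.5], "accuracy": [0.55, 0.80]}

train_acc = get_metric(example_history, "acc", "accuracy")
print(train_acc)
```

With this, the plotting lines could call `get_metric(hist.history, 'acc', 'accuracy')` and `get_metric(hist.history, 'val_acc', 'val_accuracy')` instead of hard-coding one spelling.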
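Output: the accuracy is always 100%.
This is a basic deep learning model; if I uncomment the imdb.load_data line, it produces normal output.
I don't think there is anything wrong with the text preprocessing.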
I downloaded the imdb.csv file directly, processed it the same way, and plotted the same graph.
IMDB_Graph, MYData_Graph (the MYData graph looks very strange..)
For reference, my CSV file
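I have categorized horror movies as either horror or cruel movies.
I used [cruel, surprised] as the labels.
Can this be solved, frankly? I keep having doubts about it.
I don't know whether the data itself is wrong or the code is wrong.
Do you know what is going wrong?
Why does it always show 100% accuracy?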
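One check that might help narrow this down is to look at which label values actually reach the model. If every label is the same value, or if the labels are not 0/1 as a sigmoid output with `binary_crossentropy` expects, a constant 100% accuracy would not be surprising. The sketch below assumes the labels are a flat list or array with one entry per review; `summarize_labels` and `majority_baseline` are hypothetical helper names, and `example_labels` is only a stand-in for `train_label`.

```python
import numpy as np


def summarize_labels(labels):
    """Return a dict mapping each distinct label value to how often it occurs."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return {v.item(): int(c) for v, c in zip(values, counts)}


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = summarize_labels(labels)
    return max(counts.values()) / sum(counts.values())


# Stand-in data for illustration; replace with train_label / test_label.
example_labels = [1, 1, 1, 1]

print(summarize_labels(example_labels))
print(majority_baseline(example_labels))
```

If the baseline comes out at 1.0, the model only has one class to predict, and the data preparation rather than the network would be the first place to look.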