Question

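I think the code will speak for itself: I trained a model, and now I want to use it to predict on some new input data. The new input data seems to have the wrong dimensions. You can see the code for the model, the (attempted) prediction, and the error message below.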

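import numpy
import numpy as np
import pandas as pd
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential, load_model
from keras.layers import Embedding, Dropout, LSTM, Dense
from sklearn.model_selection import train_test_split

tokenizer = Tokenizer(num_words=10000)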

df = pd.read_csv('/home/paperspace/Sentiment Analysis Dataset.csv', index_col = 0,
                 error_bad_lines = False)

y = list(df['Sentiment'])

tokenizer.fit_on_texts(list(df['SentimentText']))
X = tokenizer.texts_to_sequences(list(df['SentimentText']))
X = pad_sequences(X)

print("Done, fitting on texts.")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, shuffle = True)

model = Sequential()
# Creates the word embeddings.
embedding_vector_dim = 32
model.add(Embedding(10000, embedding_vector_dim, input_length=X.shape[1]))
model.add(Dropout(0.2))
model.add(LSTM(128))
model.add(Dropout(0.2))         
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()


model.fit(numpy.array(X_train), numpy.array(y_train),
          batch_size=128,
          epochs=1,
          validation_data=(numpy.array(X_test), numpy.array(y_test)))
score, acc = model.evaluate(numpy.array(X_test),numpy.array(y_test),
                            batch_size=128)

model.save('./sentiment_seq.h5')

print('Test score:', score)
print('Test accuracy:', acc)
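Because `pad_sequences` is called without `maxlen`, it pads every training sequence to the length of the longest one. That makes `X.shape[1]` (116, judging by the error message) a property of the training corpus, and `input_length=X.shape[1]` fixes it into the Embedding layer. A dependency-free sketch of that default behaviour follows. `pad_pre` is a hypothetical helper that only mimics the Keras defaults (zero padding and truncation at the front); it is not the library function itself.

```python
def pad_pre(sequences, maxlen=None, value=0):
    """Hypothetical stand-in for the defaults of Keras pad_sequences:
    pad with `value` at the front and drop tokens from the front when too long."""
    if maxlen is None:
        # No maxlen given: every row is padded to the longest sequence passed in.
        maxlen = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        s = list(s)
        if len(s) > maxlen:
            s = s[len(s) - maxlen:]  # keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(s)) + s)
    return padded


train = [[5, 8], [3, 1, 4, 9], [7]]
X = pad_pre(train)
print(X)          # [[0, 0, 5, 8], [3, 1, 4, 9], [0, 0, 0, 7]]
print(len(X[0]))  # 4: the "input_length" a model built on X would get

# A new sentence padded on its own keeps its own length...
print(pad_pre([[2, 6]]))                    # [[2, 6]]
# ...unless it is padded to the training length explicitly.
print(pad_pre([[2, 6]], maxlen=len(X[0])))  # [[0, 0, 2, 6]]
```

Padding a single new sentence without `maxlen` therefore yields whatever length that sentence happens to have, not the length the model was built with.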

Now the prediction attempt and the error message:

text = "this is actually a very bad movie."
tokenizer = Tokenizer()

tokenizer.fit_on_texts(list(text))
X = tokenizer.texts_to_sequences(list(text))
X = pad_sequences(X)
X_flat = np.array([X.flatten()])


model = load_model('sentiment_test.h5')
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
print(model.predict(X, batch_size = 1, verbose = 1))

ValueError: Error when checking : expected embedding_1_input to have shape (None, 116) but got array with shape (1, 38)
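The prediction snippet differs from the training preprocessing in two ways. `list(text)` turns the string into a list of single characters, so each character is treated as a separate text. A fresh `Tokenizer` fitted only on the new text also assigns word indices unrelated to the ones the Embedding layer learned. The sketch below illustrates both in plain Python. `build_word_index` is a hypothetical simplification of `Tokenizer.fit_on_texts` (lowercase, split into words, rank by frequency, number from 1), and the three training sentences are made up for the example.

```python
import re
from collections import Counter


def build_word_index(texts):
    """Hypothetical simplification of Tokenizer.fit_on_texts: lowercase,
    split into words, rank by frequency (ties keep first-seen order),
    and number the words from 1."""
    counts = Counter()
    for t in texts:
        counts.update(re.findall(r"[a-z0-9']+", t.lower()))
    ranked = sorted(counts, key=lambda w: -counts[w])  # stable sort
    return {w: i + 1 for i, w in enumerate(ranked)}


text = "this is actually a very bad movie."

# list() on a string yields its characters: 34 one-character "texts".
print(len(list(text)))  # 34
print(list(text)[:4])   # ['t', 'h', 'i', 's']

# A toy "training" vocabulary versus one fitted only on the new sentence.
train_index = build_word_index(["a good movie", "a bad movie", "a very bad plot"])
fresh_index = build_word_index([text])  # [text], not list(text)
print(train_index["bad"], fresh_index["bad"])  # 3 6
```

The same word ends up with a different index, so even with the right shape the model would be looking up the wrong embeddings.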

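So basically, why do I get this error when the preprocessing is the same for training and prediction? And how could I know what the expected input should be before seeing the error message?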

Answer 1

If you are not using a fixed input length, you should not define `input_length` in the Embedding layer.
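That covers the variable-length case. If the model is kept as trained (with `input_length=X.shape[1]`), the new text instead has to go through the same fitted tokenizer and be padded to that same length. In Keras that would be roughly `pad_sequences(tokenizer.texts_to_sequences([text]), maxlen=model.input_shape[1])`, where `tokenizer` is the one fitted during training (so it has to be kept, for example pickled, alongside the `.h5` file). For this model `model.input_shape` should report `(None, 116)`, which is also a way to find the expected input before calling `predict`. The plain-Python sketch below shows the idea; `encode`, `pad_to`, the vocabulary and the length 8 are hypothetical stand-ins, not Keras code.

```python
import re


def encode(text, word_index):
    """Hypothetical stand-in for texts_to_sequences with an already-fitted
    vocabulary: words missing from word_index are dropped (the Keras default
    when no oov_token is set). num_words is ignored here for simplicity."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [word_index[w] for w in words if w in word_index]


def pad_to(seq, maxlen, value=0):
    """Pre-pad / pre-truncate one sequence to maxlen, like pad_sequences defaults."""
    if len(seq) > maxlen:
        seq = seq[len(seq) - maxlen:]
    return [value] * (maxlen - len(seq)) + seq


# Assumed to be saved from training: the fitted vocabulary and X.shape[1].
train_word_index = {"a": 1, "movie": 2, "bad": 3, "good": 4, "very": 5, "plot": 6}
train_length = 8

x = pad_to(encode("this is actually a very bad movie.", train_word_index), train_length)
print(x)       # [0, 0, 0, 0, 1, 5, 3, 2]
print(len(x))  # 8, the length the Embedding layer was built for
```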
