I am training a sentiment analysis model with Keras. Here is my model:
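import _pickle
import keras
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, SpatialDropout1D, LSTM, Dense
from sklearn.preprocessing import LabelEncoder

max_fatures = 2000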
tokenizer = Tokenizer(num_words=max_fatures, split=' ')
tokenizer.fit_on_texts(data)
X = tokenizer.texts_to_sequences(data)
X = pad_sequences(X)
with open('tokenizer.pkl', 'wb') as fid:
    _pickle.dump(tokenizer, fid)
le = LabelEncoder()
le.fit(["pos", "neu", "neg"])
y = le.transform(data_labels)
y = keras.utils.to_categorical(y)
embed_dim = 128
lstm_out = 196
model = Sequential()
model.add(Embedding(max_fatures, embed_dim, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
batch_size = 32
model.fit(X, y, epochs=10, batch_size=batch_size, verbose=2)
model.save('deep.h5')
But when I load this model and try to predict the label of a tweet, I get the following error:
> Traceback (most recent call last):
>   File "C:/Projects/Sentiment/test_deep.py", line 20, in <module>
>     sentiment = model.predict(tweet, batch_size=1, verbose=2)[0]
>   File "C:\Python\Python36\lib\site-packages\keras\engine\training.py", line 1149, in predict
>     x, _, _ = self._standardize_user_data(x)
>   File "C:\Python\Python36\lib\site-packages\keras\engine\training.py", line 751, in _standardize_user_data
>     exception_prefix='input')
>   File "C:\Python\Python36\lib\site-packages\keras\engine\training_utils.py", line 138, in standardize_input_data
>     str(data_shape))
> ValueError: Error when checking input: expected embedding_1_input to have shape (38,) but got array with shape (3,)
Here is my prediction code:
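import _pickle
import numpy as np
from keras.models import load_model
from keras.preprocessing.sequence import pad_sequences
from sklearn.preprocessing import LabelEncoder

with open('tokenizer.pkl', 'rb') as handle:
    tokenizer = _pickle.load(handle)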
model = load_model('deep.h5')
tweet = ['امروز خیلی خوشحالم چون تیم مورد علاقه ام قهرمان شده']
tweet = tokenizer.texts_to_sequences(tweet)
tweet = pad_sequences(tweet)
le = LabelEncoder()
le.fit(["pos", "neu", "neg"])
sentiment = model.predict(tweet, batch_size=1, verbose=2)[0]
print(le.inverse_transform(np.argmax(sentiment)))
I'm confused, because I am following this tutorial step by step. What exactly am I doing wrong?
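
Judging only from the error message, one plausible reading is that the padding step behaves differently at training and prediction time. `pad_sequences` without `maxlen` pads to the longest sequence in the batch it is given, so the training data ends up 38 tokens wide while the single tweet stays at 3. The sketch below illustrates that behaviour in plain Python. `pad_pre` is a hypothetical stand-in that mimics the Keras defaults (pad and truncate at the front), and the token ids are made up.

```python
def pad_pre(sequences, maxlen=None, value=0):
    """Hypothetical stand-in for Keras pad_sequences with its default
    'pre' padding and 'pre' truncation, written in plain Python."""
    if maxlen is None:
        # Without maxlen, the target length is the longest sequence in THIS batch.
        maxlen = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # Keep only the last maxlen tokens ('pre' truncation).
            seq = seq[len(seq) - maxlen:]
        # Fill the front with the padding value ('pre' padding).
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


TRAIN_LEN = 38              # width the Embedding layer expects, per the error message
tweet_ids = [[12, 7, 431]]  # made-up token ids for a three-token tweet

unfixed = pad_pre(tweet_ids)                  # what the prediction code effectively does
fixed = pad_pre(tweet_ids, maxlen=TRAIN_LEN)  # padded to the training width

print(len(unfixed[0]), len(fixed[0]))  # prints "3 38"
```

If this reading is right, the corresponding change in the prediction script would presumably be `tweet = pad_sequences(tweet, maxlen=38)`, or `maxlen=model.input_shape[1]` to avoid hard-coding the width. This has not been run against the model above.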