Question

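I am trying to do sentiment analysis with a deep learning model. I have built my own model and evaluated it on test data. I now also need to make predictions on new input, but I don't know how to do that.

Here is my code: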

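# Imports inferred from the names used below (not shown in the original post;
# the tensorflow.keras paths are an assumption, older setups may use plain keras)
import numpy as np
import pandas as pd
from termcolor import colored
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dropout, Dense, LSTM

# Tokenization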
print(colored("Tokenizing and padding data", "yellow"))
tokenizer = Tokenizer(num_words=2000, split=' ')
tokenizer.fit_on_texts(train_data['Clean_tweet'].astype(str).values)
train_tweets = tokenizer.texts_to_sequences(train_data['Clean_tweet'].astype(str).values)
max_len = max([len(i) for i in train_tweets])
vocab_size = len(tokenizer.word_index) + 1
train_tweets = pad_sequences(train_tweets, maxlen=max_len)

le = LabelEncoder()
y = le.fit_transform(train_data['Sentiment'].values)

test_tweets = tokenizer.texts_to_sequences(test_data['Clean_tweet'].astype(str).values)
test_tweets = pad_sequences(test_tweets, maxlen=max_len)
print(colored("Tokenizing and padding complete", "yellow"))

embeddings_dictionary = dict()
embedding_dim = 100
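# Read the GloVe vectors as UTF-8 so the file parses the same way on every platform
glove_file = open('glove.6B.100d.txt', encoding='utf-8')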

for line in glove_file:
    records = line.split()
    word = records[0]
    vector_dimensions = np.asarray(records[1:], dtype='float32')
    embeddings_dictionary[word] = vector_dimensions

glove_file.close()

embeddings_matrix = np.zeros((vocab_size, embedding_dim))
for word, index in tokenizer.word_index.items():
    embedding_vector = embeddings_dictionary.get(word)
    if embedding_vector is not None:
        embeddings_matrix[index] = embedding_vector

embedding_layer = Embedding(vocab_size, embedding_dim, input_length=max_len, weights=[embeddings_matrix],
                            trainable=False)

# Building the model
print(colored("Creating the LSTM model", "yellow"))
model = Sequential()
model.add(embedding_layer)
model.add(Dropout(0.4))
model.add(Dense(64, activation='relu'))
model.add(LSTM(256, dropout=0.2))
model.add(Dropout(0.3))
model.add(Dense(2, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()


# Training the model
print(colored("Training the LSTM model", "green"))
history = model.fit(train_tweets, pd.get_dummies(train_data['Sentiment']).values, epochs=15, batch_size=256,
                    validation_split=0.2)
print(colored(history, "green"))

model.save('model.h5')

To predict on new input, I did this:

text = "I hate you"

seq = tokenizer.texts_to_sequences(text)
padded = pad_sequences(seq, maxlen=len(text) + 1)
pred = model.predict_classes(padded)
print(pred)

The output is:

[1 1 1 1 1 1 1 1 1 1]
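
A likely reason for getting ten values instead of one: `texts_to_sequences` expects a list of texts, and a bare string is iterated character by character, so "I hate you" (10 characters) becomes 10 separate samples. The sketch below only illustrates that counting; `count_samples` is a hypothetical stand-in, not a Keras function.

```python
text = "I hate you"


def count_samples(texts):
    """Hypothetical stand-in: count the samples a batch API would see
    when it walks its argument one element at a time."""
    return sum(1 for _ in texts)


# A bare string is walked character by character.
print(count_samples(text))

# A list holding one string is a batch with a single sample.
print(count_samples([text]))
```

Applied to the snippet above, that would presumably mean calling `tokenizer.texts_to_sequences([text])` and padding with the same `max_len` used during training rather than `len(text) + 1`.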

How can I tell clearly which sentiment this is (0 for negative, 1 for positive)?

How do I make predictions on new input with a deep learning model?
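
On reading the sentiment from the output: with a two-unit output layer, one option is to take the argmax of the per-class scores and look the index up in the label order. This is a minimal sketch using NumPy only. The `probs` array is made-up data standing in for what `model.predict(padded)` might return, and `class_names` assumes the labels sort as negative (0) then positive (1), which is the column order `pd.get_dummies` would normally produce for labels 0 and 1.

```python
import numpy as np

# Made-up scores for one padded input, standing in for model.predict(padded).
# Shape is (n_samples, n_classes).
probs = np.array([[0.91, 0.09]])

# Assumed label order: index 0 = negative, index 1 = positive.
class_names = ["negative", "positive"]

# Pick the highest-scoring class for each sample.
pred_index = np.argmax(probs, axis=-1)

# Translate class indices into readable labels.
labels = [class_names[i] for i in pred_index]

print(pred_index, labels)
```

Taking the argmax of `model.predict(...)` also avoids `predict_classes`, which is no longer available in recent TensorFlow releases.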

0 answers: