I am writing an LSTM in Keras that detects toxicity in comments. I train the model on X. These are the steps I performed on X:
1.
max_features = 2000
tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(data.values)
2.
dictionary = tokenizer.word_index
3.
with open('wordindex.json', 'w') as dictionary_file:
    json.dump(dictionary, dictionary_file)
4.
X = tokenizer.texts_to_sequences(data.values)
X = pad_sequences(X)
The final shape of X is (1396,).
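For reference, here is a rough pure-Python sketch of what steps 1 to 4 do, written without Keras so it runs anywhere. The helper names (`fit_word_index`, `texts_to_sequences`, `pad_pre`) and the two sample sentences are made up for illustration. They only imitate, as far as I understand it, the default behaviour of `Tokenizer` and `pad_sequences`: frequency-ranked indices starting at 1, words at or above `num_words` dropped, zero padding on the left.

```python
from collections import Counter


def fit_word_index(texts):
    """Rank words by frequency; the most frequent word gets index 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index, num_words):
    """Keep only known words whose index is below num_words."""
    return [
        [word_index[w] for w in text.lower().split()
         if w in word_index and word_index[w] < num_words]
        for text in texts
    ]


def pad_pre(sequences, maxlen=None, value=0):
    """Left-pad (and left-truncate) every sequence to the same length."""
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        trimmed = s[len(s) - maxlen:] if len(s) > maxlen else s
        padded.append([value] * (maxlen - len(trimmed)) + trimmed)
    return padded


# Hypothetical sample data, standing in for data.values
texts = ["you are nice", "you are you"]
word_index = fit_word_index(texts)
sequences = texts_to_sequences(texts, word_index, num_words=3)
X_demo = pad_pre(sequences)
shape = (len(X_demo), len(X_demo[0]))
print(word_index, X_demo, shape)
```

The padded result is rectangular, (number of texts, padded length), and that is the layout the model is trained on, so it may be worth double-checking `X.shape` after step 4.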
I trained the model on these classes:
list_classes = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
y = train[list_classes].values
After the model was trained, I tried to test it on an input. These are the steps I performed to convert the input pred = 'f you':
import json
import keras.preprocessing.text as kpt  # assumed: the module behind the kpt alias

with open('wordindex.json') as dictionary_file:
    dictionary = json.load(dictionary_file)

def convert_text_to_index_array(text):
    # text_to_word_sequence lowercases the text, strips punctuation
    # and splits it into a list of word tokens.
    wordvec = []
    for word in kpt.text_to_word_sequence(text):
        if word in dictionary:
            wordvec.append([dictionary[word]])
        else:
            wordvec.append([0])
    return wordvec

pred = convert_text_to_index_array(pred)
When I run print(model.predict(pred)), I get this output:
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 2 arrays: [array([[129]]), array([[6]])]...
When I try to look at the shape of pred, I get an error:
print(pred.shape)
AttributeError: 'list' object has no attribute 'shape'
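A quick way to see what is going on, sketched without Keras or NumPy. The `nested_shape` helper is hypothetical, the two-entry `dictionary` just reuses the indices from the error message, and `text.lower().split()` stands in for `kpt.text_to_word_sequence`. The conversion function returns a plain Python list, which has no `.shape` attribute, and because each index is wrapped in its own list, the result is laid out as two rows of one token each.

```python
# Hypothetical dictionary reusing the two indices from the error message
dictionary = {"f": 129, "you": 6}


def convert_text_to_index_array(text, dictionary):
    """Same logic as the function in the post: one inner list per word."""
    wordvec = []
    for word in text.lower().split():  # stand-in for text_to_word_sequence
        if word in dictionary:
            wordvec.append([dictionary[word]])
        else:
            wordvec.append([0])
    return wordvec


def nested_shape(obj):
    """Report the dimensions of a regular nested list, NumPy-style."""
    shape = []
    while isinstance(obj, list):
        shape.append(len(obj))
        obj = obj[0] if obj else None
    return tuple(shape)


pred = convert_text_to_index_array("f you", dictionary)
print(pred)                    # a list of two one-element lists
print(nested_shape(pred))      # two rows, one column
print(hasattr(pred, "shape"))  # plain lists carry no .shape attribute
```

With NumPy installed, `np.array(pred).shape` should report the same two dimensions. A list handed to `model.predict` appears to be read as one array per model input, which would explain why the error message lists two separate arrays.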
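I am really frustrated, because it looks like my pred is one-dimensional. [array([[129]]), array([[6]])] are the words as I encoded them in the json file, but why are they in separate arrays?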
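One possible direction, sketched under assumptions rather than tested against the actual model: build a single flat list of indices for the whole sentence, pad it to the length used in training, and wrap it in one outer list so it reads as a batch of one sample. The function name `text_to_model_input`, the two-entry `word_index`, and `maxlen=5` are placeholders; the real `maxlen` would be the padded length of X from training. The `num_words` filter is there because `tokenizer.word_index` holds every word seen, not only the first `max_features`, so the saved JSON can contain indices the model never saw.

```python
def text_to_model_input(text, word_index, num_words, maxlen):
    """Encode one text as a batch of one left-padded index sequence."""
    tokens = text.lower().split()  # stand-in for text_to_word_sequence
    # One flat list for the whole sentence; unknown or out-of-range
    # words are skipped, which is what Tokenizer does without an OOV token.
    indices = [word_index[w] for w in tokens
               if w in word_index and word_index[w] < num_words]
    # Keep the last maxlen indices, then pad with zeros on the left.
    indices = indices[-maxlen:] if maxlen > 0 else []
    padded = [0] * (maxlen - len(indices)) + indices
    return [padded]  # outer list = batch dimension of size 1


# Hypothetical values: indices taken from the error message, maxlen invented
word_index = {"f": 129, "you": 6, "rare": 5000}
batch = text_to_model_input("f you rare unknown", word_index,
                            num_words=2000, maxlen=5)
print(batch)
print((len(batch), len(batch[0])))
```

Converting the result with `np.array(batch)` before calling `model.predict` should then give the model one 2-D array instead of a list of arrays. Alternatively, pickling the fitted tokenizer and calling `texts_to_sequences` followed by `pad_sequences(..., maxlen=...)` at prediction time would avoid re-implementing the lookup at all.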