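I am trying to train a Naive Bayes classifier to predict whether a movie review is good or bad. I am following this tutorial, but I get an error when I try to train the model.
I followed all the steps up to training the model. My data and code look like this: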
Reviews Labels
0 For fans of Chris Farley, this is probably his... 1
1 Fantastic, Madonna at her finest, the film is ... 1
2 From a perspective that it is possible to make... 1
3 What is often neglected about Harold Lloyd is ... 1
4 You'll either love or hate movies such as this... 1
... ...
14995 This is perhaps the worst movie I have ever se... 0
14996 I was so looking forward to seeing this film t... 0
14997 It pains me to see an awesome movie turn into ... 0
14998 "Grande Ecole" is not an artful exploration of... 0
14999 I felt like I was watching an example of how n... 0
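from sklearn.naive_bayes import MultinomialNB
gnb = MultinomialNB()
# The raw review strings are passed straight to the classifier here
gnb.fit(all_train_set['Reviews'], all_train_set['Labels'])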
However, when I try to fit the model, I get this error:
ValueError: could not convert string to float: 'For fans of Chris Farley, this is probably his best film. David Spade pl
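A minimal reproduction of the failure follows. It uses a hypothetical two-row DataFrame (toy reviews, not the real data) in place of all_train_set: MultinomialNB.fit receives raw text instead of numeric features and rejects it with a ValueError.

```python
import pandas as pd
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-in for all_train_set (two toy reviews, not the real data)
toy = pd.DataFrame({
    "Reviews": ["A wonderful, moving film.", "The worst movie I have ever seen."],
    "Labels": [1, 0],
})

try:
    # Raw strings are not numeric features, so scikit-learn's input validation fails
    MultinomialNB().fit(toy["Reviews"], toy["Labels"])
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__, error)
```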
If anyone can help me figure out why following the tutorial leads to this error, I would really appreciate it.
Thanks a lot!
Answer
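scikit-learn classifiers only work on numeric features, so you have to convert the text to numbers before passing it to the classifier. You can do this with CountVectorizer or TfidfVectorizer.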
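A sketch of that conversion, assuming a small hypothetical set of toy reviews in place of all_train_set['Reviews'] and all_train_set['Labels']: make_pipeline chains the vectorizer and the classifier, so fit and predict accept raw strings and the text-to-number step is applied automatically. The TF-IDF variant works the same way because its weights are also non-negative, which MultinomialNB requires.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for all_train_set['Reviews'] / ['Labels']
reviews = [
    "great film, great acting",
    "a wonderful and moving film",
    "terrible plot and awful acting",
    "the worst movie, truly awful",
]
labels = [1, 1, 0, 0]

# Bag-of-words counts -> multinomial Naive Bayes
count_model = make_pipeline(CountVectorizer(), MultinomialNB())
count_model.fit(reviews, labels)

# Same idea with TF-IDF weights (still non-negative, so MultinomialNB accepts them)
tfidf_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
tfidf_model.fit(reviews, labels)

# The pipelines vectorize new text with the vocabulary learned during fit
new_reviews = ["a great and wonderful film", "awful, the worst plot"]
count_preds = count_model.predict(new_reviews)
tfidf_preds = tfidf_model.predict(new_reviews)
print(count_preds, tfidf_preds)
```

With the real data, the same pipeline would be fitted on all_train_set['Reviews'] and all_train_set['Labels'] directly.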
If you want to use more modern word embeddings, you can use the Zeugma package (install it from a terminal with pip install zeugma), for example:
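from sklearn.naive_bayes import GaussianNB
from zeugma.embeddings import EmbeddingTransformer

# Turn each review into a single dense GloVe-based vector
embedding = EmbeddingTransformer('glove')
X = embedding.transform(all_train_set['Reviews'])
y = all_train_set['Labels']

# GloVe vectors contain negative values, which MultinomialNB rejects; GaussianNB accepts them
gnb = GaussianNB()
gnb.fit(X, y)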
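One caveat with the embedding route: word-embedding vectors such as GloVe have negative components, and MultinomialNB refuses negative feature values with a ValueError, while GaussianNB models each feature as a real-valued Gaussian. The sketch below uses random dense vectors as a hypothetical stand-in for embeddings (not real GloVe output) to show the difference.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
# Hypothetical stand-in for embedding features: 20 "reviews" x 25 dims,
# centred on +1 for positive and -1 for negative, so many values are negative
X_dense = np.vstack([
    rng.normal(loc=1.0, scale=0.5, size=(10, 25)),
    rng.normal(loc=-1.0, scale=0.5, size=(10, 25)),
])
y = np.array([1] * 10 + [0] * 10)

# MultinomialNB assumes non-negative, count-like features and rejects this input
try:
    MultinomialNB().fit(X_dense, y)
    multinomial_error = None
except ValueError as exc:
    multinomial_error = exc

# GaussianNB fits real-valued features directly
gauss = GaussianNB().fit(X_dense, y)
train_accuracy = gauss.score(X_dense, y)
print(multinomial_error)
print(train_accuracy)
```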
Hope this helps!