Naive Bayes classifier not working for sentiment analysis

Time: 2020-01-09 00:58:31

Tags: python pandas scikit-learn naivebayes

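I am trying to train a Naive Bayes classifier to predict whether a movie review is good or bad. I am following this tutorial, but I run into an error when I try to train the model: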

https://medium.com/@MarynaL/analyzing-movie-review-data-with-natural-language-processing-7c5cba6ed922

I followed every step up to the point of training the model. My data and code look like this:

                                                 Reviews  Labels
0      For fans of Chris Farley, this is probably his...       1
1      Fantastic, Madonna at her finest, the film is ...       1
2      From a perspective that it is possible to make...       1
3      What is often neglected about Harold Lloyd is ...       1
4      You'll either love or hate movies such as this...       1
                                              ...     ...
14995  This is perhaps the worst movie I have ever se...       0
14996  I was so looking forward to seeing this film t...       0
14997  It pains me to see an awesome movie turn into ...       0
14998  "Grande Ecole" is not an artful exploration of...       0
14999  I felt like I was watching an example of how n...       0

gnb = MultinomialNB()
gnb.fit(all_train_set['Reviews'], all_train_set['Labels'])

However, when I try to fit the model, I get this error:

ValueError: could not convert string to float: 'For fans of Chris Farley, this is probably his best film. David Spade pl

If anyone could help me figure out why the tutorial's approach fails here, I would really appreciate it.

Many thanks

1 answer:

Answer 0: (score: 0)

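With scikit-learn, you have to convert the text to numbers before calling the classifier; the estimator cannot work on raw strings, which is why it fails trying to convert a review to float. You can do the conversion with CountVectorizer or TfidfVectorizer.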
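As a rough sketch of that approach, the vectorizer and the classifier can be chained in a pipeline so the raw review strings are turned into non-negative term weights before they reach MultinomialNB. The four toy reviews and the name `model` are made up for illustration; with the real data you would pass all_train_set['Reviews'] and all_train_set['Labels'] to fit instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for all_train_set['Reviews'] and all_train_set['Labels'].
reviews = [
    "A wonderful, funny film with a great cast",
    "Fantastic performance, I loved every minute",
    "This is the worst movie I have ever seen",
    "Boring plot and terrible acting",
]
labels = [1, 1, 0, 0]

# The vectorizer maps each string to a sparse vector of TF-IDF weights;
# MultinomialNB is then fitted on those numeric features.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(reviews, labels)

# New text goes through the same vectorizer before prediction.
predictions = model.predict(["great funny film", "terrible boring movie"])
print(predictions)
```

Swapping TfidfVectorizer() for CountVectorizer() gives raw word counts instead of TF-IDF weights; either works with MultinomialNB.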

If you would rather use more modern word embeddings, you can use the Zeugma package (install it by running pip install zeugma in a terminal), for example:

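from sklearn.naive_bayes import GaussianNB
from zeugma.embeddings import EmbeddingTransformer

# Turn each review into a dense vector built from pre-trained GloVe embeddings
embedding = EmbeddingTransformer('glove')

X = embedding.transform(all_train_set['Reviews'])
y = all_train_set['Labels']

# Embedding features can be negative, which MultinomialNB rejects,
# so GaussianNB is used for these dense vectors instead
gnb = GaussianNB()
gnb.fit(X, y)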

Hope this helps!