Question

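I'm having trouble setting up text classification with Naive Bayes. I have three text files: two templates containing good and bad words, and one test file. My TermDocumentMatrix was built from the templates I rated earlier, and I also have a rating vector: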

TDM   word1   word2   word3   word4 ...  rating
doc1    1       1       1                 good
doc2            1        1      1          bad
doc3 ...

The rating vector is not attached to the TDM, because I believe cbind would convert the values to character. So I split the matrix into two parts:

template_train <- complete_TDM[1:(x+y),]
text_test <- data.matrix(complete_TDM[((x+y+1):nrow(complete_TDM)),])

where x is the number of rows from the good-rated template and y is the number of rows from the bad-rated template.
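On the cbind concern mentioned above, a minimal base R sketch of the coercion may help. The matrix `tdm` and its counts are hypothetical toy values, not the real TermDocumentMatrix.

```r
# Toy term counts for two documents (hypothetical values).
tdm <- matrix(c(1, 0, 1, 1), nrow = 2,
              dimnames = list(c("doc1", "doc2"), c("word1", "word2")))
rating <- c("good", "bad")

# A matrix holds a single type, so cbind() turns the counts into strings.
combined_matrix <- cbind(tdm, rating = rating)
print(typeof(combined_matrix))        # "character"

# A data frame stores each column with its own type.
combined_df <- data.frame(tdm, rating = factor(rating))
print(sapply(combined_df, class))
```

Keeping the ratings in a separate factor, as done here, or in a data frame column, avoids the coercion.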

random <- sample(x+y)
template_train <- data.matrix(template_train[random,])   ###shuffle 
rating_vector <- as.factor(rating[random]) ###vector containing rating, shuffled the same way
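The shuffle only keeps rows and labels aligned if `rating` is ordered like the rows before shuffling, that is, x "good" entries followed by y "bad" entries. A small sketch with hypothetical sizes and counts shows the check:

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical training matrix: 3 "good" rows followed by 2 "bad" rows.
x <- 3
y <- 2
train <- matrix(1:10, nrow = x + y,
                dimnames = list(paste0("doc", 1:(x + y)), c("word1", "word2")))
rating <- c(rep("good", x), rep("bad", y))

# One permutation, applied to both the rows and the labels.
random <- sample(x + y)
train_shuffled <- train[random, ]
rating_shuffled <- factor(rating[random])

# Look up each shuffled row's original label and compare it to the shuffled one.
original_label <- rating[match(rownames(train_shuffled), rownames(train))]
aligned <- original_label == as.character(rating_shuffled)
print(all(aligned))
```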

Then I created a naiveBayes model:

naive_model <- naiveBayes(rating_vector~., x = template_train, y=rating_vector)

and wanted to predict:

prediction <- predict(naive_model, text_test)

But at this last step I get an error:

> prediction <- predict(naive_model, text_test)
Error in log(sapply(seq_along(attribs), function(v) { : 
  non-numeric argument to mathematical function

Thanks in advance!

Update

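OK, I just solved that problem: I now use data.matrix instead of as.matrix, and as.factor for my rating vector. But now I have a different problem: everything good is rated as bad, and vice versa.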

> table(prediction, rating_vector)
          rating_vector
prediction bad good
      bad    0   95
      good  94    0
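To make the table concrete, the sketch below rebuilds it by hand and computes the accuracy. The variable names are illustrative. A perfectly inverted result suggests the two classes are being separated but named the wrong way round, so one thing that may be worth checking (an assumption, not confirmed in this thread) is whether the labels in `rating` follow the same order as the template rows.

```r
# Confusion matrix copied from the output above:
# rows are predictions, columns are the reference labels.
confusion <- matrix(c(0, 94, 95, 0), nrow = 2,
                    dimnames = list(prediction = c("bad", "good"),
                                    rating_vector = c("bad", "good")))

# Accuracy is the share of cases on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)

# Share of cases on the anti-diagonal, i.e. accuracy if the labels were swapped.
flipped_accuracy <- sum(confusion[cbind(1:2, 2:1)]) / sum(confusion)
print(flipped_accuracy)
```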

Answer 1

You can use

text_test = data.frame(text_test)
prediction <- predict(naive_model, text_test)
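As a side note on what this conversion does, here is a base R sketch with a hypothetical test matrix. By default `data.frame()` rewrites column names that are not syntactic R names, so it is probably safest to convert the training and test matrices the same way, so that the term names still match.

```r
# Hypothetical test matrix; the second column name is not a syntactic R name.
text_test <- matrix(c(1, 0, 2, 1), nrow = 2,
                    dimnames = list(c("doc1", "doc2"), c("word1", "2nd-word")))

# A matrix has no names(); a data frame exposes one named column per term.
print(names(text_test))               # NULL
test_df <- data.frame(text_test)
print(names(test_df))                 # "word1" "X2nd.word"

# check.names = FALSE keeps the original terms unchanged.
test_df_raw <- data.frame(text_test, check.names = FALSE)
print(names(test_df_raw))             # "word1" "2nd-word"
```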
