R: error in knn: NA/NaN/Inf in foreign function call (arg 6)

Date: 2016-04-04 15:31:40

Tags: python r twitter

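I am analyzing tweets with SVM, NB and kNN to classify each tweet as positive, negative or neutral. I have 80704 tweets in total, but for testing purposes I am only analyzing 2847 of them. The data has the following structure: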

> str(total.tweets.score)
'data.frame':   2847 obs. of  3 variables:
 $ score         : int  0 1 1 -2 0 0 1 2 -2 0 ...
 $ text          : Factor w/ 1790 levels "  st century is the era of knowledge and information which will change the way countries develop says",..: 1717 129 996 1072 682 795 524 132 143 773 ...
 $ Negative      : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 2 1 1 1 1 2 1 ...

The problem appears after I split the data into training and test sets: SVM and NB both work, but kNN gives an error. This is how I split the data:

total.tweets.score.train <- total.tweets.score[1:1993,]
total.tweets.score.test  <- total.tweets.score[1994:2847, ]
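
Because `knn()` from the class package requires the label vector to have exactly one entry per training row, the same row indices have to be applied to the score data frame, the document-term matrix and the target factor. A minimal sketch of that bookkeeping, using small made-up stand-ins (`scores`, `dtm`, `train_idx` and `test_idx` are hypothetical names, not objects from the question):

```r
set.seed(1)  # arbitrary seed, only so the toy data is reproducible

n <- 10
# Stand-in for total.tweets.score: a score and a TRUE/FALSE factor
scores <- data.frame(score = sample(-2:2, n, replace = TRUE))
scores$Negative <- factor(scores$score < 0, levels = c("FALSE", "TRUE"))

# Stand-in for a dense document-term matrix: n documents, 4 terms
dtm <- matrix(rpois(n * 4, lambda = 1), nrow = n)

# One set of indices, reused for every object
train_idx <- 1:7
test_idx  <- 8:n

dtm_train    <- dtm[train_idx, , drop = FALSE]
dtm_test     <- dtm[test_idx, , drop = FALSE]
target_train <- scores$Negative[train_idx]

# knn() stops if these do not line up
stopifnot(nrow(dtm_train) == length(target_train),
          ncol(dtm_train) == ncol(dtm_test))
```

With the real data the indices would be `1:1993` and `1994:2847`, as in the split above.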

SVM model:

model.SVm = svm(total.tweets.score.train$Negative~., data = total.tweets.score.train, kernel = "linear", epsilon = 0.1, probability = TRUE, type = "C")

NB model:

nb.classifier <- naiveBayes(twitter.train , total.tweets.score.train$Negative)

Here twitter.train is a document-term matrix.

kNN model:

model.knn <- knn(twitter.train, twitter.test , knn.train.data.target , k = 3, prob = TRUE)

Here twitter.train and twitter.test are both document-term matrices, and knn.train.data.target is a factor.

When I run the kNN code, I get the following error:

Error in knn(twitter.train, twitter.test, knn.train.data.target, k = 3,  : 
  NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(twitter.train, twitter.test, knn.train.data.target, k = 3,  :
  NAs introduced by coercion
2: In knn(twitter.train, twitter.test, knn.train.data.target, k = 3,  :
  NAs introduced by coercion

What should I do? Any help is appreciated.
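
As far as I can tell from the class package source, `knn()` calls `as.matrix()` on the training and test data and then passes `as.double(train)` as the sixth argument to its C routine. The two "NAs introduced by coercion" warnings would then mean that both matrices contained non-numeric values (text or factor columns, for instance). The sketch below reproduces that coercion on a made-up data frame (`toy`, `num` and `labels` are hypothetical names) and shows one possible check before calling `knn()`. It is a diagnostic under that assumption, not a confirmed diagnosis of the objects in the question.

```r
library(class)  # recommended package, bundled with standard R installs

# Hypothetical toy data: one numeric column and one text column
toy <- data.frame(count = c(1, 0, 5, 6),
                  text  = c("good", "bad", "fine", "poor"),
                  stringsAsFactors = FALSE)

# Mixed column types collapse to a character matrix
m <- as.matrix(toy)
is_char <- is.character(m)

# Coercing text to double is what yields the warning and the NAs
coerced <- suppressWarnings(as.double(m))
has_na  <- anyNA(coerced)

# Keeping only numeric columns gives knn() a usable input
num    <- as.matrix(toy[, sapply(toy, is.numeric), drop = FALSE])
labels <- factor(c("TRUE", "FALSE", "TRUE", "FALSE"))

stopifnot(is.numeric(num), all(is.finite(num)))
pred <- knn(train = num, test = num, cl = labels, k = 1)
```

Running `is.numeric(as.matrix(twitter.train))` and `anyNA(as.matrix(twitter.train))` on the real objects would show whether the same thing is happening there.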

1 answer:

Answer 0 (score: 0)

This is the structure of my DTM:

> str(total.tweets.dtm)
List of 6
 $ i       : int [1:22041] 1 1 1 1 1 1 1 1 1 1 ...
 $ j       : int [1:22041] 138 163 617 1417 1852 1899 2534 2727 2792 3234 ...
 $ v       : num [1:22041] 1 1 1 1 1 1 1 1 1 1 ...
 $ nrow    : int 2847
 $ ncol    : int 4232
 $ dimnames:List of 2
  ..$ Docs : chr [1:2847] "character(0)" "character(0)" "character(0)" "character(0)" ...
  ..$ Terms: chr [1:4232] "#cpec" "aabpara" "aaj" "aakhir" ...
 - attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix"
 - attr(*, "weighting")= chr [1:2] "term frequency" "tf"
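
The `str()` output shows a sparse triplet layout: `i` and `j` are the row and column positions and `v` holds the counts. With the tm package loaded, `as.matrix(total.tweets.dtm)` should produce the dense numeric matrix that `knn()` expects. To illustrate what that conversion does, here is a base-R sketch on a tiny made-up triplet list (`stm` and `dense` are hypothetical names, and the values are invented):

```r
# Hypothetical triplet data mimicking the i / j / v / nrow / ncol fields
stm <- list(i = c(1L, 1L, 2L, 3L),
            j = c(1L, 3L, 2L, 3L),
            v = c(1, 2, 1, 1),
            nrow = 3L,
            ncol = 3L,
            dimnames = list(Docs  = c("1", "2", "3"),
                            Terms = c("aaj", "cpec", "era")))

# Start from all zeros, then fill the listed (row, column) cells.
# This assumes each (i, j) pair occurs at most once.
dense <- matrix(0, nrow = stm$nrow, ncol = stm$ncol,
                dimnames = stm$dimnames)
dense[cbind(stm$i, stm$j)] <- stm$v

# A matrix like this is numeric and free of NA/NaN/Inf
stopifnot(is.numeric(dense), all(is.finite(dense)))

# Row subsets then serve as training and test inputs
dense_train <- dense[1:2, , drop = FALSE]
dense_test  <- dense[3, , drop = FALSE]
```

Note also that every entry of `Docs` in the output above is the string "character(0)", which suggests the document IDs were lost when the corpus was built. That is worth checking, although it may be unrelated to the error.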