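I am analyzing tweets with SVM, NB, and kNN to determine whether each tweet is positive, negative, or neutral. I have 80,704 tweets, but for testing purposes I have analyzed only 2,847 of them. The data has the following features: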
> str(total.tweets.score)
'data.frame': 2847 obs. of 3 variables:
$ score : int 0 1 1 -2 0 0 1 2 -2 0 ...
$ text : Factor w/ 1790 levels " st century is the era of knowledge and information which will change the way countries develop says",..: 1717 129 996 1072 682 795 524 132 143 773 ...
$ Negative : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 2 1 1 1 1 2 1 ...
The problem is that when I split the data into training and test sets, SVM and NB work, but kNN throws an error. This is how I split the data:
total.tweets.score.train <- total.tweets.score[1:1993,]
total.tweets.score.test <- total.tweets.score[1994:2847, ]
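Whatever the split, every object that has one row per tweet (the scored data frame, the document-term matrix, and the label vector passed to `knn`) should be subset with the same row indices, because `class::knn` stops with an error when the label vector does not have one entry per training row. A minimal sketch, assuming a hypothetical helper `make_split` and made-up toy stand-ins for the real objects:

```r
# Hypothetical helper (not from the question): build one pair of row-index
# vectors that is reused for every object with one row per tweet.
make_split <- function(n_total, n_train) {
  list(train = seq_len(n_train),
       test  = seq.int(n_train + 1L, n_total))
}

# Same sizes as in the question: rows 1:1993 for training, 1994:2847 for testing.
idx <- make_split(2847L, 1993L)

# Made-up stand-ins for the real DTM (as a dense matrix) and the label factor.
features <- matrix(0, nrow = 2847, ncol = 5)
labels   <- factor(rep(c("FALSE", "TRUE"), length.out = 2847))

# Subset features and labels with the same indices so rows stay aligned.
x_train <- features[idx$train, , drop = FALSE]
x_test  <- features[idx$test,  , drop = FALSE]
y_train <- labels[idx$train]

cat(nrow(x_train), nrow(x_test), length(y_train), "\n")  # 1993 854 1993
```

Reusing one index object avoids the labels and the matrix rows drifting apart if the split is later changed (for example to a random split).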
SVM model:
model.SVm = svm(total.tweets.score.train$Negative~., data = total.tweets.score.train, kernel = "linear", epsilon = 0.1, probability = TRUE, type = "C")
NB model:
nb.classifier <- naiveBayes(twitter.train , total.tweets.score.train$Negative)
Here, twitter.train is the document-term matrix.
kNN model:
model.knn <- knn(twitter.train, twitter.test , knn.train.data.target , k = 3, prob = TRUE)
Here, twitter.train and twitter.test are both document-term matrices, and knn.train.data.target is a factor.
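For reference, `class::knn` expects `train` and `test` to be numeric matrices (or objects that convert to numeric matrices) with the same columns, and `cl` to be a factor with one label per training row. A minimal working sketch with an invented 6-tweet, 3-term toy matrix (the term names and counts are made up):

```r
library(class)  # provides knn(); ships with standard R installations

# Invented toy term counts: 6 training tweets x 3 terms.
train_x <- matrix(c(2, 0, 0,
                    1, 0, 0,
                    3, 0, 1,
                    0, 2, 1,
                    0, 1, 2,
                    0, 3, 3),
                  nrow = 6, byrow = TRUE,
                  dimnames = list(NULL, c("good", "bad", "awful")))
test_x <- matrix(c(2, 0, 0,
                   0, 2, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, colnames(train_x)))
train_y <- factor(c("FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "TRUE"))

# Checks mirroring what knn() needs: numeric inputs, matching columns,
# and one label per training row.
stopifnot(is.numeric(train_x), is.numeric(test_x),
          ncol(train_x) == ncol(test_x),
          length(train_y) == nrow(train_x))

pred <- knn(train_x, test_x, train_y, k = 3, prob = TRUE)
print(as.character(pred))   # "FALSE" "TRUE"
print(attr(pred, "prob"))   # 1 1
```

The toy data is chosen so that each test row's three nearest neighbours are unanimous, which keeps the result deterministic (`knn` breaks tied votes at random).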
When I run the kNN code, I get the following error:
Error in knn(twitter.train, twitter.test, knn.train.data.target, k = 3, :
NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(twitter.train, twitter.test, knn.train.data.target, k = 3, :
NAs introduced by coercion
2: In knn(twitter.train, twitter.test, knn.train.data.target, k = 3, :
NAs introduced by coercion
Please help: what should I do?
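One plausible reading of this error, hedged because the code that builds `twitter.train` is not shown: `class::knn` converts `train` and `test` with `as.matrix()` and then passes them to compiled code as doubles, and argument 6 of that call is the training data. If either object still contains a non-numeric column (for example a factor such as `Negative` or the `text` column), `as.matrix()` returns a character matrix, the conversion to double turns the non-numeric strings into `NA` with "NAs introduced by coercion" (once for train and once for test, which would match the two warnings), and the compiled code then rejects the `NA`s. The sketch below reproduces that chain on a toy data frame (its column names are hypothetical) and shows one way to keep only the numeric columns:

```r
# Toy feature table (hypothetical columns) where the label column was
# accidentally left in alongside the numeric term counts.
train_df <- data.frame(
  good     = c(1, 0, 2),
  bad      = c(0, 1, 0),
  Negative = factor(c("FALSE", "TRUE", "FALSE"))
)

# With any non-numeric column present, as.matrix() returns a character matrix.
m <- as.matrix(train_df)
print(typeof(m))  # "character"

# Converting to double (as knn() does before its compiled call) turns the
# non-numeric strings into NA and raises "NAs introduced by coercion".
coerced <- withCallingHandlers(
  as.double(m),
  warning = function(w) {
    message("caught warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(sum(is.na(coerced)))  # 3, one per row of the Negative column

# One fix: keep only numeric columns before calling knn().
is_num   <- vapply(train_df, is.numeric, logical(1))
features <- as.matrix(train_df[, is_num, drop = FALSE])
print(typeof(features))  # "double"
```

A quick check such as `typeof(as.matrix(twitter.train))` on the real objects would show whether this is what is happening.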
Answer (score: 0)
This is the structure of my DTM:
> str(total.tweets.dtm)
List of 6
$ i : int [1:22041] 1 1 1 1 1 1 1 1 1 1 ...
$ j : int [1:22041] 138 163 617 1417 1852 1899 2534 2727 2792 3234 ...
$ v : num [1:22041] 1 1 1 1 1 1 1 1 1 1 ...
$ nrow : int 2847
$ ncol : int 4232
$ dimnames:List of 2
..$ Docs : chr [1:2847] "character(0)" "character(0)" "character(0)" "character(0)" ...
..$ Terms: chr [1:4232] "#cpec" "aabpara" "aaj" "aakhir" ...
- attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix"
- attr(*, "weighting")= chr [1:2] "term frequency" "tf"
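This structure shows that the DTM stores only the nonzero counts as `(i, j, v)` triplets, whereas `knn` works on a dense numeric matrix. With tm loaded, `as.matrix(total.tweets.dtm)` should perform that conversion; the base-R sketch below, on a made-up 3 × 3 triplet list that mimics the fields above (the helper `triplet_to_dense` is hypothetical), shows what the conversion amounts to. Note that a dense 2847 × 4232 matrix of doubles takes about 2847 × 4232 × 8 ≈ 96 MB, and that figure would grow considerably for all 80,704 tweets.

```r
# Made-up triplet list mimicking the fields of str(total.tweets.dtm):
# entry k says that document i[k] contains term j[k] with count v[k].
dtm_like <- list(
  i = c(1L, 1L, 2L, 3L, 3L),
  j = c(1L, 3L, 2L, 1L, 2L),
  v = c(1, 2, 1, 1, 1),
  nrow = 3L, ncol = 3L,
  dimnames = list(Docs = NULL, Terms = c("#cpec", "aabpara", "aaj"))
)

# Hypothetical helper: expand triplets into a dense numeric matrix.
# Assumes each (i, j) pair appears at most once, as in a normal DTM.
triplet_to_dense <- function(x) {
  dense <- matrix(0, nrow = x$nrow, ncol = x$ncol,
                  dimnames = list(NULL, x$dimnames$Terms))
  # A two-column index matrix addresses cell (i[k], j[k]) for every k.
  dense[cbind(x$i, x$j)] <- x$v
  dense
}

dense <- triplet_to_dense(dtm_like)
print(dense)

# Rows can now be split with plain matrix indexing before calling knn().
x_train <- dense[1:2, , drop = FALSE]
x_test  <- dense[3, , drop = FALSE]
```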