K-Fold Cross-Validation for a KNN Text Classifier in R

Date: 2016-10-17 06:56:33

Tags: r validation knn text-classification

I have built a text classifier that assigns comments to various categories, for example:

      Comment                          Category 
Good Service provided                   Service
Excellent Communication                 Communication

I performed the classification with:

library(class)
knn(modeldata[train, ], modeldata[test, ], cl[train], k = 2, use.all = TRUE)

Now I want to evaluate this model with k-fold cross-validation. I am looking for a single number that tells me whether the model is overfitting or underfitting, and so on.

I have tried

knn.cv(modeldata[train, ], cl[train], k = 2, use.all = TRUE)

However, the help page for this command says that it returns NA when the classification is in doubt. Please advise.
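
For reference, knn.cv from the class package already performs leave-one-out cross-validation on the training set; a minimal sketch of turning its predictions into a single accuracy figure (using the same modeldata, train and cl objects as above) might look like this:

# knn.cv returns the leave-one-out prediction for each training row;
# doubt/ties come back as NA, so they are excluded from the accuracy
cv_pred <- knn.cv(modeldata[train, ], cl[train], k = 2, use.all = TRUE)
mean(cv_pred == cl[train], na.rm = TRUE)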

1 Answer:

Answer 0 (score: 1)

Which package are you using? You can use caret as follows (an example with the iris dataset):

library(caret)

training <- iris
# 10-fold cross-validation, repeated 3 times (caret's default number of folds is 10)
ctrl <- trainControl(method = "repeatedcv", repeats = 3)
knnFit <- train(Species ~ ., data = training, method = "knn",
                trControl = ctrl, preProcess = c("center", "scale"))
knnFit

Output:

k-Nearest Neighbors 

150 samples
  4 predictor
  3 classes: 'setosa', 'versicolor', 'virginica' 

Pre-processing: centered (4), scaled (4) 
Resampling: Cross-Validated (10 fold, repeated 3 times) 
Summary of sample sizes: 135, 135, 135, 135, 135, 135, ... 
Resampling results across tuning parameters:

  k  Accuracy   Kappa    
  5  0.9511111  0.9266667
  7  0.9577778  0.9366667
  9  0.9533333  0.9300000

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 7.
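
The resampled Accuracy that train reports is the kind of single number the question asks for. A rough adaptation to the question's own data might look like the following, assuming modeldata is a data frame (or matrix) of numeric text features and cl is the factor of categories:

library(caret)

# combine the features and the class labels into one data frame for caret
training <- data.frame(modeldata, Category = cl)

set.seed(123)  # reproducible fold assignment
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
knnFit <- train(Category ~ ., data = training, method = "knn",
                trControl = ctrl, preProcess = c("center", "scale"))
knnFit  # prints cross-validated Accuracy and Kappa for each candidate k

Comparing this cross-validated accuracy with the accuracy on the training set gives an indication of over- or underfitting.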