I created a text classifier that classifies comments into various categories, for example:

Comment                  Category
Good Service provided    Service
Excellent Communication  Communication
I performed the classification with:

knn(modeldata[train, ], modeldata[test, ], cl[train], k = 2, use.all = TRUE)
Now I want to evaluate this model with k-fold cross-validation. I am hoping for a single number that tells me whether the model is overfitting or underfitting.
I tried

knn.cv(modeldata[train, ], cl[train], k = 2, use.all = TRUE)
but the help page for this function says it returns NA when the classification is in doubt (ties in the neighbour vote). Please advise.
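As background for the question above: knn.cv from the class package performs leave-one-out cross-validation and returns one predicted label per row, so a single accuracy number can be computed by comparing those predictions to the true labels. This is a sketch using iris as a stand-in, since modeldata and cl are not shown:

```r
library(class)  # provides knn and knn.cv

# Leave-one-out CV: each row is classified by its neighbours
# among all the remaining rows.
preds <- knn.cv(iris[, 1:4], iris$Species, k = 2, use.all = TRUE)

# NA appears only for rows whose neighbour vote is tied ("doubt"),
# so drop those before scoring.
ok <- !is.na(preds)
cv_accuracy <- mean(preds[ok] == iris$Species[ok])
cv_accuracy  # a single out-of-sample accuracy figure
```

A low cv_accuracy relative to the accuracy on the training data is the usual sign of overfitting.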
Answer (score: 1)
Which package are you using? You can use caret as follows (example with the iris dataset):
library(caret)

training <- iris
ctrl <- trainControl(method = "repeatedcv", repeats = 3)
knnFit <- train(Species ~ ., data = training, method = "knn",
                trControl = ctrl, preProcess = c("center", "scale"))
knnFit
Output:
k-Nearest Neighbors
150 samples
4 predictor
3 classes: 'setosa', 'versicolor', 'virginica'
Pre-processing: centered (4), scaled (4)
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 135, 135, 135, 135, 135, 135, ...
Resampling results across tuning parameters:
  k  Accuracy   Kappa
  5  0.9511111  0.9266667
  7  0.9577778  0.9366667
  9  0.9533333  0.9300000
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 7.
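To address the over/underfitting part of the question directly: one common diagnostic is to compare the resubstitution (training) accuracy with the cross-validated accuracy that caret reports. A large gap suggests overfitting; both being low suggests underfitting. A sketch, again using iris since the asker's data is not available:

```r
library(caret)

# Fit kNN with repeated 10-fold CV, as in the answer above.
ctrl <- trainControl(method = "repeatedcv", repeats = 3)
knnFit <- train(Species ~ ., data = iris, method = "knn",
                trControl = ctrl, preProcess = c("center", "scale"))

# Accuracy on the data the model was trained on (resubstitution).
train_acc <- mean(predict(knnFit, iris) == iris$Species)

# Best cross-validated accuracy across the tuned k values.
cv_acc <- max(knnFit$results$Accuracy)

c(train = train_acc, cv = cv_acc)
```

If train_acc is near 1 while cv_acc is much lower, the model is likely overfitting; if both are low, a larger k or better features may be needed.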