Question

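I am cross-validating a k-nearest-neighbors model and trying to interpret the results I get back. My dataset is laid out as

variable1 (int) | variable2 (int) | variable3 (int) | variable4 (int) | Response (factor)

After choosing the model, I split the data 80% / 20% for testing.

One iteration of my code looks like this: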

(select 
(case when choice1 = True then 1 else 0 end) +
(case when choice2 = True then 1 else 0 end) +
(case when choice3 = True then 1 else 0 end) +
(case when choice4 = True then 1 else 0 end) +
(case when choice5 = True then 1 else 0 end) as choice_sum
from Preferences)

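When I run 'cv', it just returns a list() containing what look like random numbers as row names, the observed outcome variable (y), and the predicted outcome variable (yhat). I am trying to compute some kind of accuracy for the test set. Should I compare y with yhat to validate?

Edit: output added below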

    cv <- cv.kknn(formula = Response~., cvdata, kcv = 10, k = 7, kernel = 'optimal', scale = TRUE)
    cv

Answer 1

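The first element in [[2]] is the mean absolute error and the second is the mean squared error. Assuming df is your data frame, you can easily check these values with mean(abs(df$y - df$yhat)) and mean((df$y - df$yhat)^2).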
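As a minimal sketch of that check, the snippet below computes both error measures by hand. The data frame df here is a hypothetical stand-in with made-up values; with a real result you would presumably build it from the first list element, for example as.data.frame(cv[[1]]), which is an assumption about the shape of your output. The last line adds the share of exact matches, which is usually the more natural measure when the response holds class labels.

```r
# Hypothetical stand-in for the y / yhat table returned in the first
# list element. The values are invented purely for illustration.
df <- data.frame(
  y    = c(1, 2, 2, 3, 1),
  yhat = c(1, 2, 3, 3, 2)
)

# Mean absolute error: average size of the prediction errors.
mae <- mean(abs(df$y - df$yhat))

# Mean squared error: average of the squared prediction errors.
mse <- mean((df$y - df$yhat)^2)

# Proportion of rows where the prediction equals the observation.
accuracy <- mean(df$y == df$yhat)

print(c(mae = mae, mse = mse, accuracy = accuracy))
```

If the hand-computed mae and mse agree with the two numbers in [[2]], you can be fairly confident about what each position holds.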

How to interpret the cross-validation output of cv.kknn (kknn package)
