K-fold cross-validation in R: how to get errors as percentages or whole numbers

Asked: 2018-11-12 22:48:16

Tags: r cross-validation

I am running 10-fold cross-validation on three different algorithms: SVM, a neural network, and rpart. I am trying to pick the most accurate one.

The rpart errors are:

[1] 0.9705882353 1.0000000000 1.0000000000 0.9411764706 1.0000000000 1.0000000000 0.8823529412 1.0000000000
 [9] 0.9705882353 0.9705882353

The neural network errors are:

[1] 157.52352368  80.07471671  95.11278873  78.70281592 100.58281184  79.61699438  90.91953877 120.54595936
 [9] 143.57563143  41.78655472

The SVM errors are:

[1]  60.89085317  86.23601068 115.46072775  75.03890373  70.18948759 102.37164174 117.48471252  61.60451089
[9]  89.44647999

The code I used. For the neural network:

for(i in 1:k){
    #index = sample(seq_len (nrow(leaf)), size = samplesize)
    index <- sample(1:nrow(leaf),round(0.9*nrow(leaf)))
   # index = total_index[(i*(k-1)+1):(i*(k-1)+k)]
    train.cv <- scaled[index,]
    test.cv <- scaled[-index,]
    nn <- neuralnet(form, train.cv,hidden=c(5,2),linear.output=T, stepmax = 1e6)
    pr.nn <- compute(nn,test.cv[,2:16])
    pr.nn <- pr.nn$net.result*(max(leaf$Class)-min(leaf$Class))+min(leaf$Class)
    test.cv.r <- (test.cv$Class)*(max(leaf$Class)-min(leaf$Class))+min(leaf$Class)
    cv.error[i] <- sum((test.cv.r - pr.nn)^2)/nrow(test.cv)
    pbar$step()
  }
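
A note on units: the value stored in cv.error[i] is a mean squared error on the original Class scale, so it is not bounded by 1. The sketch below uses made-up vectors (not the leaf data) and assumes Class holds integer labels, which the question does not confirm. It shows how the same numeric outputs could be scored either as MSE or as a misclassification rate.

```r
# Hypothetical true labels and numeric network outputs (illustration only).
truth <- c(1, 2, 3, 4, 5, 6)
pred_numeric <- c(1.2, 2.4, 5.1, 3.9, 5.0, 9.3)

# Mean squared error: expressed in squared label units, unbounded above.
mse <- mean((truth - pred_numeric)^2)

# Misclassification rate: round each output to the nearest label,
# then count the share of rows where the label is wrong.
pred_label <- round(pred_numeric)
misclass <- mean(pred_label != truth)

print(mse)
print(misclass)
```

Rounding to the nearest label is only one possible way to turn a regression output into a class; training the network as a classifier would be another.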

For rpart:

set.seed(123)
   form = "Class ~ SpecimenNumber+Eccentricity+AspRatio+Elongation+Solidity+StoConvex+IsoFactor+MaxIndentDepth+Lobeedness+AvgIntensity+AvgContrast+Smoothness+ThirdMoment+Uniformity+Entropy"
   folds = split(scaled, cut(sample(1:nrow(scaled)), 10))
   errs = rep(NA, length(folds))
   for (i in 1:length(folds)) {
     test <- ldply(folds[i], data.frame)
     train <- ldply(folds[-i], data.frame)
    # train <- scaled[index,]
     #test <- scaled[-index,]
     tmp.model <- rpart(form , train, method = "class")
     tmp.predict <- predict(tmp.model, newdata = test, type = "class")
     conf.mat <- table(test$Class, tmp.predict)
     errs[i] <- 1-(sum(diag(conf.mat))/sum(conf.mat))
   }
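
One thing worth checking in this loop: sum(diag(conf.mat)) only counts correct predictions when the rows and columns of the table list the same classes in the same order. If a test fold is missing some classes, the table is not square and the diagonal no longer lines up, which might explain error rates close to 1. A small sketch on made-up labels (not the leaf data) compares the diagonal approach with a direct element-wise comparison.

```r
# Made-up labels: the test fold contains classes 2 and 3 only,
# while the model can predict classes 1, 2 and 3.
truth <- c(2, 2, 3, 3)
pred  <- factor(c(2, 2, 3, 3), levels = c(1, 2, 3))

# The table is 2 x 3, so its diagonal pairs truth 2 with prediction 1
# and truth 3 with prediction 2, even though every prediction is right.
conf_mat <- table(truth, pred)
err_diag <- 1 - sum(diag(conf_mat)) / sum(conf_mat)

# Element-wise comparison does not depend on the table layout.
err_direct <- mean(as.character(pred) != as.character(truth))

print(err_diag)
print(err_direct)
```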

For SVM:

tuned = tune.svm(Class~., data = traindata, gamma = 10^(-1:-3), cost =10^(1:3), tunecontrol = tune.control(cross = 10))
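
As far as the e1071 documentation describes it, the tuning framework reports mean squared error when the response is numeric and misclassification error when it is a factor. The sketch below assumes e1071 is installed and uses the built-in iris data as a stand-in, since traindata is not shown in the question. It illustrates what the reported error looks like with a factor response.

```r
# Sketch on the built-in iris data; Species is already a factor.
if (requireNamespace("e1071", quietly = TRUE)) {
  set.seed(123)
  tuned <- e1071::tune.svm(
    Species ~ ., data = iris,
    gamma = 10^(-3:-1), cost = 10^(1:3),
    tunecontrol = e1071::tune.control(cross = 10)
  )
  # With a factor response, the reported error is a
  # misclassification rate between 0 and 1.
  best_err <- tuned$best.performance
  print(tuned$best.parameters)
  print(best_err)
}
```

For the data in the question, the equivalent step would presumably be converting Class with as.factor() before calling tune.svm.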

leaf is the original data; scaled is the leaf data after min-max scaling.

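I need to compare these errors and pick the most accurate algorithm, but I don't know how to do that when the algorithms report errors in different units. rpart gives me errors between 0 and 1, while the other two give me values far larger than 1. I can't work out what makes these algorithms produce different units. How can I set up the cross-validation so that all three report the same metric?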
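
One possible way to get a common metric is to score every model on the same disjoint folds with the same error function. The sketch below is a minimal illustration under stated assumptions: cv_misclass is a hypothetical helper, the built-in iris data stands in for leaf, and a majority-class baseline stands in for a real model. The rpart part only runs if that package is installed.

```r
# Score any classifier on the same k disjoint folds with one metric.
# fit_fun(train) returns a model; pred_fun(model, test) returns labels.
cv_misclass <- function(data, target, fold_id, fit_fun, pred_fun) {
  sapply(sort(unique(fold_id)), function(f) {
    train <- data[fold_id != f, , drop = FALSE]
    test  <- data[fold_id == f, , drop = FALSE]
    pred  <- pred_fun(fit_fun(train), test)
    mean(as.character(pred) != as.character(test[[target]]))
  })
}

# Assign every row to exactly one of k folds, once, for all models.
set.seed(123)
k <- 10
fold_id <- sample(rep(seq_len(k), length.out = nrow(iris)))

# Baseline: always predict the most frequent training class.
base_err <- cv_misclass(
  iris, "Species", fold_id,
  fit_fun  = function(train) names(which.max(table(train$Species))),
  pred_fun = function(model, test) rep(model, nrow(test))
)
print(mean(base_err))

# rpart on the same folds, if the package is available.
if (requireNamespace("rpart", quietly = TRUE)) {
  tree_err <- cv_misclass(
    iris, "Species", fold_id,
    fit_fun  = function(train) rpart::rpart(Species ~ ., data = train, method = "class"),
    pred_fun = function(model, test) predict(model, newdata = test, type = "class")
  )
  print(mean(tree_err))
}
```

Because every model sees identical folds and returns a per-fold misclassification rate, the resulting vectors can be compared directly, for example by their means. A neural network or SVM could be plugged in through the same two function arguments, provided its predictions are converted to class labels first.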

0 answers:

No answers