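I am fitting a decision tree model and using K-fold cross-validation, which randomly splits the training set into K distinct subsets (K = 10).
How do I compute the average training error and the average cross-validation error?
Here is my code so far. I am new to machine learning.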
library(rpart)
## Shuffle the rows
data_tree_copy <- data_tree[sample(nrow(data_tree)), ]
## Create 10 equally sized folds
k_folds <- cut(seq(1, nrow(data_tree_copy)),
               breaks = 10,
               labels = FALSE)
MSE_tree <- 0
for (i in 1:10) {
  testIndexes <- which(k_folds == i, arr.ind = TRUE)
  testData <- data_tree_copy[testIndexes, ]
  trainData <- data_tree_copy[-testIndexes, ]
  ## median_value is the response variable (the value being predicted).
  tree_model <- rpart(median_value ~ .,
                      data = trainData,
                      method = "anova",
                      control = rpart.control(minsplit = 10,
                                              cp = 0.001))
  predict2 <- predict(tree_model, testData)
  ## Add this fold's test MSE, weighted by the fold's share of the rows
  MSE_tree <- MSE_tree + sum(k_folds == i) / nrow(data_tree_copy) *
    mean((predict2 - testData$median_value)^2)
}
RMSE_pred <- sqrt(MSE_tree)
print(paste("root mean-squared error of the prediction is", RMSE_pred))
train_error <- mean((predict2)^2)
train_error
printcp(tree_model) # Cross-validation check
What am I doing wrong here? I get results, but I can't make sense of them.
Thanks in advance!
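For reference, here is a minimal sketch of one way to get both averages, not a definitive implementation. The idea is that the training error has to be measured inside the loop, by predicting on the same rows each tree was fitted to, whereas `mean((predict2)^2)` only averages the squared predictions for the last test fold and is not an error at all. The simulated `data_tree` (columns `x1`, `x2`) is a stand-in assumption so the example runs on its own, and the names `train_mse`, `cv_mse`, `avg_train_mse` and `avg_cv_mse` are illustrative.

```r
library(rpart)

set.seed(1)

# Stand-in data (assumption): replace this with the real data_tree.
n <- 200
data_tree <- data.frame(x1 = runif(n), x2 = runif(n))
data_tree$median_value <- 3 * data_tree$x1 + sin(6 * data_tree$x2) +
  rnorm(n, sd = 0.3)

K <- 10
data_shuffled <- data_tree[sample(nrow(data_tree)), ]
k_folds <- cut(seq_len(nrow(data_shuffled)), breaks = K, labels = FALSE)

# One training MSE and one validation MSE per fold
train_mse <- numeric(K)
cv_mse <- numeric(K)

for (i in seq_len(K)) {
  test_idx <- which(k_folds == i)
  train_data <- data_shuffled[-test_idx, ]
  test_data <- data_shuffled[test_idx, ]

  fit <- rpart(median_value ~ ., data = train_data, method = "anova",
               control = rpart.control(minsplit = 10, cp = 0.001))

  # Training error: predict on the rows the tree was fitted to
  train_mse[i] <- mean((predict(fit, train_data) - train_data$median_value)^2)
  # Validation error: predict on the held-out fold
  cv_mse[i] <- mean((predict(fit, test_data) - test_data$median_value)^2)
}

# Averages over the K folds
avg_train_mse <- mean(train_mse)
avg_cv_mse <- mean(cv_mse)

cat("average training RMSE:", sqrt(avg_train_mse), "\n")
cat("average cross-validation RMSE:", sqrt(avg_cv_mse), "\n")
```

The plain `mean()` over folds matches the fold-size weighting in the loop above only when the folds have equal size; with unequal folds, `weighted.mean(cv_mse, table(k_folds))` would keep the weighting. Note also that `printcp(tree_model)` reports on the tree from the last loop iteration only, and its `xerror` column comes from rpart's own internal cross-validation, scaled relative to the root node error, so it is not directly comparable to the RMSE computed by hand.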