I'm very new to machine learning and am trying the forest cover prediction competition on Kaggle, but I got stuck early on. When I run the code below, I get the following error.
Error in train.default(x, y, weights = w, ...) : final tuning parameters could not be determined In addition: There were 50 or more warnings (use warnings() to see the first 50)
# Load the libraries
library(ggplot2); library(caret); library(AppliedPredictiveModeling)
library(pROC)
library(Amelia)
set.seed(1234)
# Load the forest cover dataset from the csv file
rawdata <- read.csv("train.csv",stringsAsFactors = F)
#this data won't be used in model evaluation. It will only be used for the submission.
test <- read.csv("test.csv",stringsAsFactors = F)
########################
### DATA PREPARATION ###
########################
#create a training and test set for building and evaluating the model
samples <- createDataPartition(rawdata$Cover_Type, p = 0.5,list = FALSE)
data.train <- rawdata[samples, ]
data.test <- rawdata[-samples, ]
model1 <- train(as.factor(Cover_Type) ~ Elevation + Aspect + Slope + Horizontal_Distance_To_Hydrology,
data = data.train,
method = "rf", prox = "TRUE")
Answer 0 (score: 8)
The following should work:
model1 <- train(as.factor(Cover_Type) ~ Elevation + Aspect + Slope + Horizontal_Distance_To_Hydrology,
data = data.train,
method = "rf", tuneGrid = data.frame(mtry = 3))
It's best to specify the tuneGrid argument, which is a data frame of candidate tuning values. See ?randomForest and ?train for details. rf has only one tuning parameter, mtry, which controls the number of predictors randomly sampled as candidates at each split.
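For reference, here is a minimal sketch of letting caret search over several candidate mtry values with 5-fold cross-validation, assuming the same formula and data.train as above (the candidate values and fold count are just an illustration):
ctrl <- trainControl(method = "cv", number = 5)
model2 <- train(as.factor(Cover_Type) ~ Elevation + Aspect + Slope + Horizontal_Distance_To_Hydrology,
                data = data.train,
                method = "rf",
                trControl = ctrl,
                tuneGrid = expand.grid(mtry = c(2, 3, 4)))  # candidate mtry values to compare
model2$bestTune  # mtry value with the best cross-validated accuracy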
You can also run modelLookup to get the list of tuning parameters for each model:
> modelLookup("rf")
# model parameter label forReg forClass probModel
#1 rf mtry #Randomly Selected Predictors TRUE TRUE TRUE
Answer 1 (score: 4)
I'm also working on a Kaggle competition and have been using the caret package to help choose the "best" model parameters. After getting many of these errors, I looked at the scripts behind the scenes and found a call to a function named class2ind, which doesn't exist (at least not anywhere I know of). I eventually found another function named class.ind in the nnet package, so I created a local function named class2ind and dropped in the code from class.ind. Lo and behold, it worked!
# fix for caret: build a class indicator (one-hot) matrix, adapted from nnet::class.ind
class2ind <- function(cl)
{
    n <- length(cl)
    cl <- as.factor(cl)
    # one row per observation, one column per factor level
    x <- matrix(0, n, length(levels(cl)))
    # put a 1 in the column corresponding to each observation's level
    x[(1:n) + n * (unclass(cl) - 1)] <- 1
    dimnames(x) <- list(names(cl), levels(cl))
    x
}
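For what it's worth, here is a quick check of what the function returns on a small factor (a hypothetical example, not from the original answer):
cl <- factor(c("a", "b", "a", "c"))
class2ind(cl)
#      a b c
# [1,] 1 0 0
# [2,] 0 1 0
# [3,] 1 0 0
# [4,] 0 0 1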