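I'm running some experiments and want to fit several GLM models in R, using the same variables but a different training sample each time.
Here is some simulated data: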
resp <- sample(0:1,100,TRUE)
x1 <- c(rep(5,20),rep(0,15), rep(2.5,40),rep(17,25))
x2 <- c(rep(23,10),rep(5,10), rep(15,40),rep(1,25), rep(2, 15))
dat <- data.frame(resp,x1, x2)
Here is the loop I'm trying to use:
n <- 5
for (i in 1:n) {
  ### Create training and testing data
  ## 80% of the sample size
  # Note that I didn't use seed so that random split is performed every iteration.
  smp_sizelogis <- floor(0.8 * nrow(dat))
  train_indlogis <- sample(seq_len(nrow(dat)), size = smp_sizelogis)
  trainlogis <- dat[train_indlogis, ]
  testlogis <- dat[-train_indlogis, ]
  InitLOogModel[i] <- glm(resp ~ ., data = trainlogis, family = binomial)
}
Unfortunately, I get this error:
Error in InitLOogModel[i] <- glm(resp ~ ., data = trainlogis, family = binomial) :
object 'InitLOogModel' not found
Any ideas?
Answer 0 (score: 1)
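I suggest using caret for this. It takes some time to learn, but it bundles many "best practices". Once you know the basics, you can quickly try models other than glm and easily compare them with one another. Here is a modified version of your example to get you started.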
## caret
library(caret)
# your data
resp <- sample(0:1,100,TRUE)
x1 <- c(rep(5,20),rep(0,15), rep(2.5,40),rep(17,25))
x2 <- c(rep(23,10),rep(5,10), rep(15,40),rep(1,25), rep(2, 15))
dat <- data.frame(resp,x1, x2)
# so caret knows you're trying to do classification, otherwise will give you an error at the train step
dat$resp <- as.factor(dat$resp)
# create a hold-out set to use after your model fitting
# not really necessary for your example, but showing for completeness
train_index <- createDataPartition(dat$resp, p = 0.8,
                                   list = FALSE,
                                   times = 1)
# create your train and test data
train_dat <- dat[train_index, ]
test_dat <- dat[-train_index, ]
# 5-fold cross-validation, repeated 5 times (25 resamples in total)
# similar in spirit to your 5 loops: each fold trains on about 80% of train_dat
fitControl <- trainControl(method = "repeatedcv",
                           number = 5,
                           repeats = 5)
# fit the glm!
glm_fit <- train(resp ~ ., data = train_dat,
                 method = "glm",
                 family = "binomial",
                 trControl = fitControl)
# summary
glm_fit
# final model: the glm refit on all of train_dat
glm_fit$finalModel
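As for the error itself: the loop assigns into InitLOogModel before that object exists, so R has nothing to index into. If you prefer to stay in base R, a minimal sketch of the original loop might look like the following. The names `models` and `test_acc`, the `set.seed()` call, and the 0.5 classification threshold are my own assumptions for illustration, not part of the question. Note the `[[i]]` rather than `[i]`: a glm fit is itself a list, so single-bracket assignment would not store it as one element.

```r
# Simulated data from the question (set.seed is an addition, for reproducibility)
set.seed(42)
resp <- sample(0:1, 100, TRUE)
x1 <- c(rep(5, 20), rep(0, 15), rep(2.5, 40), rep(17, 25))
x2 <- c(rep(23, 10), rep(5, 10), rep(15, 40), rep(1, 25), rep(2, 15))
dat <- data.frame(resp, x1, x2)

n <- 5
models <- vector("list", n)  # hypothetical name; create the container before the loop
test_acc <- numeric(n)       # hypothetical name; hold-out accuracy per split

for (i in seq_len(n)) {
  # fresh random 80/20 split on every iteration
  train_ind <- sample(seq_len(nrow(dat)), size = floor(0.8 * nrow(dat)))
  train <- dat[train_ind, ]
  test <- dat[-train_ind, ]

  # [[i]] stores the whole glm object as a single list element
  models[[i]] <- glm(resp ~ ., data = train, family = binomial)

  # predicted probabilities on the held-out rows, thresholded at 0.5 (an assumption)
  p <- predict(models[[i]], newdata = test, type = "response")
  test_acc[i] <- mean((p > 0.5) == test$resp)
}
```

After the loop, `models[[3]]` gives the third fit, `summary(models[[3]])` its coefficients, and `test_acc` the hold-out accuracy of each split. Since resp is random here, expect accuracies near 0.5.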