Running multiple GLM models with a for loop in R

Date: 2016-12-20 20:04:11

Tags: r

I'm trying to run some experiments in which I fit several GLM models in R using the same variables but different training samples.

Here is some simulated data:

resp <- sample(0:1,100,TRUE)
x1 <- c(rep(5,20),rep(0,15), rep(2.5,40),rep(17,25))
x2 <- c(rep(23,10),rep(5,10), rep(15,40),rep(1,25), rep(2, 15))
dat <- data.frame(resp,x1, x2)

Here is the loop I'm trying to use:

n <- 5
for (i in 1:n)
{
  ### Create training and testing data
  ## 80% of the sample size
  # Note that I didn't set a seed, so a new random split is made on every iteration.
  smp_sizelogis <- floor(0.8 * nrow(dat))

  train_indlogis <- sample(seq_len(nrow(dat)), size = smp_sizelogis)

  trainlogis <- dat[train_indlogis, ]
  testlogis  <- dat[-train_indlogis, ]

  InitLOogModel[i] <- glm(resp ~ ., data =trainlogis, family=binomial)
}

Unfortunately, I get this error:

Error in InitLOogModel[i] <- glm(resp ~ ., data = trainlogis, family = binomial) : 
  object 'InitLOogModel' not found

Any ideas?

1 Answer

Answer (score: 1)

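I suggest using caret for this. It takes some time to learn, but it bundles many "best practices". Once you know the basics, you can quickly try models other than glm and easily compare models with one another. Below is your example, modified to help you get started.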

## caret
library(caret)

# your data
resp <- sample(0:1,100,TRUE)
x1 <- c(rep(5,20),rep(0,15), rep(2.5,40),rep(17,25))  
x2 <- c(rep(23,10),rep(5,10), rep(15,40),rep(1,25), rep(2, 15))
dat <- data.frame(resp,x1, x2)

# a factor outcome tells caret this is classification; with a numeric 0/1 outcome, train() treats it as regression and complains at the train step
dat$resp <- as.factor(dat$resp)

# create a hold-out set to use after your model fitting
# not really necessary for your example, but showing for completeness
train_index <- createDataPartition(dat$resp, p = 0.8,
                                   list = FALSE,
                                   times = 1)

# create your train and test data
train_dat <- dat[train_index, ]
test_dat <- dat[-train_index, ]

# 5-fold cross-validation, repeated 5 times (25 resamples in total)
# similar in spirit to your 5 loops: each fit uses 80% of train_dat and is evaluated on the other 20%
fitControl <- trainControl(method = "repeatedcv",
                           number = 5,
                           repeats = 5)

# fit the glm!
glm_fit <- train(resp ~ ., data = train_dat,
                 method = "glm",
                 family = "binomial",
                 trControl = fitControl)

# summary
glm_fit

# final model: the glm refit on all of train_dat (glm has no tuning parameters to choose)
glm_fit$finalModel
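If you would rather keep the plain for loop, the error itself has a simple cause: `InitLOogModel[i] <- ...` assigns into an object that does not exist yet. A second problem hides behind it: a glm fit is itself a list, so it has to go into its own list element with `[[i]]`, not `[i]`. Below is a minimal base-R sketch of the loop with that fix, assuming you also want to score each model on its held-out 20%. The `set.seed(1)` call, the `accuracy` vector and the 0.5 classification cutoff are illustrative additions, not part of the original question.

```r
# Minimal sketch (base R only): the question's loop with the container created first.
# Assumptions: set.seed() is only for reproducibility of this example,
# `accuracy` and the 0.5 cutoff are illustrative choices.
set.seed(1)

resp <- sample(0:1, 100, TRUE)
x1 <- c(rep(5, 20), rep(0, 15), rep(2.5, 40), rep(17, 25))
x2 <- c(rep(23, 10), rep(5, 10), rep(15, 40), rep(1, 25), rep(2, 15))
dat <- data.frame(resp, x1, x2)

n <- 5
# Pre-allocate a list with one slot per model; assigning into a
# non-existent object is what produced "object 'InitLOogModel' not found".
InitLOogModel <- vector("list", n)
accuracy <- numeric(n)

# 80% of the rows go into each training sample
smp_sizelogis <- floor(0.8 * nrow(dat))

for (i in seq_len(n)) {
  # a new random 80/20 split on every iteration
  train_indlogis <- sample(seq_len(nrow(dat)), size = smp_sizelogis)
  trainlogis <- dat[train_indlogis, ]
  testlogis  <- dat[-train_indlogis, ]

  # [[i]] stores the whole glm object as one list element
  InitLOogModel[[i]] <- glm(resp ~ ., data = trainlogis, family = binomial)

  # predicted probabilities on the held-out rows, classified at 0.5
  p <- predict(InitLOogModel[[i]], newdata = testlogis, type = "response")
  accuracy[i] <- mean((p > 0.5) == (testlogis$resp == 1))
}

# one column of coefficients per fitted model
coefs <- sapply(InitLOogModel, coef)
coefs
accuracy
```

Keeping the fits in a list means you can still inspect any single model later, e.g. `summary(InitLOogModel[[3]])`. caret's `repeatedcv` does essentially this bookkeeping for you, with folds instead of independent random splits.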