For loop for sequentially adjusted regressions in R

Time: 2017-06-16 13:44:25

Tags: r for-loop logistic-regression

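Most questions about traditional loops in R are answered with functions that need less code and are usually more flexible.

However, correct me if I'm wrong, I feel that for loops still have the edge when the order of iteration matters.

In my case, I want to build sequentially and cumulatively adjusted logistic regression models, storing the OR and CI along with a column showing what is being adjusted for. This is my expected output: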

 Model        OR     CI

 Biomarker
 +Age
 +Sex
 +Smoking 

This is what I did:

df1 <- subset(df, select = c(age_cat, is_female, smoking_category,
                                 bmi_calc, has_diabetes, sbp_mean, 
                                 alcohol_category, highest_education,
                                 occupation, household_income))
model <- data.frame(NULL)

for (i in seq_along(df1)) {

  model <- exp((cbind(OR = coef(glm(as.formula(paste("istroke ~ log2(hscrp_mgl)", i, sep = "+")), 
                         family=binomial, data=df)),
           confint(glm(as.formula(paste("istroke ~ log2(hscrp_mgl)", i, sep = "+")), 
                       family=binomial, data=df)))))


}

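My outcome variable is stroke (istroke, 0 or 1). The variable I am interested in is the biomarker (hscrp_mgl). I know I am making a fundamental mistake somewhere. I searched other SO posts, but most of them do not adjust regression models sequentially and cumulatively.

Please let me know if this is a duplicate, or if anything is unclear.

Edit

My original dataset df contains all the variables in df1, my outcome variable, and some further variables. Here is a reproducible sample: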

age_cat is_female   smoking_category    bmi_calc    has_diabetes        sbp_mean    istroke
(59,69]        0           4            19.6           0                103.5          0
(59,69]        1           1            19.1           0                 138           0
(29,59]        0           4            26.8           0               155.5           0
(29,59]        0           1            23.1           0                 130           1
(29,59]        1           1            22.7           0                 126           1
(59,69]        0           4             25            0               182.5           0
(29,59]        1           1             20            0                  96           1
(29,59]        1           2             23.9          0               134.5           0
(59,69]        0           4             24.4          0               160.5           1

Edit: a more reproducible example:

df <- data.frame(age = c(50, 60, 50, 40, 70, 90, 30),
             gender = c(0, 1, 1, 0, 1, 1, 1),
             smoke = c(4, 3, 2, 1, 4, 3, 4),
             BMI = c(19, 20, 21, 22, 23, 24, 25),
             SBP = c(100, 120, 140, 110, 120, 130, 120),
             diab = c(0, 1, 1, 1, 0, 1, 1),
             stroke = c(0, 1, 0, 0, 1, 1, 1))
dput(df)
structure(list(age = c(50, 60, 50, 40, 70, 90, 30), gender = c(0, 
1, 1, 0, 1, 1, 1), smoke = c(4, 3, 2, 1, 4, 3, 4), BMI = c(19, 
20, 21, 22, 23, 24, 25), SBP = c(100, 120, 140, 110, 120, 130, 
120), diab = c(0, 1, 1, 1, 0, 1, 1), stroke = c(0, 1, 0, 0, 1, 
1, 1)), .Names = c("age", "gender", "smoke", "BMI", "SBP", "diab", 
"stroke"), row.names = c(NA, -7L), class = "data.frame")

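2 answers:

Answer 0: (score: 0)

I don't have a data frame with hscrp_mgl to reproduce the results and check that they match what you want, but you could try the following:

Get the names of all the features you want to use in the iteration: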

x <-  setdiff(names(df), "stroke")

Using purrr::map

Create a data frame whose first column holds the feature names, then use purrr::map inside mutate to compute the values you want.

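library(dplyr)   # tibble(), mutate(), %>%
library(purrr)   # map()

# One row per covariate; each model is stroke ~ log2(hscrp_mgl) + covariate.
# coef() returns log-odds; wrap it in exp() to get odds ratios.
model <- tibble(Model = x) %>% 
  mutate(OR = map(Model, ~coef(glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")), 
                                   family=binomial, data=df))),
         CI = map(Model, ~confint(glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")), 
                                   family=binomial, data=df))))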

You will get something like this:

# A tibble: 6 × 3
   Model        OR            CI
   <chr>    <list>        <list>
1    age <dbl [3]> <dbl [3 × 2]>
2 gender <dbl [3]> <dbl [3 × 2]>
3  smoke <dbl [3]> <dbl [3 × 2]>
4    BMI <dbl [3]> <dbl [3 × 2]>
5    SBP <dbl [3]> <dbl [3 × 2]>
6   diab <dbl [3]> <dbl [3 × 2]>
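
Each row's model formula is assembled as text and then converted with as.formula. A minimal base R sketch of that step, using the covariate name age from the sample data. The reformulate call is an alternative from the stats package, not something the answer itself uses, and the object names are made up for illustration:

```r
# Build "outcome ~ biomarker + covariate" from a covariate name held in a string.
covariate <- "age"

# Approach used in the answer: paste the pieces together, then parse the string.
f_paste <- as.formula(paste("stroke ~ log2(hscrp_mgl)", covariate, sep = "+"))

# Alternative (not in the original answer): let reformulate() join the terms.
f_reform <- reformulate(c("log2(hscrp_mgl)", covariate), response = "stroke")

print(f_paste)
print(f_reform)

# Variables that glm() would need to find in the data for this formula.
vars_needed <- all.vars(f_paste)
print(vars_needed)
```

Passing the name (a string such as "age") is what matters here; pasting a numeric loop index instead would produce a term like "+1", which is not a covariate.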

Using purrr::map and broom

You can also use broom functions to extract the data you need from the models, as follows:

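  • Store each fitted model in a column
  • Use tidy to get the coefficients, and mutate to add them as OR
  • Use confint_tidy to get the confidence intervals and add them as CI
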
# Keep each fitted model in a list column, then summarise it with broom
# (needs dplyr and purrr loaded, as above)
model2 <- tibble(Model = x) %>% 
  mutate(model_details = map(Model, ~glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")), 
                                   family=binomial, data=df))) %>% 
  mutate(OR = map(model_details, broom::tidy),
         CI = map(model_details, broom::confint_tidy))

Cumulative adjustment

For cumulative adjustment, you could try the following:

# Covariates in the order they should enter the model (same vector as x above)
cnames <- setdiff(names(df), "stroke")

model <- tibble(Model = cnames) %>% 
  mutate(Model_adjust = map2_chr(Model, seq_along(Model), ~paste(cnames[1:.y], collapse = "+"))) %>% 
  mutate(model_details = map(Model_adjust, ~glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")), 
                                         family=binomial, data=df))) %>% 
  mutate(OR = map(model_details, broom::tidy),
         CI = map(model_details, broom::confint_tidy))

The extra step adds a column listing the variables included so far; the later steps then use Model_adjust to fit the models:

model <- tibble(Model = cnames) %>% 
    mutate(Model_adjust = map2_chr(Model, seq_along(Model), ~paste(cnames[1:.y], collapse = "+")))

    # A tibble: 6 × 2
       Model                  Model_adjust
       <chr>                         <chr>
    1    age                           age
    2 gender                    age+gender
    3  smoke              age+gender+smoke
    4    BMI          age+gender+smoke+BMI
    5    SBP      age+gender+smoke+BMI+SBP
    6   diab age+gender+smoke+BMI+SBP+diab
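
Since the goal is the biomarker's OR and CI under each adjustment set, the tidy output can be narrowed to that single term. This is a sketch under assumptions: hscrp_mgl is simulated because it is absent from the posted data, the names sim, adjust_sets and biomarker_or are made up for illustration, and broom::tidy with conf.int = TRUE and exponentiate = TRUE is swapped in for the separate tidy and confint_tidy calls (as far as I know, newer broom releases mark confint_tidy as deprecated).

```r
library(dplyr)
library(purrr)

# Simulated stand-in data (assumption: the real df also holds hscrp_mgl > 0).
set.seed(1)
n <- 500
sim <- data.frame(hscrp_mgl = rexp(n) + 0.1,
                  age = sample(30:90, n, replace = TRUE),
                  gender = sample(0:1, n, replace = TRUE),
                  smoke = sample(1:4, n, replace = TRUE),
                  stroke = rbinom(n, 1, 0.4))

cnames <- c("age", "gender", "smoke")

# "age", "age+gender", "age+gender+smoke"
adjust_sets <- map_chr(seq_along(cnames),
                       ~ paste(cnames[seq_len(.x)], collapse = "+"))

# Fit one model per adjustment set and keep only the biomarker row.
biomarker_or <- map_dfr(adjust_sets, function(adj) {
  fit <- glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", adj, sep = "+")),
             family = binomial, data = sim)
  broom::tidy(fit, conf.int = TRUE, exponentiate = TRUE) %>%
    filter(term == "log2(hscrp_mgl)") %>%
    transmute(Model_adjust = adj, OR = estimate,
              CI_low = conf.low, CI_high = conf.high)
})

print(biomarker_or)
```

The result has one row per adjustment set, which is close to the Model / OR / CI layout asked for in the question.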

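Answer 1: (score: 0)

Actually, lapply may be a better approach than for, because it returns a collection of data frames that are row-bound once at the end, instead of growing model on every iteration.

The example below randomizes hscrp_mgl because it is not in the posted data, so ignore the results and focus on the process. Also, the confidence interval is split into lower and upper bounds in separate columns.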
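
A toy sketch of that difference, with made-up names (make_row, grown, bound) and no model fitting. Both versions produce the same rows; the lapply version collects the pieces first and binds them in a single call:

```r
# Hypothetical helper that returns one result row per step.
make_row <- function(i) data.frame(step = i, value = i^2)

# for loop: the result is extended by rbind() on every iteration.
grown <- data.frame()
for (i in 1:5) {
  grown <- rbind(grown, make_row(i))
}

# lapply: collect the rows in a list, then bind them once.
bound <- do.call(rbind, lapply(1:5, make_row))

# Compare the contents of the two results.
print(all.equal(grown$value, bound$value))
```

Either way the iteration order is preserved, since lapply walks its input from first to last element.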

set.seed(456)
df <- data.frame(hscrp_mgl = abs(rnorm(250)),
                 age = sample(100, 1000, replace=TRUE),
                 gender = sample(0:1, 1000, replace=TRUE),
                 smoke = sample(1:4, 1000, replace=TRUE),
                 BMI = sample(19:25, 1000, replace=TRUE),
                 SBP = sample(c(100, 120, 140, 110, 120, 130, 120),
                              1000, replace=TRUE),
                 diab = sample(0:1, 1000, replace=TRUE),
                 stroke = sample(0:1, 1000, replace=TRUE))  

# ITERATE THROUGH COLUMN NUMBERS (SUBSETTING OUT FIRST AND LAST)
# NOTE: ":" BINDS TIGHTER THAN "-", SO THE INDEX IS (3:ncol(df)) - 1, I.E. 2:(ncol(df) - 1)
modeldfs <- lapply(seq_along(df)[(3:ncol(df)) - 1], function(i) {
  strf <- paste("stroke ~ log2(hscrp_mgl)", 
                paste(names(df)[2:i], collapse = "+"), sep = "+")
  print(strf)

  # FIT DYNAMIC CUMULATIVE FORMULA USING names() TO PASS IN COLUMN NAME
  fit <- glm(as.formula(strf), family=binomial, data=df)

  # BIND MODEL STATS
  data.frame(OR = exp(coef(fit)[i+1]), 
             CI_2.5 = exp(confint(fit)[i+1,1]), 
             CI_97.5 = exp(confint(fit)[i+1,2]))
})

model <- do.call(rbind, modeldfs)
model

Output

[1] "stroke ~ log2(hscrp_mgl)+age"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI+SBP"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI+SBP+diab"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
# > model <- do.call(rbind, modeldfs)
# > model
             OR    CI_2.5  CI_97.5
age    1.003285 0.9989043 1.007701
gender 1.067117 0.8318796 1.369055
smoke  1.005926 0.9005196 1.123717
BMI    1.011281 0.9505659 1.075928
SBP    1.003252 0.9929368 1.013692
diab   1.139586 0.8880643 1.462925
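
The table above reports the OR of the most recently added covariate in each model. If the quantity of interest is instead the biomarker's odds ratio under each adjustment set, the same cumulative formulas can be reused and the log2(hscrp_mgl) row extracted by name. This is a sketch on simulated data; the names sim, covars and biomarker_table are illustrative, and reformulate is used here in place of paste plus as.formula. It also shows that a plain for loop works once the column name, rather than the index, goes into the formula and each result is stored instead of overwritten.

```r
# Simulated stand-in data (assumption: hscrp_mgl is strictly positive).
set.seed(456)
n <- 1000
sim <- data.frame(hscrp_mgl = abs(rnorm(n)) + 0.01,
                  age = sample(100, n, replace = TRUE),
                  gender = sample(0:1, n, replace = TRUE),
                  smoke = sample(1:4, n, replace = TRUE),
                  stroke = sample(0:1, n, replace = TRUE))

covars <- c("age", "gender", "smoke")   # order of adjustment
term <- "log2(hscrp_mgl)"               # coefficient to report

# Row 1 is the unadjusted biomarker model; each later row adds one covariate.
rows <- vector("list", length(covars) + 1)
for (k in 0:length(covars)) {
  rhs <- c(term, covars[seq_len(k)])
  fit <- glm(reformulate(rhs, response = "stroke"),
             family = binomial, data = sim)
  ci <- suppressMessages(confint(fit))  # computed once per model
  rows[[k + 1]] <- data.frame(
    Model = if (k == 0) "Biomarker" else paste0("+", covars[k]),
    OR = exp(coef(fit)[[term]]),
    CI_2.5 = exp(ci[term, 1]),
    CI_97.5 = exp(ci[term, 2])
  )
}

biomarker_table <- do.call(rbind, rows)
print(biomarker_table)
```

Because the outcome is simulated independently of the biomarker, the numbers themselves carry no meaning; only the shape of the table matters.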