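Most questions about traditional loops in R are answered with functions that need less code and are usually more flexible.
However, correct me if I'm wrong, but I feel that for loops still have the upper hand when the order of iteration matters.
In my case, I want to build sequentially and cumulatively adjusted logistic regression models, storing the OR/CI along with a column showing what is being adjusted for. This is my expected output: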
Model OR CI
Biomarker
+Age
+Sex
+Smoking
This is what I did:
df1 <- subset(df, select = c(age_cat, is_female, smoking_category,
                             bmi_calc, has_diabetes, sbp_mean,
                             alcohol_category, highest_education,
                             occupation, household_income))
model <- data.frame(NULL)
for (i in seq_along(df1)) {
  model <- exp((cbind(OR = coef(glm(as.formula(paste("istroke ~ log2(hscrp_mgl)", i, sep = "+")),
                                    family=binomial, data=df)),
                      confint(glm(as.formula(paste("istroke ~ log2(hscrp_mgl)", i, sep = "+")),
                                  family=binomial, data=df)))))
}
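My outcome variable is stroke (istroke, 0 or 1). My biomarker of interest is hscrp_mgl. I know I'm making a fundamental mistake somewhere. I've searched other SO posts, but most of them don't adjust regression models sequentially and cumulatively.
Let me know if this is a duplicate, or if anything is unclear.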
EDIT
My original dataset df contains all the variables in df1, my outcome variable, and a few other variables. Here is a reproducible sample:
age_cat is_female smoking_category bmi_calc has_diabetes sbp_mean istroke
(59,69] 0 4 19.6 0 103.5 0
(59,69] 1 1 19.1 0 138 0
(29,59] 0 4 26.8 0 155.5 0
(29,59] 0 1 23.1 0 130 1
(29,59] 1 1 22.7 0 126 1
(59,69] 0 4 25 0 182.5 0
(29,59] 1 1 20 0 96 1
(29,59] 1 2 23.9 0 134.5 0
(59,69] 0 4 24.4 0 160.5 1
EDIT: A more reproducible example:
df <- data.frame(age = c(50, 60, 50, 40, 70, 90, 30),
gender = c(0, 1, 1, 0, 1, 1, 1),
smoke = c(4, 3, 2, 1, 4, 3, 4),
BMI = c(19, 20, 21, 22, 23, 24, 25),
SBP = c(100, 120, 140, 110, 120, 130, 120),
diab = c(0, 1, 1, 1, 0, 1, 1),
stroke = c(0, 1, 0, 0, 1, 1, 1))
dput(df)
structure(list(age = c(50, 60, 50, 40, 70, 90, 30), gender = c(0,
1, 1, 0, 1, 1, 1), smoke = c(4, 3, 2, 1, 4, 3, 4), BMI = c(19,
20, 21, 22, 23, 24, 25), SBP = c(100, 120, 140, 110, 120, 130,
120), diab = c(0, 1, 1, 1, 0, 1, 1), stroke = c(0, 1, 0, 0, 1,
1, 1)), .Names = c("age", "gender", "smoke", "BMI", "SBP", "diab",
"stroke"), row.names = c(NA, -7L), class = "data.frame")
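Answer 0 (score: 0)
I don't have hscrp_mgl in your data frame, so I can't reproduce the results and confirm they match what you want, but you can try the following: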
Get the names of all the features you want to iterate over (excluding the outcome and the biomarker):
x <- setdiff(names(df), c("stroke", "hscrp_mgl"))
Create a data frame whose first column holds the feature names, then use purrr::map inside mutate to compute the values you want:
library(purrr)
library(dplyr)  # for tibble(), mutate() and %>%
model <- tibble(Model = x) %>%
  mutate(OR = map(Model, ~coef(glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")),
                                   family = binomial, data = df))),
         CI = map(Model, ~confint(glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")),
                                      family = binomial, data = df))))
You'll get something like this:
# A tibble: 6 × 3
Model OR CI
<chr> <list> <list>
1 age <dbl [3]> <dbl [3 × 2]>
2 gender <dbl [3]> <dbl [3 × 2]>
3 smoke <dbl [3]> <dbl [3 × 2]>
4 BMI <dbl [3]> <dbl [3 × 2]>
5 SBP <dbl [3]> <dbl [3 × 2]>
6 diab <dbl [3]> <dbl [3 × 2]>
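Note that the OR column above holds coef() output, i.e. log-odds rather than odds ratios, and each glm() is fitted twice (once inside coef(), once inside confint()). A minimal base R sketch that fits each model once and exponentiates both estimates and intervals. The simulated data and the helper name or_ci are hypothetical, and it swaps in Wald intervals from confint.default() instead of the profile-likelihood intervals that confint() computes, an assumption made for speed:

```r
# Hypothetical simulated data standing in for the real df
set.seed(1)
n <- 500
sim <- data.frame(hscrp_mgl = rexp(n) + 0.1,
                  age = sample(30:90, n, replace = TRUE),
                  gender = rbinom(n, 1, 0.5))
sim$stroke <- rbinom(n, 1, plogis(-3 + 0.3 * log2(sim$hscrp_mgl) + 0.03 * sim$age))

# Hypothetical helper: one fitted glm -> matrix of OR and Wald 95% CI
or_ci <- function(fit) {
  est <- cbind(coef(fit), confint.default(fit))  # log-odds scale
  colnames(est) <- c("OR", "CI_2.5", "CI_97.5")
  exp(est)                                       # back to odds ratios
}

covars <- c("age", "gender")
# Fit each single-covariate model exactly once
fits <- lapply(covars, function(v) {
  glm(as.formula(paste("stroke ~ log2(hscrp_mgl) +", v)),
      family = binomial, data = sim)
})
names(fits) <- covars
ors <- lapply(fits, or_ci)
ors$age
```

Storing the fits separately means any further summary (profile intervals, p-values) can reuse them without refitting.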
Using purrr::map and broom
You can also use broom functions to extract what you need from each fitted model: tidy to get the coefficients (added as the OR column) and confint_tidy to get the confidence intervals (added as the CI column).
model2 <- tibble(Model = x) %>%
  mutate(model_details = map(Model, ~glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")),
                                         family = binomial, data = df))) %>%
  mutate(OR = map(model_details, broom::tidy),
         CI = map(model_details, broom::confint_tidy))
Cumulative adjustment
For the cumulative adjustment, you can try the following:
model <- tibble(Model = x) %>%
  mutate(Model_adjust = map2_chr(Model, seq_along(Model), ~paste(x[1:.y], collapse = "+"))) %>%
  mutate(model_details = map(Model_adjust, ~glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")),
                                                family = binomial, data = df))) %>%
  mutate(OR = map(model_details, broom::tidy),
         CI = map(model_details, broom::confint_tidy))
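The extra mutate step adds a Model_adjust column listing every variable included so far; the model-fitting step then uses Model_adjust instead of Model. Running just that step:
model <- tibble(Model = x) %>%
  mutate(Model_adjust = map2_chr(Model, seq_along(Model), ~paste(x[1:.y], collapse = "+")))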
# A tibble: 6 × 2
Model Model_adjust
<chr> <chr>
1 age age
2 gender age+gender
3 smoke age+gender+smoke
4 BMI age+gender+smoke+BMI
5 SBP age+gender+smoke+BMI+SBP
6 diab age+gender+smoke+BMI+SBP+diab
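The same cumulative right-hand sides can be built without purrr: base R's Reduce() with accumulate = TRUE keeps every intermediate result, which is exactly the sequence shown in Model_adjust. A minimal sketch, assuming the covariate names from the reproducible example:

```r
x <- c("age", "gender", "smoke", "BMI", "SBP", "diab")

# accumulate = TRUE keeps each partial result: "age", "age+gender", ...
model_adjust <- Reduce(function(acc, nxt) paste(acc, nxt, sep = "+"), x,
                       accumulate = TRUE)

# Full formula strings, ready for as.formula()
formulas <- paste("stroke ~ log2(hscrp_mgl)", model_adjust, sep = "+")
formulas[3]
# "stroke ~ log2(hscrp_mgl)+age+gender+smoke"
```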
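Answer 1 (score: 0)
Actually, lapply may be a better approach than for here, since it returns a list of data.frames that can be row-bound once at the end, rather than iteratively growing model.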
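Because each cumulative model depends only on its index and not on the previous model's result, iteration order does not force a for loop here, and lapply() returns its results in the same order as its input. A minimal sketch contrasting a for loop that fills a preallocated list with the equivalent lapply() call; the toy function make_row is hypothetical and stands in for fitting one model:

```r
# Hypothetical stand-in for "fit model i and return one row of results"
make_row <- function(i) data.frame(step = i, n_terms = i + 1)

# for loop: preallocate a list, fill slot i, bind once at the end
res_for <- vector("list", 5)
for (i in seq_len(5)) {
  res_for[[i]] <- make_row(i)
}
tab_for <- do.call(rbind, res_for)

# lapply: the list is created and filled for you, in input order
tab_lapply <- do.call(rbind, lapply(seq_len(5), make_row))

identical(tab_for, tab_lapply)
# TRUE
```

Either way the key point is binding once at the end instead of overwriting or growing model inside the loop.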
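The example below generates hscrp_mgl at random since it is not in the posted data, so ignore the results and focus on the process. Also, the confidence interval is split into its lower and upper bounds in separate columns.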
set.seed(456)
df <- data.frame(hscrp_mgl = abs(rnorm(250)),
age = sample(100, 1000, replace=TRUE),
gender = sample(0:1, 1000, replace=TRUE),
smoke = sample(1:4, 1000, replace=TRUE),
BMI = sample(19:25, 1000, replace=TRUE),
SBP = sample(c(100, 120, 140, 110, 120, 130, 120),
1000, replace=TRUE),
diab = sample(0:1, 1000, replace=TRUE),
stroke = sample(0:1, 1000, replace=TRUE))
# ITERATE THROUGH COLUMN NUMBERS (SUBSETTING OUT FIRST AND LAST)
modeldfs <- lapply(2:(ncol(df) - 1), function(i) {
strf <- paste("stroke ~ log2(hscrp_mgl)",
paste(names(df)[2:i], collapse = "+"), sep = "+")
print(strf)
# FIT DYNAMIC CUMULATIVE FORMULA USING names() TO PASS IN COLUMN NAME
fit <- glm(as.formula(strf), family=binomial, data=df)
# BIND MODEL STATS
data.frame(OR = exp(coef(fit)[i+1]),
CI_2.5 = exp(confint(fit)[i+1,1]),
CI_97.5 = exp(confint(fit)[i+1,2]))
})
model <- do.call(rbind, modeldfs)
model
Output
[1] "stroke ~ log2(hscrp_mgl)+age"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI+SBP"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI+SBP+diab"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
# > model <- do.call(rbind, modeldfs)
# > model
OR CI_2.5 CI_97.5
age 1.003285 0.9989043 1.007701
gender 1.067117 0.8318796 1.369055
smoke 1.005926 0.9005196 1.123717
BMI 1.011281 0.9505659 1.075928
SBP 1.003252 0.9929368 1.013692
diab 1.139586 0.8880643 1.462925
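The code above reports the OR of the covariate added at each step (coef(fit)[i+1]), whereas the expected output in the question tracks the biomarker's OR as adjustments accumulate (Biomarker, +Age, +Sex, ...). A sketch of that variant, assuming randomly simulated data (hypothetical, so the numbers are meaningless), Wald intervals via confint.default() instead of profile intervals, and a hypothetical stage_label column in the question's "+Age" style:

```r
set.seed(456)
n <- 1000
sim <- data.frame(hscrp_mgl = abs(rnorm(n)) + 0.01,
                  age = sample(30:90, n, replace = TRUE),
                  gender = sample(0:1, n, replace = TRUE),
                  smoke = sample(1:4, n, replace = TRUE),
                  stroke = sample(0:1, n, replace = TRUE))
covars <- c("age", "gender", "smoke")

# "" = unadjusted model, then "+age", "+age+gender", "+age+gender+smoke"
rhs <- c("", paste0("+", Reduce(function(a, b) paste(a, b, sep = "+"),
                                covars, accumulate = TRUE)))
stage_label <- c("Biomarker", paste0("+", covars))

rows <- lapply(seq_along(rhs), function(k) {
  fit <- glm(as.formula(paste0("stroke ~ log2(hscrp_mgl)", rhs[k])),
             family = binomial, data = sim)
  b <- "log2(hscrp_mgl)"                  # always extract the biomarker row
  ci <- exp(confint.default(fit)[b, ])
  data.frame(Model = stage_label[k], OR = exp(coef(fit)[[b]]),
             CI_2.5 = ci[[1]], CI_97.5 = ci[[2]])
})
model <- do.call(rbind, rows)
model
```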