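I am trying to run multilevel models for several variables. I have two lists of variables: outcomes and responses. My difficulty is setting up an lapply call that walks the two lists in parallel, so that the first element of the outcomes list is run with the first element of the responses list, the second element of outcomes with the second element of responses, and so on.
I have been able to nest two lapply calls, but that gives me every combination of elements from the two lists, which is not what I want. I could extract the elements I need by hand, but is there another way to do this?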
library(lme4)  # provides glmer()
#Define random intercept
random_intercepts <- "(1|clusters)"
#For each of the 4 outcome variables, define the response variables
water_imp_vars <- c("fish_factor", "num_childrenunder5", "quintile_nowashnomat_fac")
less5_vars <- c("fish_factor", "num_hh_members", "num_childrenunder5")
san_imp_vars <- c("fish_factor", "num_hh_members", "num_childrenunder5", "quintile_nowashnomat_fac")
housing_imp_vars <- c("fish_factor","num_hh_members", "num_childrenunder5", "quintile_nowashnomat_fac")
#Combine all response variables into 1 list
all_response <- list(water_imp_vars, less5_vars, san_imp_vars, housing_imp_vars)
#List of outcomes
outcomes <- c("water_imp", "less_than_5", "san_imp", "housing_imp")
all_models <- lapply(setNames(outcomes, outcomes), function(var) {
  lapply(all_response, function(var2) {
    fixed <- paste0(var2, collapse = "+")
    formula <- as.formula(paste(var, "~", fixed, "+", random_intercepts))
    glmer(formula, hr_analysis_dataset, family = 'binomial', nAGQ = 0)
  })
})
Here is some of the output from the all_models variable.
$water_imp
$water_imp[[1]]
water_imp ~ fish_factor + num_childrenunder5 + quintile_nowashnomat_fac +
(1 | clusters)
<environment: 0x0000000017ecdd58>
$water_imp[[2]]
water_imp ~ fish_factor + num_hh_members + num_childrenunder5 +
(1 | clusters)
<environment: 0x0000000017ed1858>
$water_imp[[3]]
water_imp ~ fish_factor + num_hh_members + num_childrenunder5 +
quintile_nowashnomat_fac + (1 | clusters)
<environment: 0x0000000017ed5f20>
$water_imp[[4]]
water_imp ~ fish_factor + num_hh_members + num_childrenunder5 +
quintile_nowashnomat_fac + (1 | clusters)
<environment: 0x0000000017ed86a0>
I am only interested in the first combination for the first outcome variable:
water_imp ~ fish_factor + num_childrenunder5 + quintile_nowashnomat_fac +
(1 | clusters)
Then, for the second outcome variable, I am interested in the second combination:
less_than_5 ~ fish_factor + num_hh_members + num_childrenunder5 +
(1 | clusters)
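Extracting the right combinations is not much work at the moment, but I also plan to run this analysis for several countries, so the problem will keep growing as more levels are added.
Any help would be greatly appreciated.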
Answer 0 (score: 1)
If I understand correctly, you want four models and you want to run them in parallel? The formula part is simple:
results <- lapply(seq_along(outcomes), function(i) {
  # Pair the i-th outcome with the i-th set of response variables
  fixed <- paste(all_response[[i]], collapse = " + ")
  formula <- as.formula(paste(outcomes[i], "~", fixed, "+", random_intercepts))
  print(formula)
  glmer(formula, hr_analysis_dataset, family = 'binomial', nAGQ = 0)
})
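The same pairing can also be written with Map(), which walks two inputs element by element instead of crossing them. Below is a minimal sketch that only builds the formulas, using two of the variable sets from the question; build_formula() is a hypothetical helper of mine, not part of lme4, and the fitting step is left as a comment because it needs the hr_analysis_dataset from the question.

```r
# Inputs copied from the question (first two outcomes only, for brevity)
random_intercepts <- "(1|clusters)"
outcomes <- c("water_imp", "less_than_5")
all_response <- list(
  c("fish_factor", "num_childrenunder5", "quintile_nowashnomat_fac"),
  c("fish_factor", "num_hh_members", "num_childrenunder5")
)

# build_formula() is a hypothetical helper, not an lme4 function
build_formula <- function(outcome, predictors) {
  rhs <- paste(c(predictors, random_intercepts), collapse = " + ")
  as.formula(paste(outcome, "~", rhs))
}

# Map() calls build_formula(outcomes[1], all_response[[1]]),
# then build_formula(outcomes[2], all_response[[2]]), and so on.
# The result is a list named after the outcomes.
paired_formulas <- Map(build_formula, outcomes, all_response)

# Fitting step (assumes lme4 is loaded and hr_analysis_dataset exists):
# all_models <- lapply(paired_formulas, glmer, data = hr_analysis_dataset,
#                      family = "binomial", nAGQ = 0)
```

Because the result is named by outcome, a single model can later be pulled out with something like all_models$less_than_5, which may help once more countries are added.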
However, these will not run in parallel. For that you need a parallel version of lapply, for example:
library(parallel)
# Calculate the number of cores (leave one free, but never go below 1)
no_cores <- max(1, detectCores() - 1)
# Initiate cluster
cl <- makeCluster(no_cores, type = "FORK")
results <- parLapply(cl, seq_along(outcomes), function(i) {
  fixed <- paste(all_response[[i]], collapse = " + ")
  formula <- as.formula(paste(outcomes[i], "~", fixed, "+", random_intercepts))
  print(formula)
  glmer(formula, hr_analysis_dataset, family = 'binomial', nAGQ = 0)
})
stopCluster(cl)
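Note that FORK clusters only work on Unix-like systems such as Linux and macOS. On Windows you must use PSOCK instead, in which case the workers do not inherit your workspace: export the objects they need with clusterExport() and load lme4 on each worker with clusterEvalQ(). For a more detailed overview of parallel lapply in R, see here.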
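For the PSOCK case, a rough sketch of the extra setup might look like the following. It uses two of the variable sets from the question and only builds the formula strings on the workers, so it runs without lme4 or the data; the worker count of 2 is an arbitrary choice of mine, and the real glmer call is shown as a comment.

```r
library(parallel)

# Inputs copied from the question (first two outcomes only, for brevity)
random_intercepts <- "(1|clusters)"
outcomes <- c("water_imp", "less_than_5")
all_response <- list(
  c("fish_factor", "num_childrenunder5", "quintile_nowashnomat_fac"),
  c("fish_factor", "num_hh_members", "num_childrenunder5")
)

# PSOCK workers are fresh R sessions, so they start with an empty workspace
cl <- makeCluster(2, type = "PSOCK")

# Copy the objects the worker function reads to every worker
clusterExport(cl, c("outcomes", "all_response", "random_intercepts"),
              envir = environment())
# For the real models, also export "hr_analysis_dataset" and load lme4:
# clusterEvalQ(cl, library(lme4))

formula_strings <- parLapply(cl, seq_along(outcomes), function(i) {
  fixed <- paste(all_response[[i]], collapse = " + ")
  f <- paste(outcomes[i], "~", fixed, "+", random_intercepts)
  # Real run: glmer(as.formula(f), hr_analysis_dataset,
  #                 family = "binomial", nAGQ = 0)
  f
})

stopCluster(cl)
```

Note that anything printed inside the worker function will generally not show up in your console, so return whatever you want to inspect instead of printing it.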