我的数据:
library(tidyverse)
library(lme4)
myData <- structure(list(Subjects = c(4L, 3L, 5L, 1L, 9L, 6L, 10L, 2L,
8L, 7L), Gene1 = c(0.318630087617032, -0.58179068471591, 0.714532710891568,
-0.825259425862769, -0.359862131395465, 0.0898861437775305, 0.0962744602851301,
-0.201633952183354, 0.739840499878431, 0.123379501088869), Variant1 = c(1L,
0L, 1L, 2L, 2L, 1L, 0L, 1L, 2L, 0L), Variant2 = c(0L, 0L, 2L,
2L, 0L, 2L, 2L, 2L, 2L, 0L), Variant3 = c(1L, 1L, 0L, 2L, 0L,
1L, 1L, 1L, 2L, 1L), Variant4 = c(1L, 2L, 1L, 0L, 0L, 1L, 0L,
2L, 1L, 1L), Age = c(81L, 60L, 85L, 87L, 76L, 78L, 88L, 64L,
90L, 75L), Sex = c(0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L), RIN = c(6L,
6L, 8L, 6L, 8L, 7L, 8L, 7L, 7L, 6L), ABG = structure(c(4L, 5L,
8L, 3L, 6L, 2L, 3L, 4L, 7L, 1L), .Label = c("F1", "F10", "F2",
"F3", "F4", "F5", "F6", "F8"), class = "factor")), row.names = c(NA,
-10L), class = "data.frame", .Names = c("Subjects", "Gene1",
"Variant1", "Variant2", "Variant3", "Variant4", "Age", "Sex",
"RIN", "ABG"))
myData
Subjects Gene1 Variant1 Variant2 Variant3 Variant4 Age Sex RIN ABG
1 4 0.31863009 1 0 1 1 81 0 6 F3
2 3 -0.58179068 0 0 1 2 60 1 6 F4
3 5 0.71453271 1 2 0 1 85 0 8 F8
4 1 -0.82525943 2 2 2 0 87 1 6 F2
5 9 -0.35986213 2 0 0 0 76 0 8 F5
6 6 0.08988614 1 2 1 1 78 1 7 F10
7 10 0.09627446 0 2 1 0 88 0 8 F2
8 2 -0.20163395 1 2 1 2 64 0 7 F3
9 8 0.73984050 2 2 2 1 90 1 7 F6
10 7 0.12337950 0 0 1 1 75 1 6 F1
Gene1
是我的因变量,Variant1
,Variant2
,Variant3
和Variant4
是我的自变量。 Age
,Sex
,RIN
和ABG
是我的协变量。我想使用tidyverse框架(broom / dplyr / purrr / map)来迭代Variant1:Variant4
,并使用线性混合模型执行以下线性回归:
lmer(Gene1~Variant1+Age+Sex+RIN+(1|ABG), myData)
代表 Variant1 ,
lmer(Gene1~Variant2+Age+Sex+RIN+(1|ABG), myData)
表示 Variant2 ,依此类推...
最后,我想生成一个结果表,其中包含所有Variant *的beta系数(估计),Std.Err和pValue(可能使用整洁/增添/扫视??)。
PS。 Variant *的数量可能会有所不同。
谢谢!
答案 0 :(得分:3)
对于您的问题,您可以gather()
以在所有不同种类的变体上使用group_split()
。从那时起,我们可以遍历每个拆分的data.frame并运行线性模型。在map()
内部,我们将broom::tidy()
数据并添加一列以区分每个模型的估计。我使用map_df()
结束单个数据帧,但也可以只使用map()
结束一个data.frames列表。
library(tidyverse)
library(lme4)
library(broom)
dat <- tribble(~Subjects, ~Gene1, ~Variant1, ~Variant2, ~Variant3, ~Variant4, ~Age, ~Sex, ~RIN, ~ABG,
1, -0.82525943, 2, 2, 2, 0, 87, 1, 6, "F2",
2, -0.20163395, 1, 2, 1, 2, 64 , 0 , 7, "F3",
3, -0.58179068, 0, 0, 1, 2, 60 , 1 , 6, "F4",
4, 0.31863009, 1, 0, 1, 1, 81 , 0 , 6, "F3",
5, 0.71453271, 1, 2, 0, 1, 85 , 0 , 8, "F8",
6, 0.08988614, 1, 2, 1, 1, 78 , 1 , 7, "F10",
7, 0.12337950, 0, 0, 1, 1, 75 , 1 , 6, "F1",
8, 0.73984050, 2, 2, 2, 1, 90 , 1 , 7, "F6",
9, -0.35986213, 2, 0, 0, 0, 76 , 0 , 8, "F5",
10, 0.09627446, 0, 2, 1, 0, 88, 0, 8, "F2")
dat %>%
gather(key = "variants", value = "var_value", Variant1:Variant4) %>%
group_split(variants) %>%
map_df(~lmer(Gene1~var_value+Age+Sex+RIN+(1|ABG), data = .x) %>%
tidy() %>%
mutate(variant_group = unique(.x$variants)))
#> # A tibble: 28 x 6
#> term estimate std.error statistic group variant_group
#> <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 (Intercept) -3.91 1.96 -2.00 fixed Variant1
#> 2 var_value -0.280 0.120 -2.34 fixed Variant1
#> 3 Age 0.0401 0.00905 4.43 fixed Variant1
#> 4 Sex 0.00107 0.391 0.00274 fixed Variant1
#> 5 RIN 0.161 0.154 1.05 fixed Variant1
#> 6 sd_(Intercept).ABG 0.462 NA NA ABG Variant1
#> 7 sd_Observation.Resid… 0.00234 NA NA Residu… Variant1
#> 8 (Intercept) -3.60 2.74 -1.31 fixed Variant2
#> 9 var_value -0.0625 0.202 -0.309 fixed Variant2
#> 10 Age 0.0295 0.0175 1.69 fixed Variant2
#> # … with 18 more rows
由reprex package(v0.2.1)于2019-02-23创建