如何使用tidyverse框架迭代自变量,执行多个线性回归?

时间:2019-02-22 20:46:19

标签: r dplyr purrr genetics broom

我的数据:

library(tidyverse)
library(lme4)

myData <- structure(list(Subjects = c(4L, 3L, 5L, 1L, 9L, 6L, 10L, 2L, 
8L, 7L), Gene1 = c(0.318630087617032, -0.58179068471591, 0.714532710891568, 
-0.825259425862769, -0.359862131395465, 0.0898861437775305, 0.0962744602851301, 
-0.201633952183354, 0.739840499878431, 0.123379501088869), Variant1 = c(1L, 
0L, 1L, 2L, 2L, 1L, 0L, 1L, 2L, 0L), Variant2 = c(0L, 0L, 2L, 
2L, 0L, 2L, 2L, 2L, 2L, 0L), Variant3 = c(1L, 1L, 0L, 2L, 0L, 
1L, 1L, 1L, 2L, 1L), Variant4 = c(1L, 2L, 1L, 0L, 0L, 1L, 0L, 
2L, 1L, 1L), Age = c(81L, 60L, 85L, 87L, 76L, 78L, 88L, 64L, 
90L, 75L), Sex = c(0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L), RIN = c(6L, 
6L, 8L, 6L, 8L, 7L, 8L, 7L, 7L, 6L), ABG = structure(c(4L, 5L, 
8L, 3L, 6L, 2L, 3L, 4L, 7L, 1L), .Label = c("F1", "F10", "F2", 
"F3", "F4", "F5", "F6", "F8"), class = "factor")), row.names = c(NA, 
-10L), class = "data.frame", .Names = c("Subjects", "Gene1", 
"Variant1", "Variant2", "Variant3", "Variant4", "Age", "Sex", 
"RIN", "ABG"))

myData
   Subjects       Gene1 Variant1 Variant2 Variant3 Variant4 Age Sex RIN ABG
1         4  0.31863009        1        0        1        1  81   0   6  F3
2         3 -0.58179068        0        0        1        2  60   1   6  F4
3         5  0.71453271        1        2        0        1  85   0   8  F8
4         1 -0.82525943        2        2        2        0  87   1   6  F2
5         9 -0.35986213        2        0        0        0  76   0   8  F5
6         6  0.08988614        1        2        1        1  78   1   7 F10
7        10  0.09627446        0        2        1        0  88   0   8  F2
8         2 -0.20163395        1        2        1        2  64   0   7  F3
9         8  0.73984050        2        2        2        1  90   1   7  F6
10        7  0.12337950        0        0        1        1  75   1   6  F1

Gene1是我的因变量,Variant1Variant2Variant3Variant4是我的自变量。 AgeSexRINABG是我的协变量。我想使用tidyverse框架(broom / dplyr / purrr / map)来迭代Variant1:Variant4,并使用线性混合模型执行以下线性回归:

lmer(Gene1~Variant1+Age+Sex+RIN+(1|ABG), myData)代表 Variant1
lmer(Gene1~Variant2+Age+Sex+RIN+(1|ABG), myData)表示 Variant2 ,依此类推...

最后,我想生成一个结果表,其中包含所有Variant *的beta系数(估计),Std.Err和pValue(可能使用整洁/增添/扫视??)。

PS。 Variant *的数量可能会有所不同。

谢谢!

1 个答案:

答案 0 :(得分:3)

对于您的问题,您可以gather()以在所有不同种类的变体上使用group_split()。从那时起,我们可以遍历每个拆分的data.frame并运行线性模型。在map()内部,我们将broom::tidy()数据并添加一列以区分每个模型的估计。我使用map_df()结束单个数据帧,但也可以只使用map()结束一个data.frames列表。

library(tidyverse)
library(lme4)
library(broom)

dat <- tribble(~Subjects, ~Gene1, ~Variant1, ~Variant2, ~Variant3, ~Variant4, ~Age, ~Sex, ~RIN, ~ABG,
        1,     -0.82525943,  2,      2,       2,       0,     87,   1,   6,     "F2",
        2,     -0.20163395,  1,      2,       1,       2, 64  , 0 ,  7,       "F3",
        3,     -0.58179068,  0,      0,       1,       2, 60  , 1 ,  6,       "F4",
        4,      0.31863009,  1,      0,       1,       1, 81  , 0 ,  6,       "F3",
        5,      0.71453271,  1,      2,       0,       1, 85  , 0 ,  8,       "F8",
        6,      0.08988614,  1,      2,       1,       1, 78  , 1 ,  7,      "F10",
        7,      0.12337950,  0,      0,       1,       1, 75  , 1 ,  6,       "F1",
        8,      0.73984050,  2,      2,       2,       1, 90  , 1 ,  7,       "F6",
        9,     -0.35986213,  2,      0,       0,       0, 76  , 0 ,  8,       "F5",
        10,     0.09627446,  0,      2,       1,       0,  88,   0,   8,       "F2")

dat %>% 
  gather(key = "variants", value = "var_value", Variant1:Variant4) %>% 
  group_split(variants) %>% 
  map_df(~lmer(Gene1~var_value+Age+Sex+RIN+(1|ABG), data = .x) %>% 
        tidy() %>% 
        mutate(variant_group = unique(.x$variants)))
#> # A tibble: 28 x 6
#>    term                  estimate std.error statistic group   variant_group
#>    <chr>                    <dbl>     <dbl>     <dbl> <chr>   <chr>        
#>  1 (Intercept)           -3.91      1.96     -2.00    fixed   Variant1     
#>  2 var_value             -0.280     0.120    -2.34    fixed   Variant1     
#>  3 Age                    0.0401    0.00905   4.43    fixed   Variant1     
#>  4 Sex                    0.00107   0.391     0.00274 fixed   Variant1     
#>  5 RIN                    0.161     0.154     1.05    fixed   Variant1     
#>  6 sd_(Intercept).ABG     0.462    NA        NA       ABG     Variant1     
#>  7 sd_Observation.Resid…  0.00234  NA        NA       Residu… Variant1     
#>  8 (Intercept)           -3.60      2.74     -1.31    fixed   Variant2     
#>  9 var_value             -0.0625    0.202    -0.309   fixed   Variant2     
#> 10 Age                    0.0295    0.0175    1.69    fixed   Variant2     
#> # … with 18 more rows

reprex package(v0.2.1)于2019-02-23创建