考虑此示例
mydata <- data_frame(ind_1 = c(NA,NA,3,4),
ind_2 = c(2,3,4,5),
ind_3 = c(5,6,NA,NA),
y = c(28,34,25,12),
group = c('a','a','b','b'))
> mydata
# A tibble: 4 x 5
ind_1 ind_2 ind_3 y group
<dbl> <dbl> <dbl> <dbl> <chr>
1 NA 2 5 28 a
2 NA 3 6 34 a
3 3 4 NA 25 b
4 4 5 NA 12 b
在这里,我想针对每个group
,对该组中不缺少的任何变量进行回归y
,并将对应的lm
对象存储在list-column
中。 / p>
也就是说:
a
,这些变量对应于ind_2
和ind_3
b
,它们对应于ind_1
和ind_2
我尝试了以下操作,但这不起作用
mydata %>% group_by(group) %>% nest() %>%
do(filtered_df <- . %>% select(which(colMeans(is.na(.)) == 0)),
myreg = lm(y~ names(filtered_df)))
有什么想法吗?谢谢!
答案 0 :(得分:8)
我们可以使用map
和mutate
。如果您要保留过滤后的数据(select
),我们可以nestdat1
一步一步地map
进行建模,也可以使用两个nestdat2
分步进行建模{<1} / p>
library(tidyverse)
nestdat1 <- mydata %>%
group_by(group) %>%
nest() %>%
mutate(model = data %>% map(~ select_if(., funs(!any(is.na(.)))) %>%
lm(y ~ ., data = .)))
nestdat2 <- mydata %>%
group_by(group) %>%
nest() %>%
mutate(data = data %>% map(~ select_if(., funs(!any(is.na(.))))),
model = data %>% map(~ lm(y ~ ., data = .)))
输出:
它们产生不同的data
列:
> nestdat1 %>% pull(data)
[[1]]
# A tibble: 2 x 4
ind_1 ind_2 ind_3 y
<dbl> <dbl> <dbl> <dbl>
1 NA 2 5 28
2 NA 3 6 34
[[2]]
# A tibble: 2 x 4
ind_1 ind_2 ind_3 y
<dbl> <dbl> <dbl> <dbl>
1 3 4 NA 25
2 4 5 NA 12
> nestdat2 %>% pull(data)
[[1]]
# A tibble: 2 x 3
ind_2 ind_3 y
<dbl> <dbl> <dbl>
1 2 5 28
2 3 6 34
[[2]]
# A tibble: 2 x 3
ind_1 ind_2 y
<dbl> <dbl> <dbl>
1 3 4 25
2 4 5 12
但同一列model
列:
> nestdat1 %>% pull(model)
[[1]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_2 ind_3
16 6 NA
[[2]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_1 ind_2
64 -13 NA
> nestdat2 %>% pull(model)
[[1]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_2 ind_3
16 6 NA
[[2]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_1 ind_2
64 -13 NA
答案 1 :(得分:2)
这是另一个tidyverse
选项,如果您希望将其保留在mydata$model
中,请分配给tibble
:
library(tidyverse)
mydata %>%
nest(-group) %>%
pull(data) %>%
map(~lm(y ~., discard(.,anyNA)))
# [[1]]
#
# Call:
# lm(formula = y ~ ., data = discard(., anyNA))
#
# Coefficients:
# (Intercept) ind_2 ind_3
# 16 6 NA
#
#
# [[2]]
#
# Call:
# lm(formula = y ~ ., data = discard(., anyNA))
#
# Coefficients:
# (Intercept) ind_1 ind_2
# 64 -13 NA
#
#