Gathering multiple columns with nested repeated measures

Time: 2018-03-28 18:28:13

Tags: r tidyverse

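I have a dataset of different types of people (pid) (type2 = c("dad", "mom", "kid"); and, for convenience, type = c("a", "b", "c")) nested within households (hid), with repeated measures (time).

  • Some variables, such as v1_, are asked of everyone, but the values are spread across three columns. For example, v1_a holds the values for all dads (type == "a").
  • Variables like v2_ are asked only of dads and moms (a and b), and the values are spread across two columns.
  • Variables like v3 are also asked only of dads and moms, but the values are held in a single column.
  • Variables like v4 are asked of everyone, and the values are held in a single column.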

Have:

   hid pid type type2 time v1_a v1_b v1_c v2_a v2_b v3 v4
1    1   1    a   dad    1    6   NA   NA    2   NA  4  3
2    1   2    b   mom    1   NA    2   NA   NA    5  6  6
3    1   3    c   kid    1   NA   NA    1   NA   NA NA  5
4    2   4    a   dad    1    3   NA   NA    6   NA  2  6
5    2   5    b   mom    1   NA    5   NA   NA    2  4  3
6    2   6    c   kid    1   NA   NA    3   NA   NA NA  5
7    1   1    a   dad    2    3   NA   NA    2   NA  4  3
8    1   2    b   mom    2   NA    3   NA   NA    5  6  6
9    1   3    c   kid    2   NA   NA    2   NA   NA NA  5
10   2   4    a   dad    2    2   NA   NA    6   NA  2  6
11   2   5    b   mom    2   NA    3   NA   NA    2  4  3
12   2   6    c   kid    2   NA   NA    2   NA   NA NA  5

This is the end result I want:

   hid pid type type2 time v1 v2 v3 v4
1    1   1    a   dad    1  6  2  4  3
2    1   2    b   mom    1  2  5  6  6
3    1   3    c   kid    1  1 NA NA  5
4    2   4    a   dad    1  3  6  2  6
5    2   5    b   mom    1  5  2  4  3
6    2   6    c   kid    1  3 NA NA  5
7    1   1    a   dad    2  3  2  4  3
8    1   2    b   mom    2  3  5  6  6
9    1   3    c   kid    2  2 NA NA  5
10   2   4    a   dad    2  2  6  2  6
11   2   5    b   mom    2  3  2  4  3
12   2   6    c   kid    2  2 NA NA  5

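I'm looking for a tidyverse approach that scales to a much larger real use case with the same mix of variables as in the example below. The variable naming is consistent. Where do I go after gather()?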

library(tidyverse)
df_have <- data.frame(hid=c(1, 1, 1, 2, 2, 2,
                            1, 1, 1, 2, 2, 2),
                      pid=c(1, 2, 3, 4, 5, 6,
                            1, 2, 3, 4, 5, 6),
                      type=c("a", "b", "c", "a", "b", "c",
                             "a", "b", "c", "a", "b", "c"),
                      type2=c("dad", "mom", "kid", "dad", "mom", "kid",
                              "dad", "mom", "kid", "dad", "mom", "kid"),
                      time=c(1, 1, 1, 1, 1, 1, 
                             2, 2, 2, 2, 2, 2),
                      v1_a=c(6, NA, NA, 3, NA, NA,
                             3, NA, NA, 2, NA, NA),
                      v1_b=c(NA, 2, NA, NA, 5, NA,
                             NA, 3, NA, NA, 3, NA),
                      v1_c=c(NA, NA, 1, NA, NA, 3,
                             NA, NA, 2, NA, NA, 2),
                      v2_a=c(2, NA, NA, 6, NA, NA,
                             2, NA, NA, 6, NA, NA),
                      v2_b=c(NA, 5, NA, NA, 2, NA,
                             NA, 5, NA, NA, 2, NA),
                      v3=c(4, 6, NA, 2, 4, NA,
                           4, 6, NA, 2, 4, NA),
                      v4=c(3, 6, 5, 6, 3, 5,
                           3, 6, 5, 6, 3, 5)
                      )
df_want <- data.frame(hid=c(1, 1, 1, 2, 2, 2,
                            1, 1, 1, 2, 2, 2),
                      pid=c(1, 2, 3, 4, 5, 6,
                            1, 2, 3, 4, 5, 6),
                      type=c("a", "b", "c", "a", "b", "c",
                             "a", "b", "c", "a", "b", "c"),
                      type2=c("dad", "mom", "kid", "dad", "mom", "kid",
                              "dad", "mom", "kid", "dad", "mom", "kid"),
                      time=c(1, 1, 1, 1, 1, 1, 
                             2, 2, 2, 2, 2, 2),
                      v1=c(6, 2, 1, 3, 5, 3,
                           3, 3, 2, 2, 3, 2),
                      v2=c(2, 5, NA, 6, 2, NA,
                           2, 5, NA, 6, 2, NA),
                      v3=c(4, 6, NA, 2, 4, NA,
                           4, 6, NA, 2, 4, NA),
                      v4=c(3, 6, 5, 6, 3, 5,
                           3, 6, 5, 6, 3, 5)
                      )

df_have %>%
  gather(key, value, -hid, -pid, -type, -type2, -time) 

2 answers:

Answer 0: (score: 0)

This gets me there, but the filter(!is.na(value)) step seems like a hack. Any better ideas?

df_test <- 
df_have %>%
  gather(key, value, -hid, -pid, -type, -time, -type2) %>%
  mutate(key = str_replace(key, "_.*", "")) %>%
  filter(!is.na(value)) %>%
  spread(key, value) %>%
  arrange(time, hid, type, pid)
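
One way to see why the NA rows have to be dropped before spread(): once the suffix is stripped, each person and time point has several rows sharing the same key, and only one of them holds a value. A minimal sketch, assuming a hypothetical three-row stand-in for df_have (df_small is not part of the question):

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical stand-in for df_have: one household, one time point, v1 only.
df_small <- tibble(
  hid = c(1, 1, 1), pid = c(1, 2, 3),
  type = c("a", "b", "c"), time = c(1, 1, 1),
  v1_a = c(6, NA, NA), v1_b = c(NA, 2, NA), v1_c = c(NA, NA, 1)
)

long <- df_small %>%
  gather(key, value, -hid, -pid, -type, -time) %>%
  mutate(key = str_replace(key, "_.*", ""))

# Rows per person, time point and key. More than one row per combination
# means spread() has no unique value to place in the cell.
dup_all    <- count(long, pid, time, key)
dup_non_na <- count(filter(long, !is.na(value)), pid, time, key)
```

Here dup_all has three rows per combination, while dup_non_na has exactly one, which is what spread() needs. The filter is safe only as long as each person answers at most one of the suffixed columns per variable.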

Update from @www:

df_test <- 
df_have %>%
  gather(key, value, -hid, -pid, -type, -time, -type2, na.rm=TRUE) %>%
  mutate(key = str_replace(key, "_.*", "")) %>%
  spread(key, value) %>%
  arrange(time, hid, type, pid)
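
In tidyr 1.0.0 and later, pivot_longer() and pivot_wider() supersede gather() and spread(). The same idea might look like the sketch below. It assumes a hypothetical three-row stand-in for df_have (df_small), and the selector matches("^v[0-9]") assumes every measure column starts with v followed by a digit.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical stand-in for df_have: one household, one time point.
df_small <- tibble(
  hid = c(1, 1, 1), pid = c(1, 2, 3),
  type = c("a", "b", "c"), time = c(1, 1, 1),
  v1_a = c(6, NA, NA), v1_b = c(NA, 2, NA), v1_c = c(NA, NA, 1),
  v2_a = c(2, NA, NA), v2_b = c(NA, 5, NA),
  v3 = c(4, 6, NA), v4 = c(3, 6, 5)
)

df_wide <- df_small %>%
  # Stack all measure columns and drop the empty cells in the same step.
  pivot_longer(matches("^v[0-9]"), names_to = "key", values_to = "value",
               values_drop_na = TRUE) %>%
  # Strip the type suffix so v1_a, v1_b and v1_c all become v1.
  mutate(key = str_remove(key, "_.*")) %>%
  pivot_wider(names_from = key, values_from = value) %>%
  arrange(time, hid, type, pid)
```

values_drop_na = TRUE plays the same role as na.rm = TRUE in gather(), so no separate filter() step is needed.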

Answer 1: (score: 0)

Another idea is to use coalesce from dplyr together with map from purrr.
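
The code for this answer did not survive, so the following is only a sketch of how coalesce and map could be combined, not the answerer's original solution. It assumes a hypothetical three-row stand-in for df_have (df_small) and that split variables always follow the stem_suffix naming pattern; stems, collapsed and df_out are illustrative names.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for df_have: one household, one time point.
df_small <- tibble(
  hid = c(1, 1, 1), pid = c(1, 2, 3),
  type = c("a", "b", "c"), time = c(1, 1, 1),
  v1_a = c(6, NA, NA), v1_b = c(NA, 2, NA), v1_c = c(NA, NA, 1),
  v2_a = c(2, NA, NA), v2_b = c(NA, 5, NA),
  v3 = c(4, 6, NA), v4 = c(3, 6, 5)
)

# Stems of the variables that are split by type: here "v1" and "v2".
split_cols <- grep("^v[0-9]+_", names(df_small), value = TRUE)
stems <- unique(sub("_.*", "", split_cols))

# For each stem, fold its suffixed columns together, keeping the first
# non-missing value in every row.
collapsed <- map(set_names(stems), function(stem) {
  cols <- select(df_small, starts_with(paste0(stem, "_")))
  reduce(cols, coalesce)
})

df_out <- df_small %>%
  select(-all_of(split_cols)) %>%
  bind_cols(as_tibble(collapsed)) %>%
  select(hid, pid, type, time, v1, v2, v3, v4)
```

Unlike the gather()/spread() route, this never reshapes to long format, so columns such as v3 and v4 pass through untouched.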