How to reshape from long to wide with the tidyverse

Posted: 2019-01-19 08:45:49

Tags: r tidyverse tidyr

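I want to use tidyr::spread to reshape a long data frame into wide format. How can I include all of the variables that need to be converted (gender, edu and dob), not just one?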

My input:

df <- tibble::tribble(
  ~respid, ~member_id, ~gender, ~edu,  ~dob,
     100L,  1L,      1L,   3L, 1978L,
     100L,  2L,      1L,   3L, 1980L,
     200L,  1L,      1L,   4L, 1974L,
     200L,  2L,      2L,   5L, 1955L,
     300L,  1L,      2L,   3L, 1998L,
     300L,  2L,      1L,   4L, 1999L,
     300L,  3L,      2L,   3L, 2001L
  )

Desired output:

output <- tibble::tribble(
  ~respid, ~gender_1, ~edu_1, ~dob_1, ~gender_2, ~edu_2, ~dob_2, ~gender_3, ~edu_3, ~dob_3,
     100L,        1L,     3L,  1978L,        1L,     3L,  1980L,        NA,     NA,     NA,
     200L,        1L,     4L,  1974L,        2L,     5L,  1955L,        NA,     NA,     NA,
     300L,        2L,     3L,  1998L,        1L,     4L,  1999L,        2L,     3L,  2001L
  )

Here is my attempt, but the numbering produced by row_number() doesn't look right:

df %>%
  group_by(member_id) %>% 
  mutate(t1 = paste0("gender_" , row_number())) %>%
  spread(t1, gender)
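A quick way to see why the attempt above goes wrong is to pull out the t1 labels it generates. The sketch below assumes only dplyr and the df from the question (redefined so the block runs on its own). Because the data are grouped by member_id, row_number() counts respondents within each member_id rather than identifying the member, so the suffixes do not line up with member_id.

```r
library(dplyr)

df <- tibble::tribble(
  ~respid, ~member_id, ~gender, ~edu,  ~dob,
     100L,         1L,      1L,   3L, 1978L,
     100L,         2L,      1L,   3L, 1980L,
     200L,         1L,      1L,   4L, 1974L,
     200L,         2L,      2L,   5L, 1955L,
     300L,         1L,      2L,   3L, 1998L,
     300L,         2L,      1L,   4L, 1999L,
     300L,         3L,      2L,   3L, 2001L
)

# Reproduce the labelling step from the question and extract the labels.
labels <- df %>%
  group_by(member_id) %>%
  mutate(t1 = paste0("gender_", row_number())) %>%
  ungroup() %>%
  pull(t1)

# Compare the generated suffix with the member_id it should reflect.
data.frame(member_id = df$member_id, t1 = labels)
```

The suffix the question needs is already available as member_id itself, which is what the answer below pastes onto each variable name.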

1 answer

Answer (score: 3)

You can do it like this:

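library(dplyr)
library(tidyr)

df %>%
  gather(var, val, -c(respid, member_id)) %>%          # wide -> long: one row per respid/member/variable
  mutate(var = paste(var, member_id, sep = "_")) %>%   # build names such as "gender_1"
  select(-member_id) %>%
  spread(var, val)                                     # long -> wide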

  respid dob_1 dob_2 dob_3 edu_1 edu_2 edu_3 gender_1 gender_2 gender_3
   <int> <int> <int> <int> <int> <int> <int>    <int>    <int>    <int>
1    100  1978  1980    NA     3     3    NA        1        1       NA
2    200  1974  1955    NA     4     5    NA        1        2       NA
3    300  1998  1999  2001     3     4     3        2        1        2

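First, it converts the data from wide to long format, with one row per respid, member_id and variable. Second, it builds the new column names by pasting each variable name and its member_id together. Finally, it spreads the data back to wide format. Note that spread() sorts the new columns alphabetically, so the column order differs from the desired output.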
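The three steps can be run one at a time to inspect the intermediate long table. This is a minimal sketch assuming dplyr and tidyr are installed; the intermediate names long and wide are only for illustration.

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~respid, ~member_id, ~gender, ~edu,  ~dob,
     100L,         1L,      1L,   3L, 1978L,
     100L,         2L,      1L,   3L, 1980L,
     200L,         1L,      1L,   4L, 1974L,
     200L,         2L,      2L,   5L, 1955L,
     300L,         1L,      2L,   3L, 1998L,
     300L,         2L,      1L,   4L, 1999L,
     300L,         3L,      2L,   3L, 2001L
)

# Step 1: wide -> long. gather() stacks gender, then edu, then dob (7 rows each).
long <- gather(df, var, val, -c(respid, member_id))

# Step 2: turn each variable name into its target column name, e.g. "gender_1",
# then drop member_id, which is now encoded in the name.
long <- long %>%
  mutate(var = paste(var, member_id, sep = "_")) %>%
  select(-member_id)

# Step 3: long -> wide. Missing respid/name combinations become NA.
wide <- spread(long, var, val)

long
wide
```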

Or with reshape2:

library(reshape2)

dcast(melt(df, id.vars = c("respid", "member_id")),
      respid ~ variable + member_id, value.var = "value")

  respid gender_1 gender_2 gender_3 edu_1 edu_2 edu_3 dob_1 dob_2 dob_3
1    100        1        1       NA     3     3    NA  1978  1980    NA
2    200        1        2       NA     4     5    NA  1974  1955    NA
3    300        2        1        2     3     4     3  1998  1999  2001
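Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same reshape with pivot_wider(), assuming tidyr 1.2.0 or later (needed for the names_vary argument): member_id supplies the suffix, and names_vary = "slowest" groups the columns by member, which reproduces the column order of the desired output.

```r
library(tidyr)

df <- tibble::tribble(
  ~respid, ~member_id, ~gender, ~edu,  ~dob,
     100L,         1L,      1L,   3L, 1978L,
     100L,         2L,      1L,   3L, 1980L,
     200L,         1L,      1L,   4L, 1974L,
     200L,         2L,      2L,   5L, 1955L,
     300L,         1L,      2L,   3L, 1998L,
     300L,         2L,      1L,   4L, 1999L,
     300L,         3L,      2L,   3L, 2001L
)

# One output row per respid; each values_from column gets a "_<member_id>" suffix
# (names_sep defaults to "_"). names_vary = "slowest" orders the columns as
# gender_1, edu_1, dob_1, gender_2, ... instead of gender_1, gender_2, ...
wide <- pivot_wider(
  df,
  id_cols = respid,
  names_from = member_id,
  values_from = c(gender, edu, dob),
  names_vary = "slowest"
)

wide
```

Without names_vary, pivot_wider() uses the default "fastest" ordering (all gender columns, then all edu columns, then all dob columns), which matches the reshape2 result rather than the desired output.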