I want to use tidyr::spread to reshape a long data frame into wide format. How can I include all of the variables I want to convert?
My input:
df <- tibble::tribble(
~respid, ~member_id, ~gender, ~edu, ~dob,
100L, 1L, 1L, 3L, 1978L,
100L, 2L, 1L, 3L, 1980L,
200L, 1L, 1L, 4L, 1974L,
200L, 2L, 2L, 5L, 1955L,
300L, 1L, 2L, 3L, 1998L,
300L, 2L, 1L, 4L, 1999L,
300L, 3L, 2L, 3L, 2001L
)
Desired output:
output <- tibble::tribble(
~respid, ~gender_1, ~edu_1, ~dob_1, ~gender_2, ~edu_2, ~dob_2, ~gender_3, ~edu_3, ~dob_3,
100L, 1L, 3L, 1978L, 1L, 3L, 1980L, NA, NA, NA,
200L, 1L, 4L, 1974L, 2L, 5L, 1955L, NA, NA, NA,
300L, 2L, 3L, 1998L, 1L, 4L, 1999L, 2L, 3L, 2001L
)
I tried the following, but the row_number() values don't look right:
df %>%
group_by(member_id) %>%
mutate(t1 = paste0("gender_" , row_number())) %>%
spread(t1, gender)
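Why row_number() looks wrong in that attempt: inside group_by(member_id) it numbers the respondents that share a member slot, not the members of one respondent. Grouping by respid gives the per-respondent index instead. This is a minimal sketch, assuming dplyr is installed; the columns wrong_idx and idx are hypothetical names. Since this df already has member_id, the step only matters when no such column exists.

```r
library(dplyr)

df <- tibble::tribble(
  ~respid, ~member_id, ~gender, ~edu, ~dob,
  100L, 1L, 1L, 3L, 1978L,
  100L, 2L, 1L, 3L, 1980L,
  200L, 1L, 1L, 4L, 1974L,
  200L, 2L, 2L, 5L, 1955L,
  300L, 1L, 2L, 3L, 1998L,
  300L, 2L, 1L, 4L, 1999L,
  300L, 3L, 2L, 3L, 2001L
)

# Grouping by member_id numbers the respondents within each member slot
wrong <- df %>%
  group_by(member_id) %>%
  mutate(wrong_idx = row_number()) %>%
  ungroup()
wrong$wrong_idx  # 1 1 2 2 3 3 1

# Grouping by respid numbers the members within each respondent
indexed <- df %>%
  group_by(respid) %>%
  mutate(idx = row_number()) %>%
  ungroup()
indexed$idx  # 1 2 1 2 1 2 3, identical to member_id
```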
Answer:
You can do it like this:
library(dplyr)
library(tidyr)

df %>%
  gather(var, val, -c(respid, member_id)) %>%
  mutate(var = paste(var, member_id, sep = "_")) %>%
  select(-member_id) %>%
  spread(var, val)
respid dob_1 dob_2 dob_3 edu_1 edu_2 edu_3 gender_1 gender_2 gender_3
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 100 1978 1980 NA 3 3 NA 1 1 NA
2 200 1974 1955 NA 4 5 NA 1 2 NA
3 300 1998 1999 2001 3 4 3 2 1 2
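First, gather() converts the data from wide to long format, keeping respid and member_id as identifier columns. Second, mutate() builds the new variable names by pasting each variable name to its member_id (e.g. gender_1). Finally, after member_id is dropped, spread() converts the data back to wide format. spread() orders the new columns alphabetically, which is why the dob_* columns come first.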
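The three steps can be run one at a time to inspect the intermediate shapes. This is a sketch assuming dplyr and tidyr are installed; the intermediate names long and wide are hypothetical.

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~respid, ~member_id, ~gender, ~edu, ~dob,
  100L, 1L, 1L, 3L, 1978L,
  100L, 2L, 1L, 3L, 1980L,
  200L, 1L, 1L, 4L, 1974L,
  200L, 2L, 2L, 5L, 1955L,
  300L, 1L, 2L, 3L, 1998L,
  300L, 2L, 1L, 4L, 1999L,
  300L, 3L, 2L, 3L, 2001L
)

# Step 1: wide -> long; gender, edu and dob become (var, val) pairs,
# so 7 rows x 3 measured variables = 21 rows
long <- df %>% gather(var, val, -c(respid, member_id))

# Step 2: build the target column names ("gender_1", "dob_3", ...)
# and drop member_id, which is now encoded in var
long <- long %>%
  mutate(var = paste(var, member_id, sep = "_")) %>%
  select(-member_id)

# Step 3: long -> wide; one column per distinct var, NA where a
# respondent has no such member
wide <- long %>% spread(var, val)
```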
Or, using reshape2:

library(reshape2)

dcast(melt(df, id.vars = c("respid", "member_id")),
      respid ~ variable + member_id, value.var = "value")
respid gender_1 gender_2 gender_3 edu_1 edu_2 edu_3 dob_1 dob_2 dob_3
1 100 1 1 NA 3 3 NA 1978 1980 NA
2 200 1 2 NA 4 5 NA 1974 1955 NA
3 300 2 1 2 3 4 3 1998 1999 2001
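In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(), and pivot_wider() accepts several value columns at once, so no gather step is needed. A sketch assuming tidyr 1.2.0 or later (needed for the names_vary argument); names_vary = "slowest" groups the columns by member, which reproduces the column order of the desired output.

```r
library(tidyr)

df <- tibble::tribble(
  ~respid, ~member_id, ~gender, ~edu, ~dob,
  100L, 1L, 1L, 3L, 1978L,
  100L, 2L, 1L, 3L, 1980L,
  200L, 1L, 1L, 4L, 1974L,
  200L, 2L, 2L, 5L, 1955L,
  300L, 1L, 2L, 3L, 1998L,
  300L, 2L, 1L, 4L, 1999L,
  300L, 3L, 2L, 3L, 2001L
)

# One row per respid; each of gender/edu/dob is spread over member_id.
# Column names use the default "_" separator: gender_1, edu_1, dob_1, ...
# Missing members (respid 100 and 200 have no member 3) are filled with NA.
wide <- pivot_wider(
  df,
  id_cols = respid,
  names_from = member_id,
  values_from = c(gender, edu, dob),
  names_vary = "slowest"
)
```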