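I have a data frame (shown below) of cases (ID) that received different diagnoses (DX), both within a single admission and across different admissions. I want to widen this data frame so that each individual admission has all of its diagnoses in separate columns. I tried the spread function (from tidyr, used together with dplyr), but it did not give the correct result. Any suggestions?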
ID DX Age Admitted
1 a 17 3/2/14
1 b 17 3/2/14
1 c 17 4/30/14
2 e 20 7/22/13
2 a 20 7/22/13
2 c 20 7/22/13
2 d 20 2/4/14
3 b 16 4/18/14
4 e 16 10/8/13
4 m 16 10/8/13
The expected output is:
ID DX1 DX2 DX3 Age Admitted
1 a b NA 17 3/2/14
1 c NA NA 17 4/30/14
2 e a c 20 7/22/13
2 d NA NA 20 2/4/14
3 b NA NA 16 4/18/14
4 e m NA 16 10/8/13
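Answer 0 (score: 0)
This may help. Number the diagnoses within each ID and Admitted combination, then reshape to wide format: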
df1$ind <- with(df1, paste0('DX',ave(seq_along(ID),
ID, Admitted, FUN=seq_along)))
library(reshape2)
dcast(df1, ...~ind, value.var='DX')
# ID Age Admitted DX1 DX2 DX3
#1 1 17 3/2/14 a b <NA>
#2 1 17 4/30/14 c <NA> <NA>
#3 2 20 2/4/14 d <NA> <NA>
#4 2 20 7/22/13 e a c
#5 3 16 4/18/14 b <NA> <NA>
#6 4 16 10/8/13 e m <NA>
Or
library(dplyr)
library(tidyr)
df1 %>%
group_by(ID, Admitted) %>%
mutate(ind=paste0('DX', 1:n())) %>%
ungroup() %>%
spread(ind, DX)
# ID Age Admitted DX1 DX2 DX3
#1 1 17 3/2/14 a b NA
#2 1 17 4/30/14 c NA NA
#3 2 20 2/4/14 d NA NA
#4 2 20 7/22/13 e a c
#5 3 16 4/18/14 b NA NA
#6 4 16 10/8/13 e m NA
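In current tidyr, spread is superseded by pivot_wider. The following is a minimal sketch of the same idea using pivot_wider, assuming tidyr 1.0.0 or later is installed. The name wide is only illustrative, and the data frame is rebuilt here so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, rebuilt so this block is self-contained
df1 <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 4L, 4L),
  DX = c("a", "b", "c", "e", "a", "c", "d", "b", "e", "m"),
  Age = c(17L, 17L, 17L, 20L, 20L, 20L, 20L, 16L, 16L, 16L),
  Admitted = c("3/2/14", "3/2/14", "4/30/14", "7/22/13", "7/22/13",
               "7/22/13", "2/4/14", "4/18/14", "10/8/13", "10/8/13"),
  stringsAsFactors = FALSE
)

wide <- df1 %>%
  group_by(ID, Admitted) %>%
  # Number the diagnoses within each admission: DX1, DX2, ...
  mutate(ind = paste0("DX", row_number())) %>%
  ungroup() %>%
  # One column per diagnosis number; missing slots are filled with NA
  pivot_wider(names_from = ind, values_from = DX)

print(wide)
```

Each ID and Admitted combination ends up as one row, with as many DX columns as the largest number of diagnoses in any single admission.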
Data:
df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 4L, 4L),
DX = c("a", "b", "c", "e", "a", "c", "d", "b", "e", "m"),
Age = c(17L, 17L, 17L, 20L, 20L, 20L, 20L, 16L, 16L, 16L),
Admitted = c("3/2/14", "3/2/14", "4/30/14", "7/22/13", "7/22/13",
"7/22/13", "2/4/14", "4/18/14", "10/8/13", "10/8/13")),
.Names = c("ID",
"DX", "Age", "Admitted"), class = "data.frame", row.names = c(NA,
-10L))
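Note that Admitted is stored as character, so in the outputs above the 2/4/14 admission for ID 2 sorts before 7/22/13, unlike the expected output. As a hedged sketch, assuming the dates are in month/day/two-digit-year form, converting Admitted to Date before reshaping and then sorting should give chronological order. The name wide_sorted is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as above, rebuilt so this block is self-contained
df1 <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 4L, 4L),
  DX = c("a", "b", "c", "e", "a", "c", "d", "b", "e", "m"),
  Age = c(17L, 17L, 17L, 20L, 20L, 20L, 20L, 16L, 16L, 16L),
  Admitted = c("3/2/14", "3/2/14", "4/30/14", "7/22/13", "7/22/13",
               "7/22/13", "2/4/14", "4/18/14", "10/8/13", "10/8/13"),
  stringsAsFactors = FALSE
)

wide_sorted <- df1 %>%
  # Assumption: dates are month/day/two-digit year, e.g. 3/2/14 is 2014-03-02
  mutate(Admitted = as.Date(Admitted, format = "%m/%d/%y")) %>%
  group_by(ID, Admitted) %>%
  mutate(ind = paste0("DX", row_number())) %>%
  ungroup() %>%
  spread(ind, DX) %>%
  # Sort admissions chronologically within each ID
  arrange(ID, Admitted)

print(wide_sorted)
```

With real dates, ID 2's admission on 2013-07-22 comes before the one on 2014-02-04, which matches the row order in the expected output.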