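I have a dataset and I want to show summary results in a new data frame. So far I've created the first two columns: the unique ID and the number of unique case numbers for that ID. Now I'm looking to create additional columns that show the "Code" for each case number as 1st Case, 2nd Case, etc. The logic is that each column shows the code corresponding to a case number; if the case number is the same, the earliest date comes first and the later dates follow in the next columns. Then come the "Code" values for the next, different case number of the same ID. Any help would be much appreciated, thanks, I can't figure out how to do this!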
Desired result:
    ID cases.unique 1st Case 2nd Case 3rd Case 4th Case
1  100            1   715.10   724.50
2  200            2   717.00   300.02   366.90   444.22
3  300            1   717.00
4  400            1   465.80   785.00
5  500            1   309.00
Data:
# data.frame() turns the name "Case Number" into the column Case.Number (check.names = TRUE)
x <- data.frame("ID" = c(100, 100, 200, 200, 200, 200, 300, 400, 400, 500),
                "Case Number" = c(1111, 1111, 1000, 1000, 1001, 1001, 9999, 1422, 1422, 1522),
                "Date" = c("2013/07/15", "2013/09/23", "2016/06/21", "2016/09/18", "2016/10/20",
                           "2016/08/06", "2017/08/21", "2016/08/23", "2016/08/24", "2016/08/14"),
                "Code" = c(715.1, 724.5, 717, 366.9, 444.22, 300.02, 717, 465.8, 785, 309.0))
What I have so far:
library(dplyr)

x2 <- x %>%
  group_by(ID) %>%
  summarize(
    cases.unique = n_distinct(Case.Number)
  )
Answer 0 (score: 2)
Try:
library(tidyverse)
x %>%
  group_by(ID) %>%
  # Sort rows by date so that row_number() numbers each ID's codes oldest-first
  arrange(as.Date(Date, "%Y/%m/%d")) %>%
  mutate(cases.unique = n_distinct(Case.Number),
         cnmbr = paste0("Case ", row_number())) %>%
  distinct(ID, cases.unique, cnmbr, Code) %>%
  spread(cnmbr, Code)
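In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). Below is a sketch of the same reshape with pivot_wider(), assuming tidyr >= 1.0.0 is installed: it converts Date once, sorts by ID and date, numbers the rows within each ID, and widens with names_prefix = "Case " so the columns come out as Case 1 to Case 4. The name result is only for illustration.

```r
library(dplyr)
library(tidyr)

x <- data.frame(
  ID = c(100, 100, 200, 200, 200, 200, 300, 400, 400, 500),
  Case.Number = c(1111, 1111, 1000, 1000, 1001, 1001, 9999, 1422, 1422, 1522),
  Date = c("2013/07/15", "2013/09/23", "2016/06/21", "2016/09/18", "2016/10/20",
           "2016/08/06", "2017/08/21", "2016/08/23", "2016/08/24", "2016/08/14"),
  Code = c(715.1, 724.5, 717, 366.9, 444.22, 300.02, 717, 465.8, 785, 309.0)
)

result <- x %>%
  # Parse the dates once, then sort oldest-first within each ID
  mutate(Date = as.Date(Date, "%Y/%m/%d")) %>%
  arrange(ID, Date) %>%
  group_by(ID) %>%
  mutate(cases.unique = n_distinct(Case.Number),
         case_no = row_number()) %>%
  ungroup() %>%
  # Keep only the id columns, the column that names the new columns, and the values
  select(ID, cases.unique, case_no, Code) %>%
  pivot_wider(names_from = case_no, values_from = Code, names_prefix = "Case ")

print(result)
```

Because ungroup() is called before widening, the result is a plain tibble rather than one still grouped by ID.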
Output:
# A tibble: 5 x 6
# Groups: ID [5]
ID cases.unique `Case 1` `Case 2` `Case 3` `Case 4`
<dbl> <int> <dbl> <dbl> <dbl> <dbl>
1 100 1 715. 724. NA NA
2 200 2 717 300. 367. 444.
3 300 1 717 NA NA NA
4 400 1 466. 785 NA NA
5 500 1 309 NA NA NA
Answer 1 (score: 1)
Here's one approach:
library(tidyverse)
x <- data.frame(
  ID = c(100, 100, 200, 200, 200, 200, 300, 400, 400, 500),
  Case.Number = c(1111, 1111, 1000, 1000, 1001, 1001, 9999, 1422, 1422, 1522),
  Date = c("2013/07/15", "2013/09/23", "2016/06/21", "2016/09/18", "2016/10/20",
           "2016/08/06", "2017/08/21", "2016/08/23", "2016/08/24", "2016/08/14"),
  Code = c(715.1, 724.5, 717, 366.9, 444.22, 300.02, 717, 465.8, 785, 309.0)
)
x %>%
  group_by(ID) %>%
  mutate(
    cases.unique = n_distinct(Case.Number),
    # Number each ID's rows in their existing order: 1_case, 2_case, ...
    case_label = paste0(row_number(), "_case")
  ) %>%
  # Drop Case.Number and Date so spread() keeps one row per ID
  select(-Case.Number, -Date) %>%
  spread(case_label, Code)
#> # A tibble: 5 x 6
#> # Groups: ID [5]
#> ID cases.unique `1_case` `2_case` `3_case` `4_case`
#> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 100 1 715. 724. NA NA
#> 2 200 2 717 367. 444. 300.
#> 3 300 1 717 NA NA NA
#> 4 400 1 466. 785 NA NA
#> 5 500 1 309 NA NA NA
Created on 2019-03-22 by the reprex package (v0.2.1)
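The output above keeps each ID's rows in the order they appear in x, so ID 200 comes out as 717, 366.9, 444.22, 300.02 rather than the date order in the desired result (717.00, 300.02, 366.90, 444.22). A minimal sketch of a fix, assuming tidyr >= 1.0.0 for pivot_wider(): sort by date within each ID before calling row_number(). It also builds the "1st Case", "2nd Case", ... headers from the question with ordinal_case_label(), a hypothetical helper written here for illustration, not a dplyr or tidyr function.

```r
library(dplyr)
library(tidyr)

x <- data.frame(
  ID = c(100, 100, 200, 200, 200, 200, 300, 400, 400, 500),
  Case.Number = c(1111, 1111, 1000, 1000, 1001, 1001, 9999, 1422, 1422, 1522),
  Date = c("2013/07/15", "2013/09/23", "2016/06/21", "2016/09/18", "2016/10/20",
           "2016/08/06", "2017/08/21", "2016/08/23", "2016/08/24", "2016/08/14"),
  Code = c(715.1, 724.5, 717, 366.9, 444.22, 300.02, 717, 465.8, 785, 309.0)
)

# Hypothetical helper (not part of dplyr/tidyr): turns 1, 2, 3, 4 into
# "1st Case", "2nd Case", "3rd Case", "4th Case"; 11-13 get "th".
ordinal_case_label <- function(n) {
  suffix <- ifelse(n %% 100 %in% 11:13, "th",
            ifelse(n %% 10 == 1, "st",
            ifelse(n %% 10 == 2, "nd",
            ifelse(n %% 10 == 3, "rd", "th"))))
  paste0(n, suffix, " Case")
}

result <- x %>%
  group_by(ID) %>%
  # Sort by date inside each ID so row_number() counts oldest-first
  arrange(as.Date(Date, "%Y/%m/%d"), .by_group = TRUE) %>%
  mutate(
    cases.unique = n_distinct(Case.Number),
    case_label = ordinal_case_label(row_number())
  ) %>%
  ungroup() %>%
  select(-Case.Number, -Date) %>%
  pivot_wider(names_from = case_label, values_from = Code)

print(result)
```

pivot_wider() orders the new columns by first appearance, so after the sort they come out as 1st Case through 4th Case.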