Cohort analysis table with relative week numbers

Time: 2015-09-22 18:30:24

Tags: r

My table looks like this:

library(data.table)
dt <- data.table(id=c("123","123","234","234","345","345","345","456","456"),
                 treatment=c("control","control","variable","variable","control","control","control","control","control"),
                 cohort=c("2015-08-10","2015-08-10","2015-08-10","2015-08-10","2015-08-17","2015-08-17","2015-08-17","2015-08-17","2015-08-17"),
                 visit_date=c("2015-08-10", "2015-08-11","2015-08-12", "2015-08-18","2015-08-19","2015-08-31","2015-09-01","2015-08-19","2015-08-27"),
                 visit_week=c("2015-08-10", "2015-08-10","2015-08-10", "2015-08-17","2015-08-17","2015-08-31","2015-08-31","2015-08-17","2015-08-24"))
> dt
    id treatment     cohort visit_date visit_week
1: 123   control 2015-08-10 2015-08-10 2015-08-10
2: 123   control 2015-08-10 2015-08-11 2015-08-10
3: 234  variable 2015-08-10 2015-08-12 2015-08-10
4: 234  variable 2015-08-10 2015-08-18 2015-08-17
5: 345   control 2015-08-17 2015-08-19 2015-08-17
6: 345   control 2015-08-17 2015-08-31 2015-08-31
7: 345   control 2015-08-17 2015-09-01 2015-08-31
8: 456   control 2015-08-17 2015-08-19 2015-08-17
9: 456   control 2015-08-17 2015-08-27 2015-08-24

I am trying to produce output like this, where each cell counts the distinct ids seen in that week:

       cohort treatment visit_week_1 visit_week_2 visit_week_3
1: 2015-08-10   control            1            0            0
2: 2015-08-10  variable            1            1            0
3: 2015-08-17   control            2            1            1

I tried using dcast, but something is wrong with my command, because the counts are off:

 > dcast(dt, cohort+treatment ~ paste0("visit_week_", dt[, seq_len(.N), by=id]$V1), value.var="visit_week", function(x) length(unique(x)))
      cohort treatment visit_week_1 visit_week_2 visit_week_3
1 2015-08-10   control            1            1            0
2 2015-08-10  variable            1            1            0
3 2015-08-17   control            1            2            1

Additional note: each visit week needs to be relative to its cohort. So visit_week 1:3 for cohort 2015-08-10 would be "2015-08-10", "2015-08-17", "2015-08-24". For cohort 2015-08-17, weeks 1:3 would be "2015-08-17", "2015-08-24", "2015-08-31".
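The relative week described in the note can be derived from the dates themselves rather than from the order of visits. The dcast attempt above numbers columns with seq_len(.N) by id, which is each id's visit sequence, not a week offset. Below is a minimal sketch, assuming that cohort and visit_week are always Monday-aligned ISO date strings as in the sample, and that each cell should count distinct ids (inferred from the desired output). The column names rel_week and week are made up for illustration.

```r
library(data.table)

# Same sample data as in the question (visit_date omitted, it is not needed here)
dt <- data.table(
  id         = c("123","123","234","234","345","345","345","456","456"),
  treatment  = c("control","control","variable","variable",
                 "control","control","control","control","control"),
  cohort     = rep(c("2015-08-10","2015-08-17"), times = c(4, 5)),
  visit_week = c("2015-08-10","2015-08-10","2015-08-10","2015-08-17","2015-08-17",
                 "2015-08-31","2015-08-31","2015-08-17","2015-08-24")
)

# Whole weeks elapsed between the cohort start and the visit week, 1-based
# (rel_week is a hypothetical helper column)
dt[, rel_week := as.integer(difftime(as.Date(visit_week), as.Date(cohort),
                                     units = "days")) %/% 7L + 1L]

# Column label for the wide table (week is also a hypothetical helper column)
dt[, week := paste0("visit_week_", rel_week)]

# Count distinct ids per cohort/treatment/relative week; empty cells become 0
res <- dcast(dt, cohort + treatment ~ week,
             value.var = "id", fun.aggregate = uniqueN, fill = 0L)
print(res)
```

With ten or more weeks the columns would sort alphabetically (visit_week_10 before visit_week_2), so zero-padding the week number may be worth considering.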

1 answer:

Answer 0 (score: 4)

You can use dplyr and tidyr:

library(dplyr)
library(tidyr)
dt %>% group_by(cohort, treatment, visit_week) %>%
       summarise(visits = n()) %>%
       mutate(week = paste0("visit_week_", as.numeric(as.factor(visit_week)))) %>%
       dplyr::select(-visit_week) %>%
       spread(week, visits, fill = 0)

Source: local data table [3 x 5]
Groups: 

      cohort treatment visit_week_1 visit_week_2 visit_week_3
       (chr)     (chr)        (dbl)        (dbl)        (dbl)
1 2015-08-10   control            2            0            0
2 2015-08-10  variable            1            1            0
3 2015-08-17   control            2            1            2
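Note that this output differs from the table requested in the question: the first row shows 2 instead of 1 and the last row ends in 2 instead of 1, because n() counts visit rows rather than distinct ids. In addition, as.numeric(as.factor(visit_week)) appears to number weeks by their rank within each group, so a week with no visits would likely shift the later ones. A possible variation, offered only as a sketch and assuming Monday-aligned ISO date strings as in the sample, counts distinct ids with n_distinct() and derives the week number from the date difference. The names week and visitors are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question (visit_date omitted, it is not needed here)
dt <- data.frame(
  id         = c("123","123","234","234","345","345","345","456","456"),
  treatment  = c("control","control","variable","variable",
                 "control","control","control","control","control"),
  cohort     = rep(c("2015-08-10","2015-08-17"), times = c(4, 5)),
  visit_week = c("2015-08-10","2015-08-10","2015-08-10","2015-08-17","2015-08-17",
                 "2015-08-31","2015-08-31","2015-08-17","2015-08-24"),
  stringsAsFactors = FALSE
)

res <- dt %>%
  # 1-based number of whole weeks between the cohort start and the visit week
  mutate(week = paste0(
    "visit_week_",
    as.integer(difftime(as.Date(visit_week), as.Date(cohort),
                        units = "days")) %/% 7L + 1L)) %>%
  group_by(cohort, treatment, week) %>%
  # distinct ids per cell instead of raw visit rows
  summarise(visitors = n_distinct(id)) %>%
  ungroup() %>%
  spread(week, visitors, fill = 0)

print(res)
```

On the sample data this should reproduce the counts asked for in the question; spread() still works in current tidyr, although pivot_wider() is its newer replacement.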