I have a dataset that looks like this:
customer_id  group_a  group_b  group_c  group_d
123          true     false    true     false
456          false    true     false    true
789          false    true     true     false
I also have records for each customer in a dataset like this:
customer_id  date
123          01/01/2019
123          01/02/2019
123          01/03/2019
123          01/04/2019
123          01/04/2019
456          01/01/2019
456          01/02/2019
456          01/03/2019
789          01/01/2019
789          01/03/2019
789          01/03/2019
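For a reproducible example, here is a minimal sketch of how the two tables above could be built in R. The names `df1` and `df2`, storing the flags as the strings "true"/"false", and keeping the dates as month/day/year text are all assumptions made for illustration.

```r
# Membership flags, one row per customer.
# Assumption: flags are stored as strings rather than as logical values.
df1 <- data.frame(
  customer_id = c(123, 456, 789),
  group_a = c("true", "false", "false"),
  group_b = c("false", "true", "true"),
  group_c = c("true", "false", "true"),
  group_d = c("false", "true", "false"),
  stringsAsFactors = FALSE
)

# One row per record; a customer can have several records on the same date.
# Assumption: dates are month/day/year strings.
df2 <- data.frame(
  customer_id = c(rep(123, 5), rep(456, 3), rep(789, 3)),
  date = c("01/01/2019", "01/02/2019", "01/03/2019", "01/04/2019", "01/04/2019",
           "01/01/2019", "01/02/2019", "01/03/2019",
           "01/01/2019", "01/03/2019", "01/03/2019"),
  stringsAsFactors = FALSE
)
```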
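For each group in which a customer is "true", I want to get the number of unique customers with a record on each date, as well as the total number of customers in that group. The result would look like this: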
date        group  record  total
01/01/2019  a      1       1
01/02/2019  a      1       1
01/03/2019  a      1       1
01/04/2019  a      1       1
01/01/2019  b      2       2
01/02/2019  b      1       2
01/03/2019  b      2       2
01/04/2019  b      0       2
01/01/2019  c      2       2
01/02/2019  c      1       2
01/03/2019  c      2       2
01/04/2019  c      1       2
01/01/2019  d      1       1
01/02/2019  d      1       1
01/03/2019  d      1       1
01/04/2019  d      0       1
Answer 0 (score: 1)
I don't think this is very elegant, but the result matches your expected output, so here it is.
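library(lubridate)
library(dplyr)
library(tidyr)

# Parse the month/day/year strings into Date values.
df2$date <- mdy(df2$date)

# One row per record and per group the customer belongs to.
# complete() adds the date/group pairs that have no records (customer_id is NA there).
df3 <- df2 %>%
  inner_join(df1, by = "customer_id") %>%
  gather(key = "group", value = "member", group_a:group_d) %>%
  filter(member == "true") %>%  # assumes the flags are the strings "true"/"false"
  complete(date, group) %>%
  select(date, group, customer_id)

# Distinct customers per group; na.rm = TRUE ignores the NA filler rows.
totals <- df3 %>%
  group_by(group) %>%
  summarise(total = n_distinct(customer_id, na.rm = TRUE))

# Distinct customers per group and date, with the group total attached.
result <- df3 %>%
  group_by(group, date) %>%
  summarise(record = n_distinct(customer_id, na.rm = TRUE)) %>%
  ungroup() %>%
  left_join(totals, by = "group") %>%
  select(date, group, record, total)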
This gives:
# A tibble: 16 x 4
   date       group   record total
   <date>     <chr>    <int> <int>
 1 2019-01-01 group_a      1     1
 2 2019-01-02 group_a      1     1
 3 2019-01-03 group_a      1     1
 4 2019-01-04 group_a      1     1
 5 2019-01-01 group_b      2     2
 6 2019-01-02 group_b      1     2
 7 2019-01-03 group_b      2     2
 8 2019-01-04 group_b      0     2
 9 2019-01-01 group_c      2     2
10 2019-01-02 group_c      1     2
11 2019-01-03 group_c      2     2
12 2019-01-04 group_c      1     2
13 2019-01-01 group_d      1     1
14 2019-01-02 group_d      1     1
15 2019-01-03 group_d      1     1
16 2019-01-04 group_d      0     1
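If the group labels should read a, b, c, d as in the expected output, or if the flags are logical rather than strings, something along these lines might work. It is only a sketch under assumptions: the data frames `flags` and `records` are hypothetical stand-ins for the question's two tables, `pivot_longer()` needs tidyr 1.0.0 or later, and it swaps `gather()` for `pivot_longer()` with `names_prefix` to strip the "group_" prefix.

```r
library(dplyr)
library(tidyr)

# Hypothetical inputs mirroring the question, with logical flags.
flags <- data.frame(
  customer_id = c(123, 456, 789),
  group_a = c(TRUE, FALSE, FALSE),
  group_b = c(FALSE, TRUE, TRUE),
  group_c = c(TRUE, FALSE, TRUE),
  group_d = c(FALSE, TRUE, FALSE)
)
records <- data.frame(
  customer_id = c(rep(123, 5), rep(456, 3), rep(789, 3)),
  date = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
                   "2019-01-04", "2019-01-01", "2019-01-02", "2019-01-03",
                   "2019-01-01", "2019-01-03", "2019-01-03"))
)

# Customers flagged TRUE in each group, counted straight from the flag table.
totals <- flags %>%
  pivot_longer(starts_with("group_"), names_to = "group",
               names_prefix = "group_", values_to = "member") %>%
  group_by(group) %>%
  summarise(total = sum(member))

# Distinct customers per date and group; missing date/group pairs become 0.
result2 <- records %>%
  distinct(customer_id, date) %>%
  inner_join(flags, by = "customer_id") %>%
  pivot_longer(starts_with("group_"), names_to = "group",
               names_prefix = "group_", values_to = "member") %>%
  filter(member) %>%
  count(date, group, name = "record") %>%
  complete(date, group, fill = list(record = 0L)) %>%
  left_join(totals, by = "group") %>%
  arrange(group, date)
```

One difference to be aware of: `total` here counts every customer flagged in a group, including customers with no records at all, whereas the code above counts only customers that appear in the joined data. On the sample data both give the same totals. A group in which no member has any record would still be missing from `result2`, because `complete()` only fills combinations of values it has seen.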