Suppose my dataset looks like this:
library(tidyverse)
df_raw <- data.frame(
  id = paste0('id', sample(c(1:13), replace = TRUE)),
  startTime = as.Date(rbeta(13, 0.7, 10) * 100, origin = "2016-01-01"),
  Channel = paste0('c', sample(c(1:3), 13, replace = TRUE, prob = c(0.2, 0.12, 0.3)))
) %>%
  group_by(id) %>%
  mutate(totals_transactions = sample(c(0, 1), n(), prob = c(0.9, 0.1), replace = TRUE)) %>%
  ungroup() %>%
  arrange(id, startTime)
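Now I want to aggregate the rows that share an ID and add columns to the new data frame indicating whether that ID used a given channel. I did it like this: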
seq_summaries <- df_raw %>%
  group_by(id) %>%
  summarize(
    c1_touches = max(ifelse(Channel == "c1", 1, 0)),
    c2_touches = max(ifelse(Channel == "c2", 1, 0)),
    c3_touches = max(ifelse(Channel == "c3", 1, 0)),
    conversions = sum(totals_transactions)
  ) %>%
  ungroup()
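However, I am looking for a way to avoid creating a column for each channel by hand, because there may be more than three channels, which would mean a lot of work.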
Answer 0 (score: 2)
Here is one idea. Note that your data frame does not contain any c2. To use the complete function, you still need to supply the full list of channels (c1 to c3).
library(tidyverse)
df2 <- df_raw %>%
  group_by(id, Channel) %>%
  summarize(
    touches = 1L,
    conversions = as.integer(sum(totals_transactions))
  ) %>%
  ungroup() %>%
  complete(Channel = paste0("c", 1:3)) %>%
  spread(Channel, touches, fill = 0L) %>%
  drop_na(id) %>%
  select(id, paste0("c", 1:3), conversions)
df2
# # A tibble: 8 x 5
#   id       c1    c2    c3 conversions
#   <fct> <int> <int> <int>       <int>
# 1 id10      1     0     0           0
# 2 id11      0     0     1           0
# 3 id12      0     0     1           1
# 4 id2       0     0     1           0
# 5 id3       0     0     1           0
# 6 id6       1     0     0           0
# 7 id8       1     0     0           1
# 8 id9       0     0     1           0
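One caveat with the code above: conversions is summed per id and Channel, so an id that touched several channels with different conversion counts may end up on more than one row after spread. A possible variant is sketched below; it is not part of the original answer. It builds the touch flags and the per-id conversions separately and then joins them. It assumes a tidyr version that provides pivot_wider (1.0.0 or later). The names df_example, all_channels, touches, conversions and result are made up for illustration, and the data is a small hand-made stand-in for df_raw.

```r
library(dplyr)
library(tidyr)

# Hypothetical hand-made data standing in for df_raw
df_example <- tibble(
  id = c("id1", "id1", "id2", "id3", "id3"),
  Channel = c("c1", "c3", "c3", "c1", "c1"),
  totals_transactions = c(0, 1, 0, 1, 1)
)

# Full list of channels, including ones that never occur in the data (here c2)
all_channels <- paste0("c", 1:3)

# One 0/1 flag per id and channel, then one column per channel
touches <- df_example %>%
  distinct(id, Channel) %>%
  mutate(touched = 1L) %>%
  complete(id, Channel = all_channels, fill = list(touched = 0L)) %>%
  pivot_wider(names_from = Channel, values_from = touched)

# Conversions are summed per id only, independent of the channel
conversions <- df_example %>%
  group_by(id) %>%
  summarize(conversions = sum(totals_transactions))

result <- left_join(touches, conversions, by = "id")
result
```

In this sketch id1 touched both c1 and c3 and still gets a single row, because the channel flags and the conversion totals are computed independently. Adding a channel only requires extending all_channels.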