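I have data that basically looks like this, only larger:
I want to count the number of distinct combinations of (customer_id, account_id), i.e. the distinct or unique values based on two columns, but separately for each start_date. I could not find a solution anywhere. The result should be another column added to my data.table, and it should look like this:
That is, for each start date, it counts the number of distinct values based on customer_id and account_id.
For example, for start_date equal to 2.2.2018, the distinct (customer_id, account_id) combinations are (4,22), (5,38) and (6,13), so I want the count to be 3, because there are 3 distinct combinations. I also need the solution to work with character values in the customer_id and account_id columns.
Code to reproduce the data: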
library(data.table)
customer_id <- c(1,1,1,2,3,3,4,5,5,6)
account_id <- c(11,11,11,11,55,88,22,38,38,13)
start_date <- c(rep(as.Date("2017-01-01","%Y-%m-%d"),each=6),rep(as.Date("2018-02-02","%Y-%m-%d"),each=4))
data <- data.table(customer_id,account_id,start_date)
Answer 0 (score: 0)
A dplyr option:
library(dplyr)

customer_id <- c(1,1,1,2,3,3,4,5,5,6)
account_id <- c(11,11,11,11,55,88,22,38,38,13)
start_date <- c(rep(as.Date("2017-01-01","%Y-%m-%d"),each=6),rep(as.Date("2018-02-02","%Y-%m-%d"),each=4))
data <- data.frame(customer_id,account_id,start_date)

# Collapse to one row per distinct (start_date, customer_id, account_id),
# then count the remaining rows per start_date.
data %>%
  group_by(start_date, customer_id, account_id) %>%
  summarise(Total = 1) %>%
  group_by(start_date) %>%
  summarise(Count = n())
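Note that the pipeline above returns one summary row per start_date rather than a new column on the original data. A minimal sketch of one way to attach the count to every row, assuming dplyr's `distinct()`, `count()` and `left_join()`; the names `counts` and `result` are made up for illustration:

```r
library(dplyr)

customer_id <- c(1, 1, 1, 2, 3, 3, 4, 5, 5, 6)
account_id <- c(11, 11, 11, 11, 55, 88, 22, 38, 38, 13)
start_date <- as.Date(c(rep("2017-01-01", 6), rep("2018-02-02", 4)))
data <- data.frame(customer_id, account_id, start_date)

# One row per distinct (start_date, customer_id, account_id),
# then the number of such rows per start_date.
counts <- data %>%
  distinct(start_date, customer_id, account_id) %>%
  count(start_date, name = "Count")

# Attach the per-date count to every original row.
result <- left_join(data, counts, by = "start_date")
```

Here `result` keeps all 10 rows of `data` and gains a `Count` column.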
Answer 1 (score: 0)
Another dplyr option:
library(dplyr)
customer_id <- c(1,1,1,2,3,3,4,5,5,6)
account_id <- c(11,11,11,11,55,88,22,38,38,13)
start_date <- c(rep(as.Date("2017-01-01","%Y-%m-%d"),each=6),rep(as.Date("2018-02-02","%Y-%m-%d"),each=4))
data <- data.frame(customer_id,account_id,start_date)
data %>%
  group_by(start_date) %>%
  mutate(distinct_values = n_distinct(customer_id, account_id)) %>%
  ungroup()
Answer 2 (score: 0)
Here is a data.table option:
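# Join the two keys with "_" between them so that pairs such as (1, 11) and (11, 1) stay distinct
data[, N := uniqueN(paste(customer_id, account_id, sep = "_")), by = start_date]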
#     customer_id account_id start_date N
#  1:           1         11 2017-01-01 4
#  2:           1         11 2017-01-01 4
#  3:           1         11 2017-01-01 4
#  4:           2         11 2017-01-01 4
#  5:           3         55 2017-01-01 4
#  6:           3         88 2017-01-01 4
#  7:           4         22 2018-02-02 3
#  8:           5         38 2018-02-02 3
#  9:           5         38 2018-02-02 3
# 10:           6         13 2018-02-02 3
Or:
data[, N := uniqueN(.SD, by = c("customer_id", "account_id")), by = start_date]
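The question also asks for character ids. As a quick check, here is a sketch that runs the `uniqueN(.SD, by = ...)` form on the same sample data with both id columns converted to character. The conversion with `as.character()` and the name `dt` are assumptions made for illustration only:

```r
library(data.table)

# Same sample data as in the question, but with character ids (assumption).
dt <- data.table(
  customer_id = as.character(c(1, 1, 1, 2, 3, 3, 4, 5, 5, 6)),
  account_id  = as.character(c(11, 11, 11, 11, 55, 88, 22, 38, 38, 13)),
  start_date  = as.Date(c(rep("2017-01-01", 6), rep("2018-02-02", 4)))
)

# Count distinct (customer_id, account_id) rows within each start_date
# and store the result on every row of that group.
dt[, N := uniqueN(.SD, by = c("customer_id", "account_id")), by = start_date]
```

Because `uniqueN()` compares whole rows of the chosen columns here, no string pasting is involved and the column types should not matter.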