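I have customer data grouped by customer ID and sorted by purchase date. I want to add a column that keeps a running count of the distinct products ordered so far, i.e.: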
Input <- data.frame(Customer = c("C-01", "C-01", "C-02", "C-02", "C-02", "C-02", "C-03", "C-03", "C-03", "C-03"),
Product = c("COKE", "COKE", "FRIES", "SHAKE", "BURGER", "BURGER", "CHICKEN", "FISH", "FISH", "FISH"),
Date = c("2018-01-02","2018-01-05","2018-01-03","2018-01-06","2018-01-08","2018-01-12","2018-01-02","2018-01-04", "2018-01-16", "2018-01-20"))
Output <- data.frame(Customer = c("C-01", "C-01", "C-02", "C-02", "C-02", "C-02", "C-03", "C-03", "C-03", "C-03"),
Product = c("COKE", "COKE", "FRIES", "SHAKE", "BURGER", "BURGER", "CHICKEN", "FISH", "FISH", "FISH"),
Date = c("2018-01-02","2018-01-05","2018-01-03","2018-01-06","2018-01-08","2018-01-12","2018-01-02","2018-01-04", "2018-01-16", "2018-01-20"),
Cum_Distinct = c(1, 1, 1, 2, 3, 3, 1, 2, 2, 2))
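Given the Input data above, I want to produce the Output data using dplyr. How can I keep a cumulative count of the distinct products encountered so far?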
Answer 0 (score: 1)
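We can take the cumulative sum of the non-duplicated values within each group. duplicated(Product) flags every repeat of a product, so !duplicated(Product) is TRUE only at a product's first occurrence, and cumsum() of that counts the distinct products seen so far.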
library(dplyr)
Input %>%
group_by(Customer) %>%
mutate(Cum_Distinct = cumsum(!duplicated(Product)))
# Customer Product Date Cum_Distinct
# <fct> <fct> <fct> <int>
# 1 C-01 COKE 2018-01-02 1
# 2 C-01 COKE 2018-01-05 1
# 3 C-02 FRIES 2018-01-03 1
# 4 C-02 SHAKE 2018-01-06 2
# 5 C-02 BURGER 2018-01-08 3
# 6 C-02 BURGER 2018-01-12 3
# 7 C-03 CHICKEN 2018-01-02 1
# 8 C-03 FISH 2018-01-04 2
# 9 C-03 FISH 2018-01-16 2
#10 C-03 FISH 2018-01-20 2
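To make the mechanics concrete, here is a minimal base R sketch of the same idea on a single vector. The vector `products` is a hypothetical stand-in for one customer's rows (it mirrors the C-02 rows of Input); the names `is_first` and `cum_distinct` are introduced only for illustration.

```r
# Hypothetical single-customer vector, mirroring the C-02 rows of Input.
products <- c("FRIES", "SHAKE", "BURGER", "BURGER")

# duplicated() is TRUE for every repeat, so its negation marks first occurrences.
is_first <- !duplicated(products)

# Summing the first-occurrence flags cumulatively counts distinct values so far.
cum_distinct <- cumsum(is_first)

print(is_first)      # TRUE TRUE TRUE FALSE
print(cum_distinct)  # 1 2 3 3
```

Inside group_by(Customer), mutate() applies this same computation to each customer's rows separately.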
Answer 1 (score: 0)
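We can use match to get, for each row, the index of its value among the unique elements of Product. Note that this is the index of the product's first appearance; it equals the cumulative distinct count only when repeats of a product are adjacent, as they are in this sample data.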
library(dplyr)
Input %>%
group_by(Customer) %>%
mutate(Cum_Distinct = match(Product, unique(Product)))
# A tibble: 10 x 4
# Groups: Customer [3]
# Customer Product Date Cum_Distinct
# <fct> <fct> <fct> <int>
# 1 C-01 COKE 2018-01-02 1
# 2 C-01 COKE 2018-01-05 1
# 3 C-02 FRIES 2018-01-03 1
# 4 C-02 SHAKE 2018-01-06 2
# 5 C-02 BURGER 2018-01-08 3
# 6 C-02 BURGER 2018-01-12 3
# 7 C-03 CHICKEN 2018-01-02 1
# 8 C-03 FISH 2018-01-04 2
# 9 C-03 FISH 2018-01-16 2
#10 C-03 FISH 2018-01-20 2
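The match result and a true running distinct count can differ when a product reappears after other products. The following base R sketch checks this on a hypothetical vector (`products`, not part of the original data) and shows one possible adjustment, wrapping the match result in cummax(); that adjustment is a suggestion here, not part of the original answer.

```r
# Hypothetical purchase sequence in which "A" and "B" reappear later.
products <- c("A", "B", "A", "C", "B")

# Index of each value's first appearance among the unique values.
first_seen <- match(products, unique(products))

# Taking the running maximum turns first-appearance indices into a running count.
running <- cummax(first_seen)

# Reference result from the cumulative-sum approach.
reference <- cumsum(!duplicated(products))

print(first_seen)  # 1 2 1 3 2
print(running)     # 1 2 2 3 3
print(reference)   # 1 2 2 3 3
```

On the sample Input, where repeats are always adjacent, all three vectors would agree within each customer.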
Or use group_indices
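library(tidyverse)
# Note: group_indices() numbers groups in sorted key order, not in order of
# first appearance, so the result can differ from the approaches above.
Input %>%
  group_by(Customer) %>%
  nest() %>%
  mutate(data = map(data, ~ .x %>%
    mutate(Cum_Distinct = group_indices(., Product)))) %>%
  unnest(data)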
Or use base R
Input$Cum_Distinct <- with(Input, as.integer(ave(as.character(Product), Customer,
FUN = function(x) match(x, unique(x)))))
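For comparison, the cumulative-sum idea can also be written in base R with ave(). This is a sketch under assumptions: it rebuilds only the Customer and Product columns of Input, sets stringsAsFactors = FALSE explicitly, and passes row indices to ave() so that the per-group result stays integer without a character round trip.

```r
# Customer and Product columns copied from the Input data in the question.
Input <- data.frame(
  Customer = c("C-01", "C-01", "C-02", "C-02", "C-02", "C-02",
               "C-03", "C-03", "C-03", "C-03"),
  Product = c("COKE", "COKE", "FRIES", "SHAKE", "BURGER", "BURGER",
              "CHICKEN", "FISH", "FISH", "FISH"),
  stringsAsFactors = FALSE
)

# ave() splits the row indices by Customer; within each group we count
# first occurrences of Product cumulatively.
Input$Cum_Distinct <- ave(
  seq_len(nrow(Input)), Input$Customer,
  FUN = function(idx) cumsum(!duplicated(Input$Product[idx]))
)

print(Input$Cum_Distinct)  # 1 1 1 2 3 3 1 2 2 2
```

Unlike the match-based version, this variant keeps counting correctly when a product reappears after other products within the same customer.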