I have the following data:
Date_X <- c('2020-01-01','2020-01-02','2020-01-03','2020-04-01','2020-05-01','2020-06-01')
Showroom <- c('A','A','A','A','A','A')
Item <- c('z1','z2','z3','z2','z2','z3')
Customer <- c('c1','c2','c3','c4','c5','c6')
Quantity <- c(12,23,34,22,11,234)
df <- data.frame(Date_X,Showroom,Item,Customer,Quantity)
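I want to see the total number of customers who bought a specific item in a specific showroom in the last 6 days. For example, the total number of customers who bought item 'z2' from showroom 'A' in the last 6 days is 3, and the average quantity is (23+22+11)/3. The output should also include the latest purchase date at the showroom-item level. [Note: the actual data has 100+ showrooms and 1000+ items]
The desired output looks like this: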
Answer 0: (score: 1)
Does this work:
library(dplyr)
df %>% group_by(Showroom, Item) %>% summarise(Total_Customers = n(), Quantity = mean(Quantity)) %>%
left_join(df %>% group_by(Showroom, Item) %>% filter(Date_X == max(Date_X)), by = c('Showroom', 'Item')) %>%
select(Showroom, Item, Total_Customers, 'Last_Purchase_Date' = Date_X, 'Quantity' = Quantity.x)
`summarise()` regrouping output by 'Showroom' (override with `.groups` argument)
# A tibble: 3 x 5
# Groups: Showroom [1]
Showroom Item Total_Customers Last_Purchase_Date Quantity
<chr> <chr> <int> <chr> <dbl>
1 A z1 1 2020-01-01 12
2 A z2 3 2020-05-01 18.7
3 A z3 2 2020-06-01 134
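The join above can return duplicate rows when two purchases in a group share the latest date. As a minimal sketch (not the answerer's code), the same three columns can be computed in a single `summarise()` call, assuming `Date_X` holds ISO-formatted strings that `as.Date()` can parse. Passing `.groups = "drop"` also silences the regrouping message shown above.

```r
library(dplyr)

# Same sample data as in the question.
df <- data.frame(
  Date_X   = c('2020-01-01','2020-01-02','2020-01-03','2020-04-01','2020-05-01','2020-06-01'),
  Showroom = c('A','A','A','A','A','A'),
  Item     = c('z1','z2','z3','z2','z2','z3'),
  Customer = c('c1','c2','c3','c4','c5','c6'),
  Quantity = c(12,23,34,22,11,234)
)

result <- df %>%
  mutate(Date_X = as.Date(Date_X)) %>%   # assumption: dates are ISO strings
  group_by(Showroom, Item) %>%
  summarise(
    Total_Customers    = n(),            # one row per purchase, as in the answer
    Last_Purchase_Date = max(Date_X),    # latest date, no join needed
    Quantity           = mean(Quantity),
    .groups = "drop"                     # return an ungrouped tibble
  )

print(result)
```

If one customer can appear several times per showroom and item, `n_distinct(Customer)` may be closer to "total customers" than `n()`; which one is wanted depends on the real data.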
Answer 1: (score: 1)
An option with data.table is to group by 'Showroom' and 'Item', then summarise by taking the number of rows (.N) as 'Total_Customers', the last value of 'Date_X' (assuming 'Date_X' is ordered), and the mean of 'Quantity':
library(data.table)
setDT(df)[, .(Total_Customers = .N,
Last_Purchase_Date = last(Date_X),
Quantity = mean(Quantity)), by = .(Showroom, Item)]
# Showroom Item Total_Customers Last_Purchase_Date Quantity
#1: A z1 1 2020-01-01 12.00000
#2: A z2 3 2020-05-01 18.66667
#3: A z3 2 2020-06-01 134.00000
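If 'Date_X' is not guaranteed to be ordered, `last()` may pick the wrong row. A possible variant, sketched here under the assumption that the dates parse with `as.Date()`, takes `max()` of the parsed dates instead. The rows are deliberately reversed to show that the result does not depend on row order, and `keyby` is used so the output comes back sorted by group.

```r
library(data.table)

# Same sample data as in the question, with rows reversed on purpose.
dt <- data.table(
  Date_X   = c('2020-01-01','2020-01-02','2020-01-03','2020-04-01','2020-05-01','2020-06-01'),
  Showroom = c('A','A','A','A','A','A'),
  Item     = c('z1','z2','z3','z2','z2','z3'),
  Customer = c('c1','c2','c3','c4','c5','c6'),
  Quantity = c(12,23,34,22,11,234)
)[6:1]

# Parse the strings so that max() compares real dates.
dt[, Date_X := as.Date(Date_X)]

res <- dt[, .(Total_Customers    = .N,
              Last_Purchase_Date = max(Date_X),   # order-independent
              Quantity           = mean(Quantity)),
          keyby = .(Showroom, Item)]              # keyby sorts the groups

print(res)
```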
Answer 2: (score: 1)
Here is an option using transform + aggregate:
transform(
aggregate(. ~ Showroom + Item, df, c),
Date_X = sapply(Date_X, tail, 1),
Customer = lengths(Customer),
Quantity = sapply(Quantity, function(x) mean(as.numeric(x)))
)
which gives
Showroom Item Date_X Customer Quantity
1 A z1 2020-01-01 1 12.00000
2 A z2 2020-05-01 3 18.66667
3 A z3 2020-06-01 2 134.00000
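The code in this thread summarises all rows and does not restrict them to the time window the question mentions. One way to add that step, sketched in base R, is to filter before grouping. The names `ref_date` and `window_days` are hypothetical, and the value 90 is an arbitrary illustrative choice rather than something taken from the question.

```r
# Same sample data as in the question.
df <- data.frame(
  Date_X   = c('2020-01-01','2020-01-02','2020-01-03','2020-04-01','2020-05-01','2020-06-01'),
  Showroom = c('A','A','A','A','A','A'),
  Item     = c('z1','z2','z3','z2','z2','z3'),
  Customer = c('c1','c2','c3','c4','c5','c6'),
  Quantity = c(12,23,34,22,11,234)
)
df$Date_X <- as.Date(df$Date_X)

# Hypothetical parameters (assumptions, adjust to the real requirement).
ref_date    <- max(df$Date_X)  # treat the latest purchase as "today"
window_days <- 90              # illustrative window length

# Keep only purchases inside the window.
recent <- df[df$Date_X >= ref_date - window_days, ]

# Summarise each showroom-item group that still has rows.
groups <- split(recent, list(recent$Showroom, recent$Item), drop = TRUE)
res <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    Showroom           = g$Showroom[1],
    Item               = g$Item[1],
    Total_Customers    = nrow(g),
    Last_Purchase_Date = max(g$Date_X),
    Quantity           = mean(g$Quantity)
  )
}))
rownames(res) <- NULL

print(res)
```

With these illustrative settings only the purchases from April onward remain, so 'z1' drops out of the result entirely.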