Question

I have the following data:

Date_X <- c('2020-01-01','2020-01-02','2020-01-03','2020-04-01','2020-05-01','2020-06-01')
Showroom <- c('A','A','A','A','A','A')
Item <- c('z1','z2','z3','z2','z2','z3')
Customer <- c('c1','c2','c3','c4','c5','c6')
Quantity <- c(12,23,34,22,11,234)
df <- data.frame(Date_X,Showroom,Item,Customer,Quantity)

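I want to see the total number of customers who bought a specific item in a particular showroom in the past 6 days. For example, the total number of customers who bought item "z2" from showroom "A" in the past 6 days is 3, and the average quantity is (23+22+11)/3. The output should also include the latest purchase date at the showroom-item level. [Note: the actual data has 100+ showrooms and 1000+ items]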

The desired output looks like this:

Answer 1

Does this work:

library(dplyr)
df %>% group_by(Showroom, Item) %>% summarise(Total_Customers = n(), Quantity = mean(Quantity)) %>% 
   left_join(df %>% group_by(Showroom, Item) %>% filter(Date_X == max(Date_X)), by = c('Showroom', 'Item')) %>% 
       select(Showroom, Item, Total_Customers, 'Last_Purchase_Date' = Date_X, 'Quantity' = Quantity.x)
`summarise()` regrouping output by 'Showroom' (override with `.groups` argument)
# A tibble: 3 x 5
# Groups:   Showroom [1]
  Showroom Item  Total_Customers Last_Purchase_Date Quantity
  <chr>    <chr>           <int> <chr>                 <dbl>
1 A        z1                  1 2020-01-01             12  
2 A        z2                  3 2020-05-01             18.7
3 A        z3                  2 2020-06-01            134
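
A possibly simpler variant, sketched under the assumption that Date_X always holds ISO-formatted strings that as.Date() can parse: the latest date is computed inside the same summarise() call, so no join is needed. The name `res` is only illustrative, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  Date_X   = c('2020-01-01','2020-01-02','2020-01-03','2020-04-01','2020-05-01','2020-06-01'),
  Showroom = c('A','A','A','A','A','A'),
  Item     = c('z1','z2','z3','z2','z2','z3'),
  Customer = c('c1','c2','c3','c4','c5','c6'),
  Quantity = c(12,23,34,22,11,234)
)

res <- df %>%
  mutate(Date_X = as.Date(Date_X)) %>%    # compare real dates, not strings
  group_by(Showroom, Item) %>%
  summarise(
    Total_Customers    = n(),             # one row per purchase in this data
    Last_Purchase_Date = max(Date_X),     # latest date within the group
    Quantity           = mean(Quantity),  # average quantity within the group
    .groups = "drop"
  )

print(res)
```

If the same customer can appear more than once per showroom and item, n_distinct(Customer) may be closer to what is wanted than n().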

Answer 2

One option with data.table is to group by 'Showroom' and 'Item', then summarise by taking the number of rows (.N) as 'Total_Customers', the last value of 'Date_X' (assuming 'Date_X' is ordered), and the mean of 'Quantity'.

library(data.table)
setDT(df)[, .(Total_Customers = .N, 
      Last_Purchase_Date = last(Date_X),
      Quantity = mean(Quantity)), by =  .(Showroom, Item)]
#   Showroom Item Total_Customers Last_Purchase_Date  Quantity
#1:        A   z1               1         2020-01-01  12.00000
#2:        A   z2               3         2020-05-01  18.66667
#3:        A   z3               2         2020-06-01 134.00000
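
If the rows are not guaranteed to be sorted by date, last() can pick the wrong row. A minimal sketch that avoids the ordering assumption by converting 'Date_X' to a date class and taking max() per group; the names `dt` and `res` and the deliberate row shuffle are only for illustration.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  Date_X   = c('2020-01-01','2020-01-02','2020-01-03','2020-04-01','2020-05-01','2020-06-01'),
  Showroom = c('A','A','A','A','A','A'),
  Item     = c('z1','z2','z3','z2','z2','z3'),
  Customer = c('c1','c2','c3','c4','c5','c6'),
  Quantity = c(12,23,34,22,11,234)
)

# Shuffle the rows on purpose: the result below should not depend on row order
dt <- dt[c(5, 2, 6, 1, 4, 3)]

# Store the dates as IDate so that max() compares dates rather than strings
dt[, Date_X := as.IDate(Date_X)]

res <- dt[, .(Total_Customers    = .N,
              Last_Purchase_Date = max(Date_X),
              Quantity           = mean(Quantity)),
          by = .(Showroom, Item)]

# Sort the summary for readability
setorder(res, Showroom, Item)
print(res)
```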

Answer 3

Here is a base R option using transform + aggregate

transform(
  aggregate(. ~ Showroom + Item, df, c),
  Date_X = sapply(Date_X, tail, 1),
  Customer = lengths(Customer),
  Quantity = sapply(Quantity, function(x) mean(as.numeric(x)))
)

which gives

  Showroom Item     Date_X Customer  Quantity
1        A   z1 2020-01-01        1  12.00000
2        A   z2 2020-05-01        3  18.66667
3        A   z3 2020-06-01        2 134.00000
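
The question also asks for a time window, which the summaries above do not apply. A rough base R sketch of one way to add it, filtering the rows before summarising. Here `ref_date` and `window_days` are hypothetical names, and the value 180 is only a placeholder chosen so that every sample row falls inside the window; set it to whatever window is actually needed.

```r
# Sample data from the question
df <- data.frame(
  Date_X   = c('2020-01-01','2020-01-02','2020-01-03','2020-04-01','2020-05-01','2020-06-01'),
  Showroom = c('A','A','A','A','A','A'),
  Item     = c('z1','z2','z3','z2','z2','z3'),
  Customer = c('c1','c2','c3','c4','c5','c6'),
  Quantity = c(12,23,34,22,11,234)
)
df$Date_X <- as.Date(df$Date_X)

ref_date    <- max(df$Date_X)  # assumption: window is measured back from the latest date
window_days <- 180             # placeholder window length in days

# Keep only the rows that fall inside the window
recent <- df[df$Date_X > ref_date - window_days, ]

# Summarise each Showroom/Item group, then stack the per-group rows
groups <- split(recent, list(recent$Showroom, recent$Item), drop = TRUE)
res <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    Showroom           = g$Showroom[1],
    Item               = g$Item[1],
    Total_Customers    = nrow(g),
    Last_Purchase_Date = max(g$Date_X),
    Quantity           = mean(g$Quantity)
  )
}))
print(res)
```

The same filter could be placed in front of the dplyr or data.table summaries instead; doing it first keeps the grouping code unchanged.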

Get the latest date along with the customer count in R
