Finding a subgroup summary within groups in R

Time: 2018-02-22 11:30:19

Tags: r dplyr grouping summary

PPG Product Week        Sales
 P1  A      01/01/2018  50
 P1  B      01/01/2018  40
 P1  B      01/02/2018  30
 P1  A      01/02/2018  80
 P2  A      01/01/2018  100
 P2  B      01/02/2018  70

I am trying to produce a summary for each PPG: within each PPG, I want the product with the highest overall sales, as shown below.

PPG   Max Product Sales
 P1      130 (This is sum of product A for ppg p1 across weeks)
 P2      100 (This is sum of product A for ppg p2 across weeks)

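I tried to do this in dplyr with top_n(1, sum(sales)), but it failed. How can this be solved? Could it also be extended to find the top n products by sales across weeks, in order to check whether the 80-20 rule holds? Any ideas are welcome.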

2 answers:

Answer 0: (score: 5)

Here is a solution using dplyr:

First, group the data by PPG and Product and sum the sales within each group; then group by PPG and keep only the maximum:

library(dplyr)

my_data %>% 
  group_by(PPG, Product) %>% 
  summarise("Max Product Sales" = sum(Sales)) %>% 
  group_by(PPG) %>% 
  summarise("Max Product Sales" = max(`Max Product Sales`))

Output:

# A tibble: 2 x 2
  PPG   `Max Product Sales`
  <chr>               <dbl>
1 P1                    130
2 P2                    100
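The question also asks about the top n products per PPG. A minimal sketch of one way to do that, assuming dplyr 1.0.0 or later (for slice_max and the .groups argument); the names n_top, total_sales and top_products are made up for illustration, and the data frame is retyped from the question's table:

```r
library(dplyr)

# Sample data retyped from the question's table
my_data <- data.frame(
  PPG     = c("P1", "P1", "P1", "P1", "P2", "P2"),
  Product = c("A", "B", "B", "A", "A", "B"),
  Week    = c("01/01/2018", "01/01/2018", "01/02/2018",
              "01/02/2018", "01/01/2018", "01/02/2018"),
  Sales   = c(50, 40, 30, 80, 100, 70),
  stringsAsFactors = FALSE
)

n_top <- 1  # hypothetical parameter: number of products to keep per PPG

top_products <- my_data %>%
  group_by(PPG, Product) %>%
  # total sales of each product within each PPG, across all weeks
  summarise(total_sales = sum(Sales), .groups = "drop") %>%
  group_by(PPG) %>%
  # keep the n_top best-selling products in each PPG
  slice_max(total_sales, n = n_top, with_ties = FALSE) %>%
  ungroup()

print(top_products)
```

Unlike taking max() of the totals, this keeps the Product column, so the result shows which product the maximum belongs to. Setting with_ties = FALSE returns exactly n_top rows per PPG even when totals are equal; leave it at the default if tied products should all be listed.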

data.table

The same approach with data.table, which returns the same two rows:

library(data.table)
setDT(my_data)

my_data[, .(`Max Product Sales` = sum(Sales)), by = .(PPG, Product)][, .(`Max Product Sales` = max(`Max Product Sales`)), by = PPG]

Answer 1: (score: 3)

You did not provide any reproducible data, so let's read your text into df.

df <- read.table(text = "PPG Product Week Sales
P1 A 01/01/2018 50
P1 B 01/01/2018 40
P1 B 01/02/2018 30
P1 A 01/02/2018 80
P2 A 01/01/2018 100
P2 B 01/02/2018 70", header = TRUE)

We use data.table to get the total sales within each PPG x Product group.

data.table::setDT(df)[, .(maxSales = sum(Sales)), by = c("PPG", "Product")]

The result is:

   PPG Product maxSales
1:  P1       A      130
2:  P1       B       70
3:  P2       A      100
4:  P2       B       70
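For the 80-20 check mentioned in the question, one possible sketch builds on per-product totals like these and computes each product's cumulative share of sales within its PPG. The 0.8 threshold and the names totals, share, cum_share and core are assumptions made for illustration, not part of either answer:

```r
library(data.table)

# Sample data retyped from the question's table (Week is omitted because it is not used here)
df <- data.table(
  PPG     = c("P1", "P1", "P1", "P1", "P2", "P2"),
  Product = c("A", "B", "B", "A", "A", "B"),
  Sales   = c(50, 40, 30, 80, 100, 70)
)

# Total sales per product within each PPG
totals <- df[, .(total_sales = sum(Sales)), by = .(PPG, Product)]

# Sort best sellers first within each PPG
setorder(totals, PPG, -total_sales)

# Share of the PPG's sales per product, and the running (cumulative) share
totals[, share := total_sales / sum(total_sales), by = PPG]
totals[, cum_share := cumsum(share), by = PPG]

# Products needed to reach 80% of a PPG's sales: every product whose
# preceding cumulative share is still below the (assumed) 0.8 threshold
core <- totals[cum_share - share < 0.8]

print(totals)
print(core)
```

With only two products per PPG in the sample data, both are needed to reach 80% in each group, so the check only becomes informative on a larger product list.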