PPG Product Week Sales
P1 A 01/01/2018 50
P1 B 01/01/2018 40
P1 B 01/02/2018 30
P1 A 01/02/2018 80
P2 A 01/01/2018 100
P2 B 01/02/2018 70
I am trying to build a summary per PPG: within each PPG, I want the product with the highest overall sales (summed across weeks), as shown below:
PPG Max Product Sales
P1 130 (This is sum of product A for ppg p1 across weeks)
P2 100 (This is sum of product A for ppg p2 across weeks)
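I tried to do this in dplyr with top_n(1, sum(sales)), but it fails. How can we solve this? Can it also be extended to find the top n products by sales across weeks, so that we can check whether the 80-20 rule holds? Any ideas are welcome.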
Answer 0 (score: 5)
Here is a solution using dplyr.
First, group the data by PPG and Product and sum the sales within each group; then group by PPG and keep only the maximum:
library(dplyr)
my_data %>%
  group_by(PPG, Product) %>%
  summarise(`Max Product Sales` = sum(Sales)) %>%
  group_by(PPG) %>%
  summarise(`Max Product Sales` = max(`Max Product Sales`))
Output:
# A tibble: 2 x 2
  PPG   `Max Product Sales`
  <chr>               <dbl>
1 P1                    130
2 P2                    100
The same with data.table:
library(data.table)
setDT(my_data)
my_data[, .(`Max Product Sales` = sum(Sales)), by = .(PPG, Product)][, .(`Max Product Sales` = max(`Max Product Sales`)), by = PPG]
This returns the same values.
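The question also asks which product achieves the maximum and how to extend this to the top n products. One possible sketch, assuming dplyr 1.0.0 or later (for slice_max() and the .groups argument); the names n_top, Total_Sales and top_products are hypothetical and chosen only for illustration:

```r
library(dplyr)

# Sample data copied from the question
my_data <- read.table(text = "PPG Product Week Sales
P1 A 01/01/2018 50
P1 B 01/01/2018 40
P1 B 01/02/2018 30
P1 A 01/02/2018 80
P2 A 01/01/2018 100
P2 B 01/02/2018 70", header = TRUE, stringsAsFactors = FALSE)

n_top <- 1  # hypothetical parameter: how many products to keep per PPG

top_products <- my_data %>%
  group_by(PPG, Product) %>%
  # total sales of each product within each PPG, across all weeks
  summarise(Total_Sales = sum(Sales), .groups = "drop") %>%
  group_by(PPG) %>%
  # keep the n_top best-selling products per PPG; ties are broken arbitrarily
  slice_max(Total_Sales, n = n_top, with_ties = FALSE) %>%
  ungroup()

print(top_products)
```

Unlike max(), this keeps the Product column, so the result shows which product the total belongs to. Setting n_top to a larger value returns that many products per PPG.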
Answer 1 (score: 3)
You did not provide any reproducible data, so let's read your text into df.
df <- read.table(text =
"PPG Product Week Sales
P1 A 01/01/2018 50
P1 B 01/01/2018 40
P1 B 01/02/2018 30
P1 A 01/02/2018 80
P2 A 01/01/2018 100
P2 B 01/02/2018 70", header = TRUE)
We use data.table to get the total sales within each PPG x Product group.
data.table::setDT(df)[, .(maxSales = sum(Sales)), by = c("PPG", "Product")]
The result is:
   PPG Product maxSales
1:  P1       A      130
2:  P1       B       70
3:  P2       A      100
4:  P2       B       70
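For the 80-20 check mentioned in the question, the per-product totals above can be turned into cumulative shares within each PPG. This is only a sketch under assumptions: the 0.8 threshold and the names totals, Share, Cum_Share and needed_for_80 are hypothetical and not part of the original answers.

```r
library(data.table)

# Sample data copied from the question
df <- read.table(text = "PPG Product Week Sales
P1 A 01/01/2018 50
P1 B 01/01/2018 40
P1 B 01/02/2018 30
P1 A 01/02/2018 80
P2 A 01/01/2018 100
P2 B 01/02/2018 70", header = TRUE, stringsAsFactors = FALSE)
setDT(df)

# Total sales per product within each PPG
totals <- df[, .(Total_Sales = sum(Sales)), by = .(PPG, Product)]

# Sort best sellers first inside each PPG
setorder(totals, PPG, -Total_Sales)

# Share of the PPG total, and the running (cumulative) share
totals[, Share := Total_Sales / sum(Total_Sales), by = PPG]
totals[, Cum_Share := cumsum(Share), by = PPG]

# Products needed to reach 80% of each PPG's sales (assumed threshold):
# keep a product if the cumulative share before it is still below 0.8
needed_for_80 <- totals[Cum_Share - Share < 0.8]

print(totals)
print(needed_for_80)
```

With only two products per PPG in the sample data, both products are needed to reach 80% in each PPG, so the rule does not hold here; on a larger catalogue, comparing the number of rows in needed_for_80 with the number of products per PPG gives a quick check.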