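I have a dataset in which products are promoted in certain weeks. For each promo week (marked by the Promo flag), I want to calculate the average sales over the previous 4 non-promo weeks. If the row is a non-promo week, its sales should be kept as they are. For promo weeks, the average must be taken over the most recent non-promo weeks, which may not be consecutive.
Please note the sample data below: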
df <- structure(list(Product_group = structure(c(1L, 1L, 1L, 1L, 1L,
1L), .Label = "A", class = "factor"), Promo = structure(c(1L,
1L, 2L, 1L, 1L, 2L), .Label = c("0", "1"), class = "factor"),
Week = structure(c(1L, 2L, 2L, 3L, 4L, 5L), .Label = c("2017-01-01",
"2017-01-02", "2017-01-04", "2017-01-05", "2017-01-06", "2017-01-08",
"2017-01-09"), class = "factor"), Sales = c(50, 50, 60, 70,
50, 50)), .Names = c("Product_group", "Promo", "Week", "Sales"
), row.names = c(NA, 6L), class = "data.frame")
head(df)
Product_group Promo Week Sales
1 A 0 2017-01-01 50
2 A 0 2017-01-02 50
3 A 1 2017-01-02 60
4 A 0 2017-01-04 70
5 A 0 2017-01-05 50
6 A 1 2017-01-06 50
I am looking for output like this:
  Product_group Promo Week Sales Avg Pre Promo Sales
1 A 0 2017-01-01 50 50 # Since it is non promo
2 A 0 2017-01-02 50 50
3 A 1 2017-01-02 60 50 # 100/2
4 A 0 2017-01-04 70 70
5 A 0 2017-01-05 50 50
6 A 1 2017-01-06 50 55 # (50 +70 + 50 + 50 )/4
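The two promo rows in the expected output can be checked by hand. The snippet below is a minimal sketch that re-enters the six sample rows as plain vectors (the names `sales`, `promo`, and `prior_non_promo` are illustrative, not from the original post) and reproduces the commented arithmetic.

```r
# Sample values from the question, typed in as plain vectors (illustrative names)
sales <- c(50, 50, 60, 70, 50, 50)
promo <- c(0, 0, 1, 0, 0, 1)

# Non-promo sales seen up to and including row i
prior_non_promo <- function(i) sales[seq_len(i)][promo[seq_len(i)] == 0]

# Row 3 (promo): only two earlier non-promo weeks exist, so average both
row3 <- mean(tail(prior_non_promo(3), 4))  # (50 + 50) / 2 = 50

# Row 6 (promo): four earlier non-promo weeks, rows 1, 2, 4 and 5
row6 <- mean(tail(prior_non_promo(6), 4))  # (50 + 50 + 70 + 50) / 4 = 55
```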
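Answer 0 (score: 1)
When Promo == 1, I look up the indices of the rows so far where Promo is zero, then take at most the last four of them and average their sales.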
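df <- rbind(df, df) # duplicate the rows to get more test data
df$AvgPrePromoSales <-
  sapply(seq_len(nrow(df)), function(x) if (df$Promo[x] == 1) {
    # indices of the non-promo rows up to the current row
    ind <- which(df$Promo[1:x] == 0)
    # average sales over at most the last four of them
    mean(df$Sales[tail(ind, 4)])
  } else {
    df$Sales[x] # non-promo row: keep its own sales
  })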
df[, c(2, 4, 5)]
# Promo Sales AvgPrePromoSales
# 1 0 50 50
# 2 0 50 50
# 3 1 60 50
# 4 0 70 70
# 5 0 50 50
# 6 1 50 55
# 7 0 50 50
# 8 0 50 50
# 9 1 60 55
# 10 0 70 70
# 11 0 50 50
# 12 1 50 55
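
The code above treats the whole data frame as one series. If the data holds several products, the average presumably should not mix non-promo weeks across groups. The following is a rough sketch under that assumption: the helper name `avg_pre_promo`, its `n` argument, the `sales_df` data frame, and the choice to return NA when a promo week has no earlier non-promo week are all illustrative and not part of the original answer. Promo is stored as an integer flag here rather than a factor.

```r
# Same rule as the answer, wrapped in a helper (hypothetical name and arguments)
avg_pre_promo <- function(sales, promo, n = 4) {
  vapply(seq_along(sales), function(i) {
    if (promo[i] == 1) {
      # non-promo positions up to row i, keeping at most the last n
      ind <- tail(which(promo[seq_len(i)] == 0), n)
      # assumption: no earlier non-promo week gives NA rather than an error
      if (length(ind) == 0) NA_real_ else mean(sales[ind])
    } else {
      sales[i]  # non-promo row: keep its own sales
    }
  }, numeric(1))
}

# The question's six rows, rebuilt with plain column types (assumed layout)
sales_df <- data.frame(
  Product_group = "A",
  Promo = c(0L, 0L, 1L, 0L, 0L, 1L),
  Week = as.Date(c("2017-01-01", "2017-01-02", "2017-01-02",
                   "2017-01-04", "2017-01-05", "2017-01-06")),
  Sales = c(50, 50, 60, 70, 50, 50)
)

# ave() runs the function on the row indices of each Product_group
# and puts the results back in the original row order
sales_df$AvgPrePromoSales <- ave(
  seq_len(nrow(sales_df)), sales_df$Product_group,
  FUN = function(idx) avg_pre_promo(sales_df$Sales[idx], sales_df$Promo[idx])
)
```

Both versions assume the rows are already sorted by week within each product. If they are not, sort the data first.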