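I have several products with daily sales. I want to forecast each product's expected daily sales from its cumulative sales to date and the total sales I expect over the period.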
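The first table ("key") holds each product's expected total sales, along with the quantity I forecast to sell per day, which depends on the share of the total already sold. For example, product A has cumulative sales of 650, which is 43% of its 1500 total, so I expect it to sell 75 the next day because 40% <= 43% < 60%.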
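As a minimal sketch of that lookup rule (base R only; `total`, `upper`, `forecast`, and `cum` are illustrative names, with the values for product A taken from the example), the band can be found with `findInterval`:

```r
# Band upper bounds and per-day forecasts for product A (values from the example)
total    <- 1500
upper    <- c(0.2, 0.4, 0.6, 0.8, 1.0)
forecast <- c(125, 100, 75, 50, 25)

cum <- 650            # cumulative sales so far
pct <- cum / total    # about 0.43

# Prepending 0 makes band i cover [lower_i, upper_i)
band <- findInterval(pct, c(0, upper))
next_day <- forecast[band]
next_day              # 75, since 0.4 <= 0.43 < 0.6
```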
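I want to update the cumulative sales for each product in the second table ("data") using the forecast quantities. Each forecast depends on the cumulative sales of the previous period, so I cannot compute each column independently, and I think I need a loop.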
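However, my dataset has more than 500,000 rows, and my best attempt with a for loop is too slow to be practical. Any thoughts? I think an Rcpp implementation might be a solution, but I have not used that package or C++ before. The desired final result is shown below ("final").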
library(data.table)
key <- data.table(Product = c(rep("A",5), rep("B",5)), TotalSales =
c(rep(1500,5),rep(750,5)), Percent = rep(seq(0.2, 1, 0.2),2), Forecast =
c(seq(125, 25, -25), seq(75, 15, -15)))
data <- data.table(Date = rep(seq(1, 9, 1), 2), Product=rep(c("A", "B"),
each=9L), Time = rep(c(rep("Past",4), rep("Future",5)),2), Sales = c(190,
165, 133, 120, 0, 0, 0, 0, 0, 72, 58, 63, 51, 0, 0, 0, 0, 0))
final <- data.table(data, Cum = c(190, 355, 488, 608, 683, 758, 833, 908,
958, 72, 130, 193, 244, 304, 349, 394, 439, 484), Percent.Actual = c(0.13,
0.24, 0.33, 0.41, 0.46, 0.51, 0.56, 0.61, 0.64, 0.10, 0.17, 0.26, 0.33,
0.41, 0.47, 0.53, 0.59, 0.65), Forecast = c(0, 0, 0, 0, 75, 75, 75, 75, 50,
0, 0, 0, 0, 60, 45, 45, 45, 45))
Answer 0 (score: 1)
Not sure if this is really going to help with your actual dataset given the size.
library(data.table)
# convert key into a list for fast lookup
keyLs <- lapply(split(key, by="Product"),
    function(x) list(TotalSales=x[, TotalSales[1L]],
        Percent=x[, Percent],
        Forecast=x[, Forecast]))

# for each product, carry cumulative sales forward with Reduce(accumulate=TRUE):
# each step looks up the forecast for the current percentage and adds it
futureSales <- data[, {
        byChar <- as.character(.BY)
        list(Date=Date[Time=="Future"],
            Cum=Reduce(function(x, y) {
                    pct <- x / keyLs[[byChar]]$TotalSales
                    # all.inside=TRUE keeps the index within 1..length(Forecast), even at 100%
                    res <- x + keyLs[[byChar]]$Forecast[findInterval(pct, c(0, keyLs[[byChar]]$Percent), all.inside=TRUE)]
                    # cap cumulative sales at the expected total
                    if (res >= keyLs[[byChar]]$TotalSales) return(keyLs[[byChar]]$TotalSales)
                    res
                },
                x=rep(0L, sum(Time=="Future")),
                init=sum(Sales[Time=="Past"]),
                accumulate=TRUE)[-1])
    },
    by=.(Product)]
futureSales
# calculate other sales stats
futureSales[data, on=.(Date, Product)][,
    Cum := ifelse(is.na(Cum), cumsum(Sales), Cum),
    by=.(Product)][,
    ':=' (
        Percent.Actual = Cum / keyLs[[as.character(.BY)]]$TotalSales,
        Forecast = ifelse(Sales > 0, 0, c(0, diff(Cum)))
    ), by=.(Product)][]
#     Product Date Cum   Time Sales Percent.Actual Forecast
#  1:       A    1 190   Past   190      0.1266667        0
#  2:       A    2 355   Past   165      0.2366667        0
#  3:       A    3 488   Past   133      0.3253333        0
#  4:       A    4 608   Past   120      0.4053333        0
#  5:       A    5 683 Future     0      0.4553333       75
#  6:       A    6 758 Future     0      0.5053333       75
#  7:       A    7 833 Future     0      0.5553333       75
#  8:       A    8 908 Future     0      0.6053333       75
#  9:       A    9 958 Future     0      0.6386667       50
# 10:       B    1  72   Past    72      0.0960000        0
# 11:       B    2 130   Past    58      0.1733333        0
# 12:       B    3 193   Past    63      0.2573333        0
# 13:       B    4 244   Past    51      0.3253333        0
# 14:       B    5 304 Future     0      0.4053333       60
# 15:       B    6 349 Future     0      0.4653333       45
# 16:       B    7 394 Future     0      0.5253333       45
# 17:       B    8 439 Future     0      0.5853333       45
# 18:       B    9 484 Future     0      0.6453333       45
You might also want to consider running your calculation in parallel by product.
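One way this could look, as a rough sketch rather than a tuned implementation: the per-product recurrence is wrapped in a plain function and the products are dispatched with `parallel::mclapply`. `forecast_cum` and `params` are hypothetical names introduced here, and the sketch uses base R only so that it stands alone. `mclapply` relies on forking, which is unavailable on Windows, so the sketch falls back to one core there.

```r
library(parallel)

# Hypothetical helper: roll cumulative sales forward n days for one product.
# start = cumulative sales to date, total = expected total sales,
# upper = band upper bounds (fractions of total), fc = daily forecast per band.
forecast_cum <- function(start, total, upper, fc, n) {
  out <- numeric(n)
  cum <- start
  for (i in seq_len(n)) {
    # all.inside = TRUE keeps the index within 1..length(fc), even at 100%
    band <- findInterval(cum / total, c(0, upper), all.inside = TRUE)
    cum <- min(cum + fc[band], total)   # never exceed the expected total
    out[i] <- cum
  }
  out
}

# Per-product inputs, using the sample values from the question
params <- list(
  A = list(start = 608, total = 1500, upper = seq(0.2, 1, 0.2),
           fc = seq(125, 25, -25)),
  B = list(start = 244, total = 750, upper = seq(0.2, 1, 0.2),
           fc = seq(75, 15, -15))
)

# mclapply forks, which is not available on Windows; use one core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

res <- mclapply(params, function(p) {
  forecast_cum(p$start, p$total, p$upper, p$fc, n = 5L)
}, mc.cores = cores)

res$A   # 683 758 833 908 958
res$B   # 304 349 394 439 484
```

Whether this pays off probably depends on how the 500,000 rows are spread across products: with only a handful of products the fork overhead may outweigh the gain, so it is worth timing on a sample first.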