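I am working on a project that requires me to run an inventory simulation on a large data set (more than 20 million rows).
To do this, I wrote the following function: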
inventory_calculation <- function(sale, order, dates, expiry_period,
                                  start_inventory, output = c("inv", "waste")) {
  # Resolve `output` to a single value ("inv" when it is not supplied)
  output <- match.arg(output)
  sales <- c(sale)
  orders <- c(order)
  # Work with dates as day numbers so they can be stored in `goods`
  dates <- as.numeric(dates)
  inv <- rep(NA_real_, length(sales))
  # Stays NA on days where nothing expires
  waste <- rep(NA_real_, length(sales))
  # One entry per unit in stock, holding the day it arrived (oldest first)
  goods <- rep(dates[1], start_inventory)
  for (i in seq_along(sales)) {
    # Save current date
    current_date <- dates[i]
    if (orders[i] != 0) {
      # If there are any orders, add the new units to goods
      goods <- c(goods, rep(dates[i], orders[i]))
    }
    # Days elapsed between each unit's replenishment date and the current date
    remaining_shelf_life <- current_date - goods
    # If any unit is older than the expiry period, remove it from
    # inventory and count it as waste
    if (any(remaining_shelf_life > expiry_period)) {
      deteriorated <- remaining_shelf_life > expiry_period
      goods <- goods[!deteriorated]
      waste[i] <- sum(deteriorated)
    }
    # If sales > 0, remove the sold units from goods, oldest first
    if (sales[i] > 0) {
      goods <- goods[-seq_len(sales[i])]
    }
    # Save the number of units left in inventory
    inv[i] <- length(goods)
  }
  inv <- as.integer(inv)
  waste <- as.integer(waste)
  if (output == "inv") inv else waste
}
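To make the per-day logic concrete, here is a small hand-checkable sketch of what one loop iteration does to `goods`. All values (including the `*_today` names) are made up for illustration and are not part of the function above.

```r
# Made-up values: arrival day of each unit in stock, oldest first
goods <- c(1, 1, 3, 3, 3)
current_date <- 3
expiry_period <- 1
sales_today <- 2

# Days each unit has been in stock
age <- current_date - goods

# Units older than the expiry period are discarded and counted as waste
deteriorated <- age > expiry_period
waste_today <- sum(deteriorated)
goods <- goods[!deteriorated]

# Sold units are taken from the front, i.e. the oldest units go first.
# The guard matters: goods[-seq_len(0)] would return an empty vector.
if (sales_today > 0) {
  goods <- goods[-seq_len(sales_today)]
}
inv_today <- length(goods)
```

With these numbers the two units from day 1 are discarded, two of the three units from day 3 are sold, and one unit is left in stock.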
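The function works correctly, but it is still slow (about 12 seconds for 700,000 rows).
So I am hoping some of you have ideas for speeding it up.
Any help would be greatly appreciated.
Here is some test data: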
library(lubridate)
dates <- seq(dmy("01-01-2018"), dmy("01-09-2018"), "days")
sales <- rpois(length(dates), 10)
orders <- ceiling(rnorm(length(dates), 30, 30))
orders[orders < 0] <- 0L
output <- inventory_calculation(sale = sales,
                                order = orders,
                                dates = dates,
                                expiry_period = 2,
                                start_inventory = 1,
                                output = "inv")
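One direction I suspect could help with the speed problem: the function stores one vector element per unit, and `c(goods, ...)` builds a new vector every time an order arrives. Below is a minimal sketch that keeps one entry per delivery instead, with a pointer to the oldest batch that still holds stock. The name `inventory_calculation_batches` is hypothetical. The sketch assumes non-negative whole-number sales and orders and dates in ascending order. I have not benchmarked it, so treat it as an idea rather than a finished solution.

```r
# Hypothetical alternative: one entry per delivery instead of one per unit.
# Assumes non-negative whole-number sales/orders and ascending dates.
inventory_calculation_batches <- function(sale, order, dates, expiry_period,
                                          start_inventory,
                                          output = c("inv", "waste")) {
  output <- match.arg(output)
  n <- length(sale)
  # Batch 1 is the starting inventory; batch i + 1 is the order of day i
  batch_day <- c(as.numeric(dates[1]), as.numeric(dates))
  batch_qty <- c(start_inventory, order)
  inv <- integer(n)
  waste <- rep(NA_integer_, n)   # stays NA on days where nothing expires
  oldest <- 1L                   # oldest batch that may still hold stock
  stock <- start_inventory       # running total of units on hand
  for (i in seq_len(n)) {
    newest <- i + 1L
    today <- batch_day[newest]
    stock <- stock + order[i]
    # Expired batches always sit at the old end, so walk forward from there
    expired <- 0
    while (oldest <= newest && today - batch_day[oldest] > expiry_period) {
      expired <- expired + batch_qty[oldest]
      oldest <- oldest + 1L
    }
    if (expired > 0) {
      waste[i] <- as.integer(expired)
      stock <- stock - expired
    }
    # Sell the oldest units first; unmet demand is dropped
    to_sell <- sale[i]
    while (to_sell > 0 && oldest <= newest) {
      take <- min(to_sell, batch_qty[oldest])
      batch_qty[oldest] <- batch_qty[oldest] - take
      to_sell <- to_sell - take
      stock <- stock - take
      if (batch_qty[oldest] == 0) oldest <- oldest + 1L
    }
    inv[i] <- as.integer(stock)
  }
  if (output == "inv") inv else waste
}

# Small hand-checked example (made-up numbers)
inventory_calculation_batches(sale = c(1, 0, 2, 0, 1), order = c(2, 0, 3, 0, 0),
                              dates = 1:5, expiry_period = 1,
                              start_inventory = 1, output = "inv")
# returns 2 2 1 1 0
```

Each batch is visited at most a handful of times, so the work per row no longer depends on how many units are in stock. It can be called with the test data above in the same way as `inventory_calculation`.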