Question

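I am working on a project that requires me to run an inventory simulation on a large dataset (20+ million rows).

For this I wrote the following function: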

inventory_calculation <- function(sale, order, dates, expiry_period,
                                  start_inventory, output = c("inv", "waste")) {
  output <- match.arg(output)  # resolves the default vector to "inv"
  sales <- sale
  orders <- order

  inv <- rep(NA_real_, length(sales))
  waste <- rep(NA_real_, length(sales))

  # One element per unit in stock, holding its arrival date as a number;
  # the oldest units come first.
  goods <- rep(as.numeric(dates[1]), start_inventory)

  for (i in seq_along(sales)) {

    # Save current date
    current_date <- as.numeric(dates[i])

    # If there are any orders, add the delivered units to goods
    if (orders[i] != 0) {
      goods <- c(goods, rep(current_date, orders[i]))
    }

    # Days elapsed since each unit arrived
    age <- current_date - goods

    # Remove units older than the expiry period and count them as waste
    # (waste stays NA on days where nothing expires)
    deteriorated <- age > expiry_period
    if (any(deteriorated)) {
      goods <- goods[!deteriorated]
      waste[i] <- sum(deteriorated)
    }

    # If sales > 0, remove the oldest units first
    if (sales[i] > 0) {
      goods <- goods[-seq_len(sales[i])]
    }

    # Save the number of units in inventory
    inv[i] <- length(goods)
  }

  if (output == "inv") as.integer(inv) else as.integer(waste)
}

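The function works correctly, but it is still slow (about 12 seconds for 700,000 rows).

So I was hoping some of you might have ideas on how to speed it up.

Any help would be greatly appreciated.

Here is some test data: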

library(lubridate)
dates <- seq(dmy("01-01-2018"), dmy("01-09-2018"), "days")
sales <- rpois(length(dates), 10)
orders <- ceiling(rnorm(length(dates), 30, 30))
orders[orders < 0] <- 0L

output <- inventory_calculation(sale = sales, 
                                order = orders, 
                                dates = dates, 
                                expiry_period = 2, 
                                start_inventory = 1, 
                                output = "inv")
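
For comparison, here is a minimal sketch of one direction that might reduce the cost. Instead of storing one vector element per unit, it stores one quantity per delivery batch and advances a pointer to the oldest batch. The name `inventory_batches`, its argument names, and its list return value are hypothetical and not part of the function above. The sketch assumes that `dates` is sorted in ascending order and that quantities are whole numbers. It also reports 0 instead of NA on days without waste. It has not been benchmarked against the original.

```r
# Hypothetical batch-based variant (an assumption, not the original method).
# Assumes `dates` is sorted ascending and quantities are whole numbers.
inventory_batches <- function(sales, orders, dates, expiry_period,
                              start_inventory) {
  n <- length(sales)
  day <- as.numeric(dates)             # dates as day counts

  # One slot per batch: slot 1 is the starting stock,
  # slot i + 1 is the order received on day i.
  batch_day <- c(day[1], day)
  batch_qty <- c(start_inventory, orders)

  first <- 1L                          # oldest batch that may still hold stock
  on_hand <- start_inventory           # running total of units in stock
  inv <- numeric(n)
  waste <- numeric(n)                  # 0 (not NA) when nothing expires

  for (i in seq_len(n)) {
    last <- i + 1L                     # newest batch received so far
    on_hand <- on_hand + orders[i]

    # Expire: batches are stored oldest first, so only the front can be too old.
    while (first <= last && day[i] - batch_day[first] > expiry_period) {
      waste[i] <- waste[i] + batch_qty[first]
      on_hand <- on_hand - batch_qty[first]
      first <- first + 1L
    }

    # Sell the oldest stock first; demand beyond the stock on hand is dropped.
    remaining <- sales[i]
    while (remaining > 0 && first <= last) {
      take <- min(remaining, batch_qty[first])
      batch_qty[first] <- batch_qty[first] - take
      on_hand <- on_hand - take
      remaining <- remaining - take
      if (batch_qty[first] == 0) first <- first + 1L
    }

    inv[i] <- on_hand
  }

  list(inv = as.integer(inv), waste = as.integer(waste))
}

# Small usage example with made-up numbers
example_dates <- as.Date("2018-01-01") + 0:4
res <- inventory_batches(sales = c(1, 1, 0, 0, 2),
                         orders = c(5, 0, 0, 0, 3),
                         dates = example_dates,
                         expiry_period = 2,
                         start_inventory = 1)
res$inv
res$waste
```

Each batch is added once and dropped from the front at most once, so the work per day no longer depends on how many units are in stock. Both outputs come from a single pass, so the simulation does not need to be run twice to get `inv` and `waste`.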

Speeding up an inventory simulation function

0 answers: