Question

Suppose I have a dataset of containers, identified by serial number (SN), together with their respective volumes.

x <- data.frame("SN" = 1:3, "Price" = c(10,20,30), "Volume" = c(100,150,200))

SN     Price      Volume
1      10         100
2      20         150
3      30         200

I am looking for a way to fill the containers using a bucket of a given size.

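If a container is full before the bucket is empty, I want to move on to the next SN.
If the bucket is empty before the container is full, I want to start a new row for the remainder of that container, filled from a fresh bucket.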

Desired output for bucket_size = 200:

 SN     Price      Volume
    1      10         100 # max for SN 1 is 100, totally filled, bucket now = 100
    2      20         100 # max for SN 2 is 150, bucket now = 0 
    2      20          50 # fill remaining SN 2, new bucket now = 150 
    3      30         150 # max for SN 3 is 200, bucket now = 0
    3      30          50 # fill remaining in SN 3, bucket now = 150 remaining

I have started coding this, but my code does not seem generic enough to work for any bucket size.

x <- data.frame("SN" = 1:3, "Price" = c(10,20,30), "Volume" = c(100,150,200))

bucketsize <- 200
PendingBucketVolume <- bucketsize

y <- data.frame(SN = integer(),Price=numeric(),Volume=numeric(),stringsAsFactors=FALSE)

for (i in 1:nrow(x)) {
  if (x$Volume[i] <= PendingBucketVolume) {
    print(x$Volume[i])
    PendingBucketVolume <- PendingBucketVolume - x$Volume[i]
  } else {
    print(PendingBucketVolume)
    remainder <- x$Volume[i] - PendingBucketVolume
    if (remainder <= bucketsize) {
      print(remainder)
    } else {
      print(bucketsize)
      remainder <- remainder - bucketsize

    }

    if (remainder < PendingBucketVolume) {
      PendingBucketVolume <- remainder
    } else {
      PendingBucketVolume <- bucketsize
      PendingBucketVolume <- PendingBucketVolume - remainder
    }

  }
}

Please suggest a way to make it generic and efficient.

Answer 1

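I spent a long time trying to get the if-else logic to work for this problem. There was too much balancing between the row volume and the bucket volume. Instead, I figured I could split every volume into single units, label each unit with the container it belongs to and the bucket it comes from, cbind, and then use table to bring them back together. The result is probably much slower computationally than an if-else approach, but it is very simple to code.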

x <- data.frame("SN" = 1:3, "Price" = c(10,20,30), "Volume" = c(100,150,200))

allocate_buckets <- function(x, bucketsize){
  # assumes x has exactly the columns SN, Price, Volume, in that order
  stopifnot(colnames(x) == c("SN","Price","Volume"))
  row_num <- rep(x[,"SN"], x[,"Volume"])
  l <- length(row_num)
  bucket_num <- rep(1:ceiling(l/bucketsize), each = bucketsize)[1:l]
  out <- table(row_num, bucket_num)
  out.ind <- which(out != 0, arr.ind = TRUE)
  return(cbind.data.frame(x[out.ind[,1],c("SN","Price")], Volume = out[out.ind]))
}

Now you can use it for any (integer) volume:

allocate_buckets(x, 200)
#    SN Price Volume
#1    1    10    100
#2    2    20    100
#2.1  2    20     50
#3    3    30    150
#3.1  3    30     50

allocate_buckets(x, 67)
#    SN Price Volume
#1    1    10     67
#1.1  1    10     33
#2    2    20     34
#2.1  2    20     67
#2.2  2    20     49
#3    3    30     18
#3.1  3    30     67
#3.2  3    30     67
#3.3  3    30     48
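
To see what the function does internally, it can help to inspect the intermediate objects on a smaller input. The sketch below uses made-up values (two containers with volumes 3 and 4, bucket size 5) that are purely illustrative and do not come from the question.

```r
# Illustrative values only (not from the question): two containers, bucket size 5
vols <- c(3, 4)
bucketsize <- 5

# One entry per unit of volume, labelled with the container it belongs to
row_num <- rep(seq_along(vols), vols)
l <- length(row_num)

# The same units, labelled with the bucket that supplies them
bucket_num <- rep(1:ceiling(l / bucketsize), each = bucketsize)[1:l]

# Cross-tabulate: cell [i, j] counts the units container i takes from bucket j
out <- table(row_num, bucket_num)
print(out)

# The non-zero cells, read in column-major order, become the output rows
idx <- which(out != 0, arr.ind = TRUE)
pieces <- out[idx]
print(pieces)
```

Each non-zero cell of the table is one output row, which is why a container that spans two buckets shows up twice.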

Edit

That is an amazing link you posted. I had been very close; here is the R version:

x <- data.frame("SN" = 1:3, "Price" = c(10,20,30), "Volume" = c(100,150,200))
y <- data.frame(SN = integer(), Price = numeric(), Volume = numeric())
bucket <- bucketsize <- 200
vol <- numeric()
count <- 0
for(i in 1:nrow(x)){
  volume <- x[i,"Volume"]
  while(volume!=0){
    vol <- min(volume, bucket)
    print(vol)
    count <- count + 1
    y[count,] <- x[i,]
    y[count,"Volume"] <- vol
    volume <- volume - vol
    bucket <- bucket - vol
    if(bucket == 0){
      bucket <- bucketsize
    }
  }
}
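
For reuse, the loop could be wrapped in a function. The following is only a sketch: the name fill_by_loop, the input checks, and the choice to collect pieces in a list instead of growing a data frame are additions for illustration and are not part of the original answer.

```r
# Hypothetical wrapper around the loop above; the function name is an assumption
fill_by_loop <- function(x, bucketsize) {
  stopifnot(bucketsize > 0, all(x$Volume >= 0))
  pieces <- list()
  bucket <- bucketsize                       # volume left in the current bucket
  for (i in seq_len(nrow(x))) {
    volume <- x$Volume[i]                    # volume container i still needs
    while (volume > 0) {
      vol <- min(volume, bucket)             # pour as much as both sides allow
      piece <- x[i, , drop = FALSE]
      piece$Volume <- vol
      pieces[[length(pieces) + 1]] <- piece
      volume <- volume - vol
      bucket <- bucket - vol
      if (bucket == 0) bucket <- bucketsize  # start a fresh bucket
    }
  }
  if (length(pieces) == 0) return(x[0, , drop = FALSE])
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

x <- data.frame(SN = 1:3, Price = c(10, 20, 30), Volume = c(100, 150, 200))
res <- fill_by_loop(x, 200)
print(res)
```

Because vol is always the smaller of the two remaining amounts, every pass empties either the container's remaining need or the bucket exactly, so the loop terminates for non-integer volumes as well.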

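Edit 2

I ran a microbenchmark on the two methods (it took a while), and it turns out that my original method actually looks faster than the code transcribed from SAS.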

                     expr      min        lq      mean    median        uq       max neval
 allocate_buckets(x, 200) 312.4177  466.6347  504.2121  483.1754  516.2977  846.4529   100
            other(x, 200) 986.6495 1233.5141 1339.4219 1265.3606 1389.1158 2023.7884   100

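This was unexpected to me. The advantage of the other method is that it handles non-integer values. The allocate_buckets function could probably be sped up by using data.table, and the integer-only constraint could be lifted by multiplying the volumes by 100 (or whatever factor turns the smallest decimal into an integer) and then dividing the result by the same factor.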
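
The scaling idea might look roughly like the sketch below. The wrapper name allocate_buckets_scaled and its scale argument are hypothetical, and the round() calls are an added guard against floating-point error after multiplication, which the answer does not mention. It assumes that, with scale = 100, every volume and the bucket size have at most two decimal places.

```r
# allocate_buckets as defined in the answer above, repeated so this runs alone
allocate_buckets <- function(x, bucketsize) {
  stopifnot(colnames(x) == c("SN", "Price", "Volume"))
  row_num <- rep(x[, "SN"], x[, "Volume"])
  l <- length(row_num)
  bucket_num <- rep(1:ceiling(l / bucketsize), each = bucketsize)[1:l]
  out <- table(row_num, bucket_num)
  out.ind <- which(out != 0, arr.ind = TRUE)
  cbind.data.frame(x[out.ind[, 1], c("SN", "Price")], Volume = out[out.ind])
}

# Hypothetical wrapper: scale to integers, allocate, scale back
allocate_buckets_scaled <- function(x, bucketsize, scale = 100) {
  # round() guards against values such as 0.29 * 100 = 28.999999999999996,
  # which rep() would otherwise truncate to 28
  x$Volume <- round(x$Volume * scale)
  out <- allocate_buckets(x, round(bucketsize * scale))
  out$Volume <- out$Volume / scale
  out
}

# Illustrative non-integer data (not from the question)
x <- data.frame(SN = 1:2, Price = c(10, 20), Volume = c(1.25, 2.5))
res <- allocate_buckets_scaled(x, 1.5)
print(res)
```

Note that the unit vector built by rep() grows by the scale factor, so this trades memory and speed for the ability to handle decimals.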
