Suppose I have a data set of containers, each with a serial number and its own volume.
x <- data.frame("SN" = 1:3, "Price" = c(10,20,30), "Volume" = c(100,150,200))
SN Price Volume
1 10 100
2 20 150
3 30 200
I am looking for a way to fill the containers using buckets of a given size.
Desired output for bucket_size = 200:
SN Price Volume
1 10 100 # max for SN 1 is 100, totally filled, bucket now = 100
2 20 100 # max for SN 2 is 150, bucket now = 0
2 20 50 # fill remaining SN 2, new bucket now = 150
3 30 150 # max for SN 3 is 200, bucket now = 0
3 30 50 # fill remaining in SN 3, bucket now = 150 remaining
I have started coding, but my code does not seem generic enough to work for any bucket size.
x <- data.frame("SN" = 1:3, "Price" = c(10,20,30), "Volume" = c(100,150,200))
bucketsize <- 200
PendingBucketVolume <- bucketsize
y <- data.frame(SN = integer(),Price=numeric(),Volume=numeric(),stringsAsFactors=FALSE)
for (i in 1:nrow(x)) {
  if (x$Volume[i] <= PendingBucketVolume) {
    print(x$Volume[i])
    PendingBucketVolume <- PendingBucketVolume - x$Volume[i]
  } else {
    print(PendingBucketVolume)
    remainder <- x$Volume[i] - PendingBucketVolume
    if (remainder <= bucketsize) {
      print(remainder)
    } else {
      print(bucketsize)
      remainder <- remainder - bucketsize
    }
    if (remainder < PendingBucketVolume) {
      PendingBucketVolume <- remainder
    } else {
      PendingBucketVolume <- bucketsize
      PendingBucketVolume <- PendingBucketVolume - remainder
    }
  }
}
Please suggest how to make this generic and efficient.
Answer 0 (score: 1)
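I spent a long time trying to get the if else logic to work for this problem. There is too much balancing between the row volumes and the bucket volumes. Instead, I figured I could split every volume into single units, give each unit a bucket ID, and then use table to regroup them, with cbind to assemble the final data frame. The result is probably a lot slower computationally than an if else approach, but it is very simple to code.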
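To make the split-and-regroup idea concrete, here is a minimal sketch on a made-up toy input (two containers holding 3 and 4 units, bucket size 5). The names volumes, container_id, bucket_id and fills are chosen only for this illustration and do not come from the original post.

```r
# Toy input (made up for illustration): two containers holding 3 and 4 units,
# filled with buckets of size 5.
volumes <- c(3, 4)
bucketsize <- 5

# One entry per unit of volume, labelled with the container it belongs to.
container_id <- rep(seq_along(volumes), volumes)

# One entry per unit of volume, labelled with the bucket that supplies it.
bucket_id <- ceiling(seq_along(container_id) / bucketsize)

# Cross-tabulate: cell [i, j] = units that container i receives from bucket j.
fills <- table(container_id, bucket_id)
print(fills)
```

Container 1 takes its 3 units from bucket 1, while container 2 takes 2 units from bucket 1 and the remaining 2 from bucket 2. Every non-zero cell of the table becomes one row of the desired output.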
x <- data.frame("SN" = 1:3, "Price" = c(10,20,30), "Volume" = c(100,150,200))
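allocate_buckets <- function(x, bucketsize){
  # assumes x has exactly these column names, in this order
  stopifnot(identical(colnames(x), c("SN","Price","Volume")))
  # one entry per unit of volume, labelled with the row of x it belongs to
  # (indexing by row position, not by SN, keeps this correct for unsorted or non-consecutive SN)
  row_num <- factor(rep(seq_len(nrow(x)), x[,"Volume"]), levels = seq_len(nrow(x)))
  l <- length(row_num)
  # one entry per unit of volume, labelled with the bucket that supplies it
  bucket_num <- rep(1:ceiling(l/bucketsize), each = bucketsize)[1:l]
  # cell [i, j] = volume that row i receives from bucket j
  out <- table(row_num, bucket_num)
  out.ind <- which(out != 0, arr.ind = TRUE)
  return(cbind.data.frame(x[out.ind[,1],c("SN","Price")], Volume = out[out.ind]))
}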
You can now use it with any (integer) volume:
allocate_buckets(x, 200)
# SN Price Volume
#1 1 10 100
#2 2 20 100
#2.1 2 20 50
#3 3 30 150
#3.1 3 30 50
allocate_buckets(x, 67)
# SN Price Volume
#1 1 10 67
#1.1 1 10 33
#2 2 20 34
#2.1 2 20 67
#2.2 2 20 49
#3 3 30 18
#3.1 3 30 67
#3.2 3 30 67
#3.3 3 30 48
Edit
That is a great link you posted. I was very close; here is the R version:
x <- data.frame("SN" = 1:3, "Price" = c(10,20,30), "Volume" = c(100,150,200))
y <- data.frame(SN = integer(), Price = numeric(), Volume = numeric())
bucket <- bucketsize <- 200
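count <- 0
for(i in 1:nrow(x)){
  # volume still to be filled in container i
  volume <- x[i,"Volume"]
  while(volume!=0){
    # pour as much as the container needs or the bucket still holds
    vol <- min(volume, bucket)
    print(vol)
    count <- count + 1
    # record one output row per fill step
    y[count,] <- x[i,]
    y[count,"Volume"] <- vol
    volume <- volume - vol
    bucket <- bucket - vol
    # bucket is empty: start a fresh one
    if(bucket == 0){
      bucket <- bucketsize
    }
  }
}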
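If you want to reuse the loop for other bucket sizes, it can be wrapped in a function. This is only a sketch under a few assumptions: the name fill_buckets_loop is hypothetical (it is not from the original post), x is assumed to have a Volume column with non-negative values, and the pieces are collected in a list and bound once at the end instead of growing y row by row.

```r
# fill_buckets_loop() is a hypothetical name used only for this sketch.
fill_buckets_loop <- function(x, bucketsize) {
  stopifnot(bucketsize > 0, all(x$Volume >= 0))
  pieces <- list()        # one single-row data frame per fill step
  bucket <- bucketsize    # volume left in the current bucket
  for (i in seq_len(nrow(x))) {
    volume <- x$Volume[i] # volume still to be filled in container i
    while (volume > 0) {
      vol <- min(volume, bucket)
      piece <- x[i, , drop = FALSE]
      piece$Volume <- vol
      pieces[[length(pieces) + 1]] <- piece
      volume <- volume - vol
      bucket <- bucket - vol
      if (bucket == 0) bucket <- bucketsize  # start a fresh bucket
    }
  }
  # nothing to fill: return an empty data frame with the same columns
  if (length(pieces) == 0) return(x[0, , drop = FALSE])
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

x <- data.frame(SN = 1:3, Price = c(10, 20, 30), Volume = c(100, 150, 200))
res <- fill_buckets_loop(x, 200)
print(res)
```

For the example data and a bucket size of 200, the Volume column comes out as 100, 100, 50, 150, 50, matching the desired output in the question.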
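Edit 2: I microbenchmarked the two approaches (it took a while), and it turns out that my original approach actually looks faster than the code transcribed from SAS (labelled other below).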
expr                      min       lq         mean       median     uq         max        neval
allocate_buckets(x, 200)  312.4177  466.6347   504.2121   483.1754   516.2977   846.4529   100
other(x, 200)             986.6495  1233.5141  1339.4219  1265.3606  1389.1158  2023.7884  100
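This was unexpected to me. The advantage of the other approach is that it can handle non-integer values. The allocate_buckets function could probably be sped up by using data.table, and the integer constraint can be lifted by multiplying the volumes by 100 (or by whatever factor turns the smallest decimal into an integer) and then dividing the result by 100.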
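The scaling trick for non-integer volumes could look roughly like this. It is a sketch, not a tested drop-in: allocate_units and allocate_buckets_scaled are hypothetical names, the first is a compact copy of the table-based idea so that the block runs on its own, and the scale argument is an assumption you have to pick so that every volume becomes a whole number.

```r
# Compact copy of the table-based allocation, working on whole units.
# allocate_units() is a hypothetical helper name.
allocate_units <- function(volume_units, bucket_units) {
  n <- length(volume_units)
  row_id <- factor(rep(seq_len(n), volume_units), levels = seq_len(n))
  bucket_id <- ceiling(seq_along(row_id) / bucket_units)
  out <- table(row_id, bucket_id)
  idx <- which(out != 0, arr.ind = TRUE)
  data.frame(row = unname(idx[, 1]), units = as.vector(out[idx]))
}

# allocate_buckets_scaled() is a hypothetical wrapper name.
allocate_buckets_scaled <- function(x, bucketsize, scale = 100) {
  # round() guards against floating-point noise, e.g. 0.29 * 100 is
  # slightly below 29 and would otherwise be truncated to 28 by rep()
  vol_units <- round(x$Volume * scale)
  bucket_units <- round(bucketsize * scale)
  # refuse inputs that the chosen scale cannot represent
  stopifnot(bucket_units > 0,
            isTRUE(all.equal(vol_units / scale, x$Volume)))
  alloc <- allocate_units(vol_units, bucket_units)
  out <- x[alloc$row, c("SN", "Price")]
  out$Volume <- alloc$units / scale   # convert back to the original unit
  rownames(out) <- NULL
  out
}

# Made-up example data with fractional volumes.
x <- data.frame(SN = 1:2, Price = c(10, 20), Volume = c(1.5, 2.25))
res <- allocate_buckets_scaled(x, bucketsize = 2)
print(res)
```

Keep in mind that the intermediate vectors hold one element per scaled unit, so a large scale factor makes this approach correspondingly slower and more memory hungry.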