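The question "How to round floats to integers while preserving their sum?" has the following answer, written in pseudocode. It rounds a vector to integer values so that the sum of the elements is unchanged and the roundoff error is minimized. I'd like to implement this efficiently in R (vectorized, if possible).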
For example, rounding these numbers individually gives a different total:
set.seed(1)
(v <- 10 * runif(4))
# [1] 2.655087 3.721239 5.728534 9.082078
(v <- c(v, 25 - sum(v)))
# [1] 2.655087 3.721239 5.728534 9.082078 3.813063
sum(v)
# [1] 25
sum(round(v))
# [1] 26
Pseudocode copied from that answer, for reference:
// Temp array with same length as fn.
tempArr = Array(fn.length)
// Calculate the expected sum.
arraySum = sum(fn)
lowerSum = 0
// Populate temp array.
for i = 1 to fn.length
    tempArr[i] = { result: floor(fn[i]),              // Lower bound
                   difference: fn[i] - floor(fn[i]),  // Roundoff error
                   index: i }                         // Original index
    // Calculate the lower sum
    lowerSum = lowerSum + tempArr[i].result
end for
// Sort the temp array on the roundoff error (ascending)
sort(tempArr, "difference")
// Now arraySum - lowerSum gives us the difference between the sums of these
// arrays. tempArr is ordered so that the numbers closest to the next integer
// are at the end.
difference = arraySum - lowerSum
// Add 1 to those most likely to round up to the next number so that
// the difference is nullified.
for i = (tempArr.length - difference + 1) to tempArr.length
    tempArr[i].result = tempArr[i].result + 1
end for
// Optionally sort the array based on the original index.
sort(tempArr, "index")
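As a baseline, a literal, loop-based R translation of the pseudocode might look like the sketch below. It is not vectorized. The name pseudo.round is hypothetical, and it assumes the target total is round(sum(fn)), so that `difference` is a whole number. Because the result vector is never physically reordered, the final "sort by index" step is not needed.

```r
# Hypothetical helper: a literal, non-vectorized translation of the pseudocode.
# Assumption: the target total is round(sum(fn)), so `difference` is a whole number.
pseudo.round <- function(fn) {
  n <- length(fn)
  result <- numeric(n)  # lower bounds, later bumped up by 1 where needed
  frac <- numeric(n)    # roundoff error of each lower bound
  lowerSum <- 0
  for (i in seq_len(n)) {
    result[i] <- floor(fn[i])
    frac[i] <- fn[i] - result[i]
    lowerSum <- lowerSum + result[i]
  }
  difference <- round(sum(fn)) - lowerSum
  # Indices sorted by roundoff error, smallest first
  idx <- order(frac)
  # Round up the `difference` elements with the largest roundoff error
  if (difference > 0) {
    for (j in idx[(n - difference + 1):n]) {
      result[j] <- result[j] + 1
    }
  }
  result  # still in the original index order
}
```

With the example vector v above, pseudo.round(v) gives 2 4 6 9 4, which sums to 25.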
Answer 0 (score: 15)
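In a simpler form, I would say this algorithm is:
1. Start with everything rounded down.
2. Round up the numbers with the largest fractional parts until the desired sum is reached.

This can be implemented in a vectorized way in R:
1. Round down with floor.
2. Order the numbers by their fractional parts (using order).
3. Use tail to get the indices of the elements with the k largest fractional parts, where k is the amount by which we need to increase the sum to reach the target value.
4. Increment the output value at each of these indices by 1.

In code: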
smart.round <- function(x) {
  y <- floor(x)                                        # round everything down
  indices <- tail(order(x-y), round(sum(x)) - sum(y))  # k largest fractional parts
  y[indices] <- y[indices] + 1                         # round those up by 1
  y
}
v
# [1] 2.655087 3.721239 5.728534 9.082078 3.813063
sum(v)
# [1] 25
smart.round(v)
# [1] 2 4 6 9 4
sum(smart.round(v))
# [1] 25
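To make the steps concrete, the intermediate values for the example vector v can be printed one by one. This is only an illustration that re-runs the pieces of smart.round by hand; the variable names frac and k are introduced here for readability.

```r
# Re-create the example vector from the question
set.seed(1)
v <- 10 * runif(4)
v <- c(v, 25 - sum(v))

y <- floor(v)                 # step 1: round everything down
y                             # 2 3 5 9 3 (sums to 22)
frac <- v - y                 # fractional parts
order(frac)                   # 4 1 2 3 5 (smallest to largest fractional part)
k <- round(sum(v)) - sum(y)   # 25 - 22 = 3 units still missing
tail(order(frac), k)          # 2 3 5: these elements get +1
```

Adding 1 at positions 2, 3 and 5 turns 2 3 5 9 3 into 2 4 6 9 4, the smart.round(v) result above.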
Answer 1 (score: 7)
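Thanks for this useful function! Just to add to the answer: if you want to round to a specified number of decimal places, the function can be modified like this: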
smart.round <- function(x, digits = 0) {
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
}
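A small usage sketch, assuming the common case of percentages that must total exactly 100 (the vector p is made up for illustration). Plain round() to one decimal loses 0.1, while the digits version of smart.round puts it back on one element:

```r
smart.round <- function(x, digits = 0) {
  up <- 10 ^ digits
  x <- x * up                                          # shift to integer scale
  y <- floor(x)
  indices <- tail(order(x-y), round(sum(x)) - sum(y))  # largest fractional parts
  y[indices] <- y[indices] + 1
  y / up                                               # shift back
}

p <- rep(100 / 3, 3)  # hypothetical percentages: three equal shares
round(p, 1)           # 33.3 33.3 33.3, total 99.9
smart.round(p, 1)     # 33.3 33.3 33.4, total 100
```

Here all three fractional parts are tied; order() leaves unresolved ties in their original order, so the extra 0.1 goes to the last element.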
Answer 2 (score: 2)
An approach based on differencing the rounded cumulative sum runs much faster than @josliber's smartRound:
diffRound <- function(x) {
diff(c(0, round(cumsum(x))))
}
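Both functions preserve the total, but they do not always return the same vector. diffRound rounds the running total, so which element absorbs the rounding depends on its position rather than on the size of its fractional part. A quick check on the question's vector v (smart.round is copied from the first answer):

```r
diffRound <- function(x) {
  diff(c(0, round(cumsum(x))))  # round the running total, then difference it
}
smart.round <- function(x) {
  y <- floor(x)
  indices <- tail(order(x-y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y
}

set.seed(1)
v <- 10 * runif(4)
v <- c(v, 25 - sum(v))

diffRound(v)                  # 3 3 6 9 4 (sum 25)
smart.round(v)                # 2 4 6 9 4 (sum 25)
sum(abs(diffRound(v) - v))    # total absolute error, about 1.61
sum(abs(smart.round(v) - v))  # total absolute error, about 1.47
```

In this example diffRound rounds 3.72 down to 3, while smart.round rounds it up. As a result, diffRound's total absolute error is somewhat larger, in exchange for the speed shown below.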
Here is a timing comparison on 1 million records (details here: Running Rounding):
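library(microbenchmark)
# x: a data frame whose column x$numbers holds the 1 million values (see the linked post)
# smartRound: @josliber's smart.round from the first answer
res <- microbenchmark(
  "diff(dww)"       = x$diff.rounded  <- diffRound(x$numbers),
  "smart(josliber)" = x$smart.rounded <- smartRound(x$numbers),
  times = 100
)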
Unit: milliseconds
            expr       min        lq     mean    median       uq       max neval
       diff(dww)  38.79636  59.70858 100.6581   95.4304  128.226  240.3088   100
 smart(josliber) 466.06067 719.22723 966.6007 1106.2781 1177.523 1439.9360   100
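The data behind the timings above is in the linked post, not here. A base-R sketch for running a similar comparison yourself could look like this. It assumes hypothetical uniformly distributed values (numbers) in place of the original x$numbers and uses system.time instead of microbenchmark. Timings depend on the machine, so none are quoted.

```r
diffRound <- function(x) diff(c(0, round(cumsum(x))))
smart.round <- function(x) {
  y <- floor(x)
  indices <- tail(order(x-y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y
}

set.seed(42)
numbers <- runif(1e6, 0, 10)  # hypothetical stand-in for x$numbers

system.time(d <- diffRound(numbers))    # one cumsum + round + diff pass
system.time(s <- smart.round(numbers))  # dominated by the order() sort

# Both results should add up to the rounded total of the input
c(diffRound = sum(d), smart.round = sum(s), target = round(sum(numbers)))
```

The gap in the benchmark is plausible from the structure of the two functions: diffRound does only linear passes over the data, while smart.round sorts the fractional parts, which costs O(n log n).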