Question

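I have 170 rows of numbers with decimals that need to be rounded to whole numbers. However, the total of each row must equal a number I specify.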

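As a very basic example, suppose I have a 1x4 matrix with cell contents (1.2, 3.4, 7.7, 5.3). Say these numbers represent individuals, so I need to round them to whole numbers such that the group populations add up to a total population of 18 individuals. If I simply round the matrix contents, I get (1, 3, 8, 5) and a total population of 17, but I need it to equal 18 (see the R commands below).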

m <- c(1.2, 3.4, 7.7, 5.3)

m.2 <- round(m)

m.2
# [1] 1 3 8 5

sum(m.2)
# [1] 17

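After the numbers have been rounded, I need R to pick the next number that was closest to being rounded up (i.e. 3.4) and round it to 4 instead of 3.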

This would give me a matrix of (1, 4, 8, 5) = 18.
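A minimal sketch of that single adjustment step, assuming exactly one unit is missing after plain rounding. The names `lost` and `i` are only illustrative.

```r
m <- c(1.2, 3.4, 7.7, 5.3)
m.2 <- round(m)

# Fraction dropped by rounding; the largest one belongs to the value
# that was closest to being rounded up
lost <- m - m.2
i <- which.max(lost)      # index of 3.4

m.2[i] <- m.2[i] + 1
m.2
# [1] 1 4 8 5
sum(m.2)
# [1] 18
```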

Dr. John Fox helped me solve this problem with a simple recursive function:

Round <- function(x, target){
 r.x <- round(x)
 diff.x <- round(x) - x
 if ((s <- sum(r.x)) == target) return(r.x)
 else if (s > target) {
     select <- seq(along=x)[diff.x > 0]
     which <- which.max(diff.x[select])
     x[select[which]] <- r.x[select[which]] - 1
     Round(x, target)
 }
 else{
     select <- seq(along=x)[diff.x < 0]
     which <- which.min(diff.x[select])
     x[select[which]] <- r.x[select[which]] + 1
     Round(x, target)
  }
 }

This works very well for individual rows. But I have 170 rows in my dataset, which means repeating a process like the one below 170 times:

paste(STATA[['b']], collapse=", ")

B <- c(46.8310012817383, 19.9720001220703, 265.837005615234, 95.0400009155273, 6.88700008392334, 190.768997192383, 22.7269992828369, 764.453002929688, 53.0299987792969, 333.329010009766, 55.0960006713867, 84.0210037231445, 28.2369995117188, 2207.27099609375, 86.7760009765625, 50045.46875, 103.304000854492, 413.217987060547, 4.13199996948242, 2.75500011444092, 183.88200378418, 65.4260025024414, 0.689000010490417, 2248.59204101562, 0, 1.37699997425079, 16.5289993286133, 4.13199996948242, 4.13199996948242, 2.75500011444092, 4.13199996948242, 1.37699997425079, 0, 39.9440002441406, 2.75500011444092, 28.2369995117188, 0, 0, 5.51000022888184, 0, 48.8969993591309, 17.9060001373291, 485.531005859375, 1.37699997425079, 59.9169998168945, 221.759994506836, 28.2369995117188, 4.13199996948242, 65.4260025024414, 11.0190000534058, 38.5670013427734, 3.44300007820129, 8.95300006866455, 2.75500011444092, 23.4160003662109, 4.13199996948242, 50.5750015258789, 11.7080001831055, 19.2830009460449, 48.8969993591309, 0, 13.7740001678467, 92.9739990234375)

varB <- (Round(B, 58701))

ROUND2012$varB <- varB

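^ In this case I used a transpose of the dataset that I made in Excel, because I find it easier to append columns to a dataset in R than to append rows. Ideally, though, I would not have to do that: the rows would be my territories and the columns would be the group identity population data. Here, 'b' is the name of the column I am calling, and 58701 is the total the numbers need to add up to after rounding.

In short, I am looking for a function that works on an entire dataset rather than on individual rows. Ideally, I could call the columns holding the numbers to be rounded and also call the column holding the totals that the rounded numbers need to equal.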

Updated information

As a more illustrative example, say I have two racial groups in my population.

B

       race1 race2 total
place1   1.2   2.1   3.4
place2   3.4   3.6   7.0
place3   7.7   0.8   8.5
place4   5.3   1.4   6.7

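I need these numbers to equal my total number of registered voters. The totals are 3.4, 7.0, 8.5 and 6.7, but I need to round the contents within each place row so that places 1-4 total 4.0, 7.0, 8.0 and 7.0. So for place1, I need the contents rounded so that 1.2 becomes 2.0 and 2.1 becomes 2.0, which adds up to 4.0, my number of registered voters. For place2, the total is already 7, so we are fine. For place3, 7.7 would become 7.0 and 0.8 would become 1, giving a total of 8. Finally, for place4, I need 5.3 rounded to 5 and 1.4 rounded to 2.0, giving a total of 7. What I want is: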

B

       race1 race2 total
place1   2.0   2.0   4.0
place2   3.0   4.0   7.0
place3   7.0   1.0   8.0
place4   5.0   2.0   7.0

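Currently, the Round function pasted above lets me call one series of numbers at a time and manually enter the total they need to round to. But I am looking for a function that can do all of this at once. I want to call all of the race columns to be rounded and call the column that contains all of the required population totals. (Note: in practice I transposed the matrix in Excel and re-imported it into R because, as a fairly new R user, I find it easier to append new columns to a dataset than to add new rows. But I by no means need to do that step, and in fact I would rather not.)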

Answer 1

There are several ways to do this, but to follow up on my comment above:

Round <- function(x, target) {
  r.x <- round(x)
  diff.x <- round(x) - x
  if ((s <- sum(r.x)) == target) {
    return(r.x)
  } else if (s > target) {
    select <- seq(along=x)[diff.x > 0]
    which <- which.max(diff.x[select])
    x[select[which]] <- r.x[select[which]] - 1
    Round(x, target)
  } else {
    select <- seq(along=x)[diff.x < 0]
    which <- which.min(diff.x[select])
    x[select[which]] <- r.x[select[which]] + 1
    Round(x, target)
  }
}

dat <- read.table(header = TRUE, row.names = paste0('place', 1:4),
                  text="race1 race2 total
                        1.2  2.1  3.4
                        3.4  3.6  7.0
                        7.7  0.8  8.5
                        5.3  1.4  6.7")

totals <- c(4.0, 7.0, 8.0, 7.0)

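Both of the examples below simply run Round on each row of dat, using a 1-1 mapping with the corresponding value in totals.

lapply returns a list, so to turn the output back into a matrix/data frame, we rbind everything back together.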

do.call(rbind, lapply(1:nrow(dat), function(x) Round(dat[x, -3], totals[x])))

#        race1 race2
# place1     2     2
# place2     3     4
# place3     7     1
# place4     5     2

The output of apply is the transpose of what you want, so we t the result.

dat[3] <- totals

t(apply(dat, 1, function(x) Round(x[1:2], x[3])))

#        race1 race2
# place1     2     2
# place2     3     4
# place3     7     1
# place4     5     2

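Alternatively, you could come up with something more clever using Map/mapply or by Vectorize-ing Round to avoid these loops, but it does not look like your data is very large.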
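As a rough sketch of the Map idea, the rows and their totals can be paired up directly. Round is repeated here (lightly condensed) only so the snippet runs on its own; `rows` and `res` are illustrative names, and the data is the same small example as above.

```r
# Same logic as Round above, condensed so this snippet is self-contained
Round <- function(x, target) {
  r.x <- round(x)
  diff.x <- r.x - x
  s <- sum(r.x)
  if (s == target) return(r.x)
  if (s > target) {
    # lower the value that was rounded up the furthest
    select <- seq_along(x)[diff.x > 0]
    pick <- select[which.max(diff.x[select])]
    x[pick] <- r.x[pick] - 1
  } else {
    # raise the value that was rounded down the furthest
    select <- seq_along(x)[diff.x < 0]
    pick <- select[which.min(diff.x[select])]
    x[pick] <- r.x[pick] + 1
  }
  Round(x, target)
}

dat <- data.frame(race1 = c(1.2, 3.4, 7.7, 5.3),
                  race2 = c(2.1, 3.6, 0.8, 1.4),
                  row.names = paste0("place", 1:4))
totals <- c(4, 7, 8, 7)

m <- as.matrix(dat)
rows <- split(m, row(m))                      # one numeric vector per row
res <- do.call(rbind, Map(Round, rows, totals))
dimnames(res) <- dimnames(m)                  # restore place/race labels
res
#        race1 race2
# place1     2     2
# place2     3     4
# place3     7     1
# place4     5     2
```

Working on a numeric matrix rather than one-row data frames keeps every call to Round on a plain vector.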

Answer 2

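I came up with a relatively straightforward but lazy way to solve your problem. The basic idea is: 1. Check how many extra numbers need a second rounding; 2. Dynamically pick which number is the best candidate for that second rounding.

I used the dataset "B" quoted above, whose rounded values sum to 58701; I then set the designated rounded total to 58711.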

raw <- B
round <- round(B)
data <- data.frame(raw, round)
calc_sum = sum(data$round)
desig_sum = 58711
data$residual = abs(data$raw - data$round)
data$above = ifelse(data$round > data$raw, 1, 0)
data$round2 = 0
data1 <- data[order(data$residual),]

if (calc_sum < desig_sum) {
    diff = desig_sum - calc_sum
    count = 0
    while (count < diff) {
        for (i in 1:nrow(data1)) {
            data_tmp <- subset(data1, round2 == 0 & above == 0)
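            # Find the next not-yet-adjusted, rounded-down value with the largest residual;
            # the count < diff guard keeps tied residuals from overshooting the target
            if (count < diff & data1[i,5] == 0 & data1[i,4] == 0 & data1[i,3] == max(data_tmp$residual)) {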
                data1[i,5] = data1[i,2] + 1
                count = count + 1
            } else {
                count = count
            }
        }
    }
}

data2 <- data1[order(as.numeric(rownames(data1))),]
# Reverting back to the original order

data2$output = 0    
for (i in 1:nrow(data2)) {
    if (data2[i,5] != 0) {
        data2[i,6] = data2[i,5]
    } else {
        data2[i,6] = data2[i,2]  # keep the first-pass rounded value, not the raw value
    }
}


data_final = data2[,6]

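I have not yet written the code for the case calc_sum > desig_sum, but the code for that case would not differ much from the code above.

Also, the code will not work if there are not enough numbers available to round to reach the designated total (for example, desig_sum = 5 in the case above).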
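The same idea can be sketched without explicit loops and for both directions (calc_sum below or above desig_sum). This is only a sketch under the assumption that the target is a whole number; `round_to_sum` is a hypothetical helper name, not part of the code above.

```r
round_to_sum <- function(x, target) {
  r <- round(x)
  gap <- target - sum(r)   # > 0: units to add, < 0: units to remove
  if (gap == 0) return(r)
  moved <- r - x           # signed shift caused by the first rounding
  # Values rounded down can take +1; values rounded up can take -1
  cand <- if (gap > 0) which(moved < 0) else which(moved > 0)
  if (length(cand) < abs(gap)) {
    stop("not enough candidates to reach the target")
  }
  # Adjust the candidates with the largest residual first
  pick <- cand[order(abs(moved[cand]), decreasing = TRUE)][seq_len(abs(gap))]
  r[pick] <- r[pick] + sign(gap)
  r
}

round_to_sum(c(1.2, 3.4, 7.7, 5.3), 18)
# [1] 1 4 8 5
round_to_sum(c(1.2, 3.4, 7.7, 5.3), 16)
# [1] 1 3 7 5
```

Like the loop version, each value is adjusted at most once, so a target too far from the plain rounded sum raises an error instead of looping forever.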

Answer 3

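Another way to round values so that their sum equals a given number, which also works for the cases shown in the follow up question.

You can define whether the adjustment is made on:

the closest numbers
the largest numbers
randomly distributed

and then choose the number of decimal places.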

#Round to given total
#x..numeric vector
#target..sum of rounded x, if not given target = round(sum(x), digits)
#digits..number of decimal places
#closest..Make adjustment by changing closest number
#ref..reference level to calculate probability of adjustment, if ref==NA the probability of an adjustment is equal for all values of x
#random..should the adjustment be done stochastic or randomly
roundt <- function(x, target=NA, digits = 0, closest=TRUE, ref=0, random=FALSE) {
  if(is.na(target)) {target <- round(sum(x), digits)}
  if(all(x == 0)) {
    if(target == 0) {return(x)}
    x <- x + 1
  }
  xr <- round(x, digits)
  if(target == sum(xr)) {return(xr)}
  if(is.na(ref)) {
    wgt <- rep(1/length(x), length(x))
  } else {
    if(closest) {
      tt <- (x - xr) * sign(target - sum(xr)) + 10^-digits / 2
      wgt <- tt / sum(tt)
    } else {wgt <- abs(x-ref)/sum(abs(x-ref))}
  }
  if(random) {adj <- table(sample(factor(1:length(x)), size=abs(target - sum(xr))*10^digits, replace = T, prob=wgt))*sign(target - sum(xr))*10^-digits
  } else {adj <- diff(c(0,round(cumsum((target - sum(xr)) * wgt), digits)))}
  xr + adj
}

dat <- read.table(text="
race1 race2 total
1.2  2.1  4
3.4  3.6  7
7.7  0.8  8
5.3  1.4  7
3.4  3.6  5
7.7  0.8  12
-5  5  1
0    0    3
0    0    0
", header=T)

apply(dat, 1, function(x) roundt(x[1:2], x[3])) #Default round to target
apply(dat[1:6,], 1, function(x) roundt(x[1:2]*x[3]/sum(x[1:2]))) #Preadjust to target by multiplication
apply(dat, 1, function(x) roundt(x[1:2] + (x[3]-sum(x[1:2]))/2)) #Preadjust to target by addition
apply(dat, 1, function(x) roundt(x[1:2], x[3], cl=F)) #Prefer adjustment on large numbers
apply(dat, 1, function(x) roundt(x[1:2], x[3], ref=NA)) #Give all values the same probability of adjustment
apply(dat, 1, function(x) roundt(x[1:2], x[3], dig=1)) #Use one digit
apply(dat, 1, function(x) roundt(x[1:2], x[3], dig=1, random=TRUE)) #Make the adjustment by random sampling
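The non-random branch of roundt spreads the missing amount with diff(c(0, round(cumsum(...)))). A small sketch of that step in isolation, with made-up weights (`gap` is an illustrative name for target - sum(xr)):

```r
gap <- 3                     # units still missing after plain rounding
wgt <- c(0.6, 0.3, 0.1)      # made-up weights that sum to 1

# Rounding the running total, then differencing, yields whole-number
# adjustments that always add up to the rounded final total
adj <- diff(c(0, round(cumsum(gap * wgt))))
adj
# [1] 2 1 0
sum(adj)
# [1] 3
```

Rounding each share gap * wgt separately would not guarantee that the adjustments add up to gap; rounding the cumulative sum does.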

R: Rounding the contents within each row so that the row total equals a number I specify
