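I have computed a vector of frequencies for different events, expressed as fractions and sorted in descending order. I need to feed it to a tool that requires positive integer percentages that sum to exactly 100. I want to generate the percentages in the way that best represents the input distribution. That is, I want the relationships (ratios) between the percentages to match those between the input fractions as closely as possible, even though any non-linearity means cutting off the long tail.
I have a function that produces these percentages, but I don't think it is optimal or elegant. In particular, I would like to do more of the work in numeric space before resorting to "stupid integer tricks".
Here is an example frequency vector: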
fractionals <- 1 / (2 ^ c(2, 5:6, 8, rep(9,358)))
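The constraints described above (positive integers, a total of exactly 100, ratios close to the input) can be written down as a small check. This is only a sketch: `is_valid_percents` and `ratio_error` are hypothetical helper names, and the error measure (the largest absolute gap between `p / 100` and the renormalised fractions of the first `length(p)` elements) is one assumed choice among many. The candidate vector is the 49-element result quoted in the answer.

```r
# Hypothetical helper: TRUE when p holds positive whole numbers summing to 100.
is_valid_percents <- function(p) {
  is.numeric(p) && all(p == round(p)) && all(p >= 1) && sum(p) == 100
}

# Hypothetical helper: largest absolute gap between the integer shares and the
# renormalised fractions of the elements that were kept (the first length(p)).
ratio_error <- function(p, fractionals) {
  kept <- fractionals[seq_along(p)]
  max(abs(p / 100 - kept / sum(kept)))
}

fractionals <- 1 / (2 ^ c(2, 5:6, 8, rep(9, 358)))
candidate <- c(45, 6, 3, rep(1, 46))  # 49 entries, as printed in the answer

is_valid_percents(candidate)          # checks the hard constraints
ratio_error(candidate, fractionals)   # smaller means closer to the input ratios
```

A measure like this makes it possible to compare two candidate roundings on the same input instead of judging them by eye.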
Here is my function:
# Convert vector of fractions to integer percents summing to 100
percentize <- function(fractionals) {
  # fractionals is sorted descending and adds up to 1
  # drop elements that wouldn't round up to 1% vs. running total
  pctOfCum <- fractionals / cumsum(fractionals)
  fractionals <- fractionals[pctOfCum > 0.005]
  # calculate initial percentages
  percentages <- round((fractionals / sum(fractionals)) * 100)
  # if sum of percentages exceeds 100, remove proportionally
  i <- 1
  while (sum(percentages) > 100) {
    excess <- sum(percentages) - 100
    if (i > length(percentages)) {
      i <- 1
    }
    partialExcess <- max(1, round((excess * percentages[i]) / 100))
    percentages[i] <- percentages[i] - min(partialExcess,
                                           percentages[i] - 1)
    i <- i + 1
  }
  # if sum of percentages shorts 100, add proportionally
  i <- 1
  while (sum(percentages) < 100) {
    shortage <- 100 - sum(percentages)
    if (i > length(percentages)) {
      i <- 1
    }
    partialShortage <- max(1, round((shortage * percentages[i]) / 100))
    percentages[i] <- percentages[i] + partialShortage
    i <- i + 1
  }
  return(percentages)
}
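Any ideas?
Answer 0: (score: 0)
How about this? It rescales the values so that they should sum to 100, and if rounding leaves the total at 99, it adds 1 to the largest frequency.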
fractionals <- 1 / (2 ^ c(2, 5:6, 8, rep(9,358)))
pctOfCum <- fractionals / cumsum(fractionals)
fractionals <- fractionals[pctOfCum > 0.005]
bunnies <- as.integer(fractionals / sum(fractionals) * 100) + 1
bunnies[bunnies > 1] <- round(bunnies[bunnies > 1] * (100 -
    sum(bunnies[bunnies == 1])) / sum(bunnies[bunnies > 1]))
if (sum(bunnies) < 100) bunnies[1] <- bunnies[1] + 1
> bunnies
[1] 45 6 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
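A commonly used alternative to rescaling and then patching the total is the largest remainder (Hamilton) method: reserve 1 for every kept element, split the remaining points by floored quota, then hand the leftover points to the elements with the largest fractional remainders. The following is a minimal sketch, not the method of the question or of this answer. It assumes a non-empty input sorted in descending order; `percentize_lr` and its `cutoff` parameter are hypothetical names, and capping the kept elements at 100 is an added assumption so that every element can receive at least 1.

```r
percentize_lr <- function(fractionals, cutoff = 0.005) {
  # Same tail cut as in the question: drop elements below `cutoff` of the running total.
  kept <- fractionals[fractionals / cumsum(fractionals) > cutoff]
  # Assumption: keep at most 100 elements, since each one needs at least 1.
  kept <- head(kept, 100)
  n <- length(kept)
  spare <- 100 - n                     # points left after giving every element 1
  quota <- kept / sum(kept) * spare    # real-valued share of the spare points
  extra <- floor(quota)
  leftover <- spare - sum(extra)       # points still unassigned after flooring
  if (leftover > 0) {
    # The largest fractional remainders receive the leftover points.
    idx <- order(quota - extra, decreasing = TRUE)[seq_len(leftover)]
    extra[idx] <- extra[idx] + 1
  }
  as.integer(1 + extra)
}

fractionals <- 1 / (2 ^ c(2, 5:6, 8, rep(9, 358)))
p <- percentize_lr(fractionals)
stopifnot(sum(p) == 100, all(p >= 1))
```

Two caveats apply to this sketch. Tied inputs can end up with different outputs, because the leftover points run out partway through a group of equal remainders. Reserving one point per element also flattens the head of the distribution when many small elements are kept, so the choice of cut-off may matter more than the rounding rule.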