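In this example I want to reduce vector "x" to vector "y", where each element is randomly reduced so that the elements sum to 50% of the initial sum.
The values of the resulting vector should be non-negative and lower than the original values.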
set.seed(1)
perc<-50
x<-sample(1:5,10,replace=TRUE)
xsum<-sum(x) # sum is 33
toremove<-floor(xsum*perc*0.01)
x # 2 2 3 5 2 5 5 4 4 1
y<-magicfunction(x,perc)
y # 0 2 1 4 0 3 2 1 2 1
sum(y) # sum is 16 (rounded half of 33)
Can you think of a way to do this? Thanks!
Answer 0 (score: 5)
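Assuming that x is long enough, we can rely on an appropriate law of large numbers (also assuming that x is regular enough in certain other respects). To that end, we generate values of another random variable Z that takes values in [0,1] and has mean perc.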
set.seed(1)
perc <- 50 / 100
x <- sample(1:10000, 1000)
sum(x)
# [1] 5014161
x <- round(x * rbeta(length(x), perc / 3 / (1 - perc), 1 / 3))
sum(x)
# [1] 2550901
sum(x) * 2
# [1] 5101802
sum(x) * 2 / 5014161
# [1] 1.017479 # Under 2% deviation
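Here I chose a particular beta distribution for Z that gives mean perc, but you could choose another distribution as well. The lower the variance, the more precise the result. For instance, the following is much better, because the previously chosen beta distribution is in fact bimodal: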
set.seed(1)
perc <- 50 / 100
x <- sample(1:1000, 100)
sum(x)
# [1] 49921
x <- round(x * rbeta(length(x), 100 * perc / (1 - perc), 100))
sum(x)
# [1] 24851
sum(x) * 2
# [1] 49702
sum(x) * 2 / 49921
# [1] 0.9956131 # Less than 0.5% deviation!
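As a quick check on the two parameter choices above: a Beta(a, b) distribution has mean a/(a+b) and variance ab/((a+b)^2 (a+b+1)). The sketch below evaluates both for each example and also confirms that the result respects the constraints from the question. The helper name `beta_moments` is hypothetical and only used for illustration.

```r
# Hypothetical helper: mean and variance of a Beta(a, b) distribution
beta_moments <- function(a, b) {
  c(mean = a / (a + b),
    var  = a * b / ((a + b)^2 * (a + b + 1)))
}

perc <- 50 / 100
first  <- beta_moments(perc / 3 / (1 - perc), 1 / 3)    # parameters of the first example
second <- beta_moments(100 * perc / (1 - perc), 100)    # parameters of the second example
first
second

# Every multiplier lies in [0, 1], so each element stays between 0 and its original value
set.seed(1)
x <- sample(1:1000, 100)
y <- round(x * rbeta(length(x), 100 * perc / (1 - perc), 100))
all(y >= 0 & y <= x)
```

Both choices have mean perc, but the variance drops from 0.15 for the first to roughly 0.0012 for the second, which is why the second sum lands closer to the target.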
Answer 1 (score: 3)
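An alternative solution is the function below, which downsamples the original vector by random amounts proportional to the size of each element. It then makes sure no element falls below zero and iterates until the amount removed is close enough to the target.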
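removereads<-function(x,perc){
  xsum<-sum(x)
  toremove<-floor(xsum*perc)   # total amount to remove
  toremove2<-toremove          # amount still left to remove
  y<-x                         # work on a copy so progress carries over between iterations
  irem<-1
  # Repeat until less than 1% of the requested amount is left to remove
  while(toremove2>(toremove*0.01)){
    message("Downsampling iteration ",irem)
    # Draw the units to remove, with probability proportional to the current values
    tmp<-sample(seq_along(y),toremove2,prob=y,replace=TRUE)
    tmp2<-table(tmp)
    common<-as.numeric(names(tmp2))
    y[common]<-y[common]-as.vector(tmp2)
    y[y<0]<-0                  # no negative values
    toremove2<-toremove-(xsum-sum(y))
    irem<-irem+1
  }
  return(y)
}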
set.seed(1)
x<-sample(1:1000,10000,replace=TRUE)
perc<-0.9
y<-removereads(x,perc)
plot(x,y,xlab="Before reduction",ylab="After reduction")
abline(0,1)
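The core step in the function above, sampling indices with probability proportional to the values and tabulating them, amounts to a single multinomial draw. Here is a minimal sketch of one pass written that way. The name `downsample_once` and the argument `frac` are illustrative and not part of the answer.

```r
# Hypothetical helper: one downsampling pass using a multinomial draw
downsample_once <- function(x, frac) {
  toremove <- floor(sum(x) * frac)                               # total amount to remove
  removed <- as.vector(rmultinom(1, size = toremove, prob = x))  # units taken from each element
  pmax(x - removed, 0)                                           # clamp at zero
}

set.seed(1)
x <- sample(1:1000, 10000, replace = TRUE)
y <- downsample_once(x, 0.9)

all(y >= 0 & y <= x)   # constraints from the question
sum(y) / sum(x)        # close to 0.1
```

A single pass can remove slightly less than requested wherever an element was clamped at zero, which is the shortfall the while loop in removereads is meant to make up.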
Answer 2 (score: 1)
Here is a solution that uses draws from the Dirichlet distribution:
set.seed(1)
x = sample(10000, 1000, replace = TRUE)
magic = function(x, perc, alpha = 1){
# sample from the Dirichlet distribution
# sum(p) == 1
# lower values should reduce by less than larger values
# larger alpha means the result will have more "randomness"
p = rgamma(length(x), x / alpha, 1)
p = p / sum(p)
# scale p up an amount so we can subtract it from x
# and get close to the desired sum
reduce = round(p * (sum(x) - sum(round(x * perc))))
y = x - reduce
# No negatives
y = pmax(y, 0)
return (y)
}
alpha = 500
perc = 0.7
target = sum(round(perc * x))
y = magic(x, perc, alpha)
# Hopefully close to 1
sum(y) / target
# [1] 1.000048
# Measure of the "randomness"
sd(y / x)
# [1] 0.1376637
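Basically, it tries to work out how much to reduce each element by while still ending up close to the desired total. Note that in this function perc is the fraction of the original sum to keep (0.7 here), not the fraction to remove. You can control how "random" you want the new vector to be by increasing alpha.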
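The normalisation step in magic, dividing independent gamma draws by their sum, is the standard way to sample from a Dirichlet distribution with parameters x / alpha, so each share p[i] has expectation x[i] / sum(x). Here is a small sketch that checks this on a toy vector. The vector and the helper `rdirichlet_once` are made up for illustration.

```r
# Hypothetical helper: one Dirichlet draw via normalised gamma variates
rdirichlet_once <- function(shape) {
  g <- rgamma(length(shape), shape, 1)
  g / sum(g)
}

set.seed(1)
x <- c(10, 20, 30, 40)   # toy vector, for illustration only

# Average many draws (alpha = 2): the mean share of each element approaches x / sum(x)
draws <- replicate(5000, rdirichlet_once(x / 2))
rowMeans(draws)
x / sum(x)

# A larger alpha gives smaller shape parameters and more spread in the shares
sd_small_alpha <- sd(replicate(5000, rdirichlet_once(x / 1)[1]))
sd_large_alpha <- sd(replicate(5000, rdirichlet_once(x / 10)[1]))
sd_small_alpha < sd_large_alpha
```

This matches the comment in the answer: the expected reduction stays proportional to each element, while alpha only changes how far individual draws stray from that expectation.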