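Goal
I am trying to build a function that draws a given number of random numbers from an "incomplete uniform distribution".
What do I call an incomplete uniform distribution?
I call an incomplete uniform distribution a probability distribution in which every value of X inside a series of boundaries has the same probability of being picked. In other words, it is a uniform distribution with holes (regions of zero probability), as shown below: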
x = list(12:25, 34:54, 67:90, 93:115)
y = 1/sum(25-12, 54-34, 90-67, 115-93)
plot(y=rep(y, length(unlist(x))), x=unlist(x), type="n", ylab="Probability", xlab="X")
for (xi in x) {
  points(xi, rep(y, length(xi)), type="l", lwd=4)
}
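In symbols, the density plotted above can be written as follows. The notation is introduced here for illustration and is not part of the original post: the k allowed intervals are written [l_i, u_i] and are assumed not to overlap.

```latex
f(x) =
\begin{cases}
  \dfrac{1}{\sum_{j=1}^{k} (u_j - l_j)} & \text{if } x \in \bigcup_{i=1}^{k} [l_i, u_i], \\[1ex]
  0 & \text{otherwise.}
\end{cases}
```

The constant in the first case is the quantity computed as y in the plotting code.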
Ugly solution
Here is a slow and ugly solution:
IncompleteUnif = function(n, b)
{
  #################
  # "n" is the desired number of random numbers
  # "b" is a list describing the boundaries within which a random number can be drawn
  #################
  r = c() # Series of random numbers to return
  for (ni in n)
  {
    while (length(r) < n) # loop continues until we have the "n" random numbers we need
    {
      ub = unlist(b)
      x = runif(1, min(ub), max(ub)) # one random number taken over the whole range
      for (bi in b) # test whether the random number is within one of the ranges specified by "b"
      {
        if (min(bi) < x & max(bi) > x) # if found in one range, add "x" to "r" and break
        {
          r = append(r, x)
          break
        }
      }
    }
  }
  return (r)
}
b = list(c(5,94),c(100,198),c(220,292), c(300,350))
set.seed(12)
IncompleteUnif(10,b)
[1] 28.929516 287.132444 330.204498 63.425103 16.693990 66.680826 226.374551 12.892821 7.872065 140.480533
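The loop above is a form of rejection sampling: every draw that lands in a hole is discarded. A minimal sketch of the expected acceptance rate follows, assuming each element of b is c(lower, upper) and that the segments do not overlap. The name acceptance_rate is made up here for illustration.

```r
# Hypothetical helper: fraction of draws over the full range that land inside a segment.
# Assumes each element of b is c(lower, upper) and that segments do not overlap.
acceptance_rate <- function(b) {
  ub <- unlist(b)
  covered <- sum(vapply(b, diff, numeric(1)))  # total length of all segments
  covered / (max(ub) - min(ub))                # divided by the length of the whole range
}

b <- list(c(5, 94), c(100, 198), c(220, 292), c(300, 350))
acceptance_rate(b)  # 309 / 345, roughly 0.896
```

For this b only about one draw in ten is rejected, so most of the cost likely comes from drawing and appending one value at a time; rejection becomes more expensive as the holes get wider.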
Answer 0 (score: 5)
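Your incomplete uniform distribution can be represented as a mixture of four ordinary uniform distributions, where the mixing weight of each segment is proportional to its length (so the longer a segment is, the more weight it gets).
To sample from such a distribution, first choose a segment (taking the weights into account), then draw a value from the chosen segment.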
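The two steps can be sketched in a few lines of R. This is only an illustration, not the answerer's code: r_two_step is a hypothetical name, and b is assumed to be a list of c(lower, upper) pairs with lower < upper.

```r
# Sketch of the two-step sampling; r_two_step is a hypothetical name.
# Assumes b is a list of c(lower, upper) pairs with lower < upper.
r_two_step <- function(n, b) {
  lower <- vapply(b, function(s) s[1], numeric(1))
  upper <- vapply(b, function(s) s[2], numeric(1))
  # Step 1: pick a segment for every draw, weighted by segment length
  seg <- sample.int(length(b), size = n, replace = TRUE, prob = upper - lower)
  # Step 2: draw uniformly inside the chosen segment (runif is vectorised over min and max)
  runif(n, min = lower[seg], max = upper[seg])
}

set.seed(1)
x <- r_two_step(1000, list(c(5, 94), c(100, 198), c(220, 292), c(300, 350)))
range(x)
```

Because runif() recycles its min and max arguments, no loop over segments is needed in this variant.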
Answer 1 (score: 4)
I believe this works, using the algorithm suggested by Robert Dodier:
rmixunif = function(n, b) {
  subdists = sample(seq_along(b), size = n, replace = TRUE, prob = sapply(b, diff))
  subdists_n = tabulate(subdists)
  draw = numeric(n)
  for (i in unique(subdists)) {
    draw[subdists == i] = runif(subdists_n[i], min = b[[i]][1], max = b[[i]][2])
  }
  return(draw)
}
rmixunif(10, b = list(c(5,94),c(100,198),c(220,292), c(300,350)))
# [1] 64.85989 85.33292 235.39607 233.40133 240.28686 67.21626 237.60248 11.80377 151.65365 306.44473
I like Sam Dickson's visual check with a histogram; here is my version:
x <- rmixunif(10000,list(c(0,1),c(2.5,3),c(6,10)))
hist(x,breaks=20)
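A histogram can hide a small number of draws that leak into a hole. As a complement, here is a sketch of a numeric check. The helper which_segment is hypothetical, and the sample x is a stand-in built directly from runif() so that the block runs on its own; in practice x would be the output of the sampler under test.

```r
# Hypothetical helper: for each value in x, the index of the segment containing it (NA if none).
which_segment <- function(x, b) {
  idx <- rep(NA_integer_, length(x))
  for (i in seq_along(b)) {
    idx[x >= b[[i]][1] & x <= b[[i]][2]] <- i
  }
  idx
}

b <- list(c(0, 1), c(2.5, 3), c(6, 10))
set.seed(1)
# Stand-in sample; replace with the output of the sampler being checked
x <- c(runif(200, 0, 1), runif(100, 2.5, 3), runif(800, 6, 10))
seg <- which_segment(x, b)
anyNA(seg)  # TRUE would mean at least one draw fell in a hole
widths <- vapply(b, diff, numeric(1))
rbind(observed = tabulate(seg, length(b)) / length(x),
      expected = widths / sum(widths))
```

The observed shares per segment should be close to the expected ones, which are simply the segment lengths divided by their total.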
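Some input checking might be worth adding (perhaps with mapply, as suggested in the comments), but I will leave that to others.
Thanks to alexis_iaz for the tabulate() suggestion!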
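As one possible shape for the input checking mentioned above, here is a minimal sketch. The function check_bounds is hypothetical, and the rules it enforces (pairs of numbers, lower < upper, no overlap) are assumptions about what a valid b should look like rather than anything stated in the answer.

```r
# Hypothetical validator; the rules below are assumptions about what "valid" should mean.
check_bounds <- function(b) {
  stopifnot(is.list(b), length(b) > 0)
  ok_shape <- vapply(b, function(s) is.numeric(s) && length(s) == 2 && !anyNA(s), logical(1))
  if (!all(ok_shape)) stop("each element of b must be a numeric vector c(lower, upper)")
  lower <- vapply(b, function(s) s[1], numeric(1))
  upper <- vapply(b, function(s) s[2], numeric(1))
  if (any(lower >= upper)) stop("each segment needs lower < upper")
  # Sort by lower bound and make sure no segment starts before the previous one ends
  o <- order(lower)
  if (any(lower[o][-1] < upper[o][-length(o)])) stop("segments must not overlap")
  invisible(TRUE)
}

check_bounds(list(c(5, 94), c(100, 198), c(220, 292), c(300, 350)))  # passes silently
```

Calling it at the top of a sampler turns malformed or overlapping segments into an explicit error instead of silently skewed weights.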
Answer 2 (score: 4)
A slightly more sophisticated version of @Gregor's solution.
mix_unif <- function(n, b){
  x <- c()
  ns <- rmultinom(1, n, sapply(b, diff))
  for (i in seq_along(ns)) {
    x <- c(x, runif(ns[i], b[[i]][1], b[[i]][2]))
  }
  x
}
library(microbenchmark)
# b is the list of segments defined in the question
microbenchmark(mix_unif(1e5, b),
               rmixunif(1e5, b),
               IncompleteUnif(1e5, b),
               unit="relative")
Unit: relative
                     expr      min       lq     mean   median       uq      max neval
       mix_unif(1e+05, b) 1.000000 1.000000 1.000000 1.000000 1.000000  1.00000   100
       rmixunif(1e+05, b) 3.123515 3.235961 3.750369 3.496843 3.462529 15.73449   100
 IncompleteUnif(1e+05, b) 6.806916 7.247425 6.926282 7.188556 7.093928 18.20041   100
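Answer 3 (score: 2)
Another solution is to transform the output. The idea is to sample from a single uniform distribution whose width equals the total length of all segments, and then apply a conditional shift so that the numbers fall only within the chosen ranges: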
IncompleteUnif = function(n, b) {
  widths <- cumsum(sapply(b, diff))  # cumulative segment widths
  x <- runif(n, 0, tail(widths, 1))  # uniform draw over the total width
  out <- x
  out[x <= widths[1]] <- x[x <= widths[1]] + b[[1]][1]
  for (i in seq_along(b)[-1]) {  # 2, 3, ...; skipped entirely when b has a single segment
    sel <- widths[i-1] < x & x <= widths[i]
    out[sel] <- x[sel] - widths[i-1] + b[[i]][1]
  }
  return(out)
}
x <- IncompleteUnif(10000,list(c(0,1),c(2.5,3),c(6,10)))
hist(x,breaks=20)
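The same transformation can also be written without a loop by looking up each draw's segment with findInterval(). This is a sketch under the same assumptions (b is a list of c(lower, upper) pairs), not the answerer's code; r_shift is a hypothetical name.

```r
# Loop-free sketch of the same transformation; r_shift is a hypothetical name.
# Assumes b is a list of c(lower, upper) pairs.
r_shift <- function(n, b) {
  lower  <- vapply(b, function(s) s[1], numeric(1))
  widths <- vapply(b, diff, numeric(1))
  cum    <- cumsum(widths)
  x <- runif(n, 0, cum[length(cum)])
  # Index of the segment whose cumulative-width slice contains each x.
  # left.open = TRUE gives slices (cum[i-1], cum[i]], matching the loop's comparisons;
  # pmin() guards against rounding pushing an index past the last segment.
  i <- pmin(findInterval(x, cum, left.open = TRUE) + 1L, length(b))
  # Remove the width already consumed by earlier segments, then shift to the segment's start
  x - c(0, cum)[i] + lower[i]
}

set.seed(1)
x <- r_shift(10000, list(c(0, 1), c(2.5, 3), c(6, 10)))
summary(x)
```

Only one call to runif() is made, so the cost does not depend on the number of segments beyond the findInterval() lookup.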
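Answer 4 (score: 1)
I am a few years late to the party, but seeing that there is no solution without an explicit loop, here is one such implementation (following @RobertDodier's approach):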
rmunif <- function(n, b) {
  runifb <- function(n, b) runif(n, b[1], b[2])
  ns <- rmultinom(1, n, vapply(b, diff, 1))
  unlist(Map(runifb, ns, b), use.names = FALSE)
}
hist(rmunif(1e5, list(0:1, c(5, 8), 9:10)))
library(microbenchmark)
set.seed(2018)
n <- 1e5
microbenchmark(
rmunif(n, b),
mix_unif(n, b),
rmixunif(n, b),
IncompleteUnif(n, b),
unit = "relative"
) -> mb
print(mb, signif = 5)
#> Unit: relative
#> expr min lq mean median uq max neval
#> rmunif(n, b) 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 100
#> mix_unif(n, b) 1.1181 1.1256 1.1281 1.1728 1.1236 1.0476 100
#> rmixunif(n, b) 2.7822 2.8982 2.9899 2.7850 2.8345 1.3970 100
#> IncompleteUnif(n, b) 4.4922 4.7089 5.2732 4.5764 8.4317 2.4364 100
Created on 2018-03-11 by the reprex package (v0.2.0).