从均匀分布的混合中绘制随机数

时间:2015-12-23 20:27:02

标签: r random distribution

目标

我正在尝试构建一个函数,从一个不完整的统一分布中抽取特定数量的随机数"。

我称之为不完整的统一分布?

我将不完整的均匀分布称为概率分布,其中一系列边界内的X的每个值具有相同的被挑选概率。换句话说,它是一个带孔的均匀分布(概率为零),如下所示

x = list(12:25, 34:54, 67:90, 93:115)
y = 1/sum(25-12, 54-34, 90-67, 115-93)
plot(y=rep(y, length(unlist(x))), x=unlist(x), type="n", ylab="Probability", xlab="X")
for (xi in x)
{
    points(xi,rep(y, length(xi)), type="l", lwd=4)
}

enter image description here

丑陋的解决方案

这是一个缓慢而丑陋的解决方案

IncompleteUnif = function(n,b)
{
    #################
    # "n" is the desired number of random numbers
    # "b" is a list describing the boundaries within which a random number can possibly be drawn.
    #################
    r = c() # Series of random numbers to return
    for (ni in n)
    {
        while (length(r) < n) # loop will continue until we have the "n" random numbers we need
        {
            ub = unlist(b)
            x = runif(1,min(ub), max(ub)) # one random number taken over the whole range
            for (bi in b) # The following loop test if the random number is withinn the ranges specified by "b"
            {
                if (min(bi) < x & max(bi) > x) # if found in one range then just add "x" to "r" and break
                {
                    r = append(r,x)
                    break
                }
            }
        }
    }
    return (r)
}

b = list(c(5,94),c(100,198),c(220,292), c(300,350))
set.seed(12)
IncompleteUnif(10,b)
[1]  28.929516 287.132444 330.204498  63.425103  16.693990  66.680826 226.374551  12.892821   7.872065 140.480533

5 个答案:

答案 0 :(得分:5)

您的不完全均匀分布可以表示为四个普通均匀分布的混合,每个分段的混合权重与其长度成比例(即,使得分段越长,其具有越多的重量)。

要从此类分布中进行采样,请先选择一个分段(将权重考虑在内)。然后从所选的段中选择一个元素。

答案 1 :(得分:4)

我相信这是有效的,使用Robert Dodier建议的算法:

rmixunif = function(n, b) {
    subdists = sample(seq_along(b), size = n, replace = T, prob = sapply(b, diff))
    subdists_n = tabulate(subdists)
    draw = numeric(n)
    for (i in unique(subdists)) {
        draw[subdists == i] = runif(subdists_n[i], min = b[[i]][1], max = b[[i]][2])
    }
    return(draw)
}

rmixunif(10, b = list(c(5,94),c(100,198),c(220,292), c(300,350)))
#  [1]  64.85989  85.33292 235.39607 233.40133 240.28686  67.21626 237.60248  11.80377 151.65365 306.44473

我喜欢Sam Dickson的直方图视觉检查,这是我的版本:

x <- rmixunif(10000,list(c(0,1),c(2.5,3),c(6,10)))
hist(x,breaks=20)

enter image description here

可能会考虑一些输入检查(可能是评论中建议的mapply),但我会将其留给其他人。

感谢alexis_iaz提出的tabulate()建议!

答案 2 :(得分:4)

@ Gregor解决方案稍微复杂的版本。

mix_unif <- function(n, b){
  x <- c()
  ns <- rmultinom(1, n, sapply(b, diff))
  for (i in seq_along(ns)) {
    x <- c(x, runif(ns[i], b[[i]][1], b[[i]][2]))
  }
  x
}

microbenchmark(mix_unif(1e5, b),
               rmixunif(1e5, b),
               IncompleteUnif(1e5, b), 
               unit="relative")
Unit: relative
                     expr      min       lq     mean   median       uq      max neval
       mix_unif(1e+05, b) 1.000000 1.000000 1.000000 1.000000 1.000000  1.00000   100
       rmixunif(1e+05, b) 3.123515 3.235961 3.750369 3.496843 3.462529 15.73449   100
 IncompleteUnif(1e+05, b) 6.806916 7.247425 6.926282 7.188556 7.093928 18.20041   100

答案 3 :(得分:2)

另一种解决方案是转换输出。我们的想法是从随机均匀分布中进行采样,然后应用条件转换,使数字仅落在所选范围内:

IncompleteUnif = function(n,b) {
  widths <- cumsum(sapply(b,diff))
  x <- runif(n,0,tail(widths,1))
  out <- x
  out[x<=widths[1]] <- x[x<=widths[1]] + b[[1]][1]
  for(i in 2:length(b)) {
    out[widths[i-1]<x & x<=widths[i]] <- x[widths[i-1]<x & x<=widths[i]] - widths[i-1] + b[[i]][1]
  }
  return(out)
}

x <- IncompleteUnif(10000,list(c(0,1),c(2.5,3),c(6,10)))
hist(x,breaks=20)

enter image description here

答案 4 :(得分:1)

我参加聚会的时间已经晚了几年,但是看到没有明确循环的解决方案,这里有一个这样的实现(遵循@ RobertDodier&#39;方法):

rmunif <- function(n, b) {
  runifb <- function(n, b) runif(n, b[1], b[2])
  ns <- rmultinom(1, n, vapply(b, diff, 1))
  unlist(Map(runifb, ns, b), use.names = FALSE)
}

hist(rmunif(1e5, list(0:1, c(5, 8), 9:10)))

library(microbenchmark)

set.seed(2018)
n <- 1e5

microbenchmark(
  rmunif(n, b),
  mix_unif(n, b),
  rmixunif(n, b),
  IncompleteUnif(n, b),
  unit = "relative"
) -> mb

print(mb, signif = 5)
#> Unit: relative
#>                  expr    min     lq   mean median     uq    max neval
#>          rmunif(n, b) 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000   100
#>        mix_unif(n, b) 1.1181 1.1256 1.1281 1.1728 1.1236 1.0476   100
#>        rmixunif(n, b) 2.7822 2.8982 2.9899 2.7850 2.8345 1.3970   100
#>  IncompleteUnif(n, b) 4.4922 4.7089 5.2732 4.5764 8.4317 2.4364   100

reprex package(v0.2.0)创建于2018-03-11。