Enumerate the probabilities of all possible outcome combinations of a series of Bernoulli trials with different probabilities

Posted: 2016-10-06 19:09:43

Tags: r probability poisson binomial-cdf

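Suppose I have the success probabilities p1 through pn of a series of n independent Bernoulli trials, such that p1 != p2 != ... != pn. Each trial is given a unique name.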

    p <- c(0.5, 0.12, 0.7, 0.8, .02)
    a <- c("A","B","C","D","E")

From searching Stack Exchange (e.g., here and here) I know that I can use the Poisson binomial distribution to find the CDF, PMF, etc.
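For contrast, the Poisson binomial PMF only gives P(exactly k successes). That is the branch probabilities summed by their number of successes. Below is a minimal base-R sketch using the standard convolution recursion. It is an illustration, not code taken from the linked posts, and the name pmf is hypothetical:

```r
p <- c(0.5, 0.12, 0.7, 0.8, .02)
# pmf[k + 1] = P(exactly k successes); start from P(0 successes in 0 trials) = 1
pmf <- 1
for (q in p) {
  # the new trial either fails (count unchanged) or succeeds (count shifts up by one)
  pmf <- c(pmf * (1 - q), 0) + c(0, pmf * q)
}
round(pmf, 6)
```

pmf[1] is the all-failure branch and pmf[6] is the all-success branch. Every entry in between lumps together several branches, which is why the PMF alone does not answer the question below.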

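What I am interested in is the exact probability of every possible combination of successes and failures. (For example, if I drew a probability tree, I would want the probability at the end of each branch.)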

    all <- prod(p)
    all
    [1] 0.000672
    o1 <- (0.5 * (1-0.12) * 0.7 * 0.8 * .02)
    o1
    [1] 0.004928
    o2 <- (0.5 * 0.12 * (1-0.7) * 0.8 * .02)
    o2
    [1] 0.000288

...and so on for all 2^5 possible combinations of successes and failures.
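Written out, each branch probability computed above is the following product, where x_i = 1 if trial i succeeds and x_i = 0 if it fails. For example, o1 corresponds to x = (1, 0, 1, 1, 1):

$$P(X_1 = x_1, \dots, X_n = x_n) = \prod_{i=1}^{n} p_i^{x_i} (1 - p_i)^{1 - x_i}, \qquad x_i \in \{0, 1\}$$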

What is an efficient way to do this in R?

In my actual dataset the number of trials is 19, so we are talking about a total of 2^19 paths through the probability tree.

2 answers:

Answer 0 (score: 1)

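The key to doing this computation quickly is to work in log-probability space. The product along each branch of the tree then becomes a sum of log-probabilities, and that sum can be computed as an inner product, i.e. a matrix multiplication. This way, all branches can be computed together in a vectorized fashion.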
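A small sketch of that identity, using only the two branches o1 and o2 from the question (the names branches, logp and sums are illustrative). It checks that multiplying by a row of ones is the same as taking column sums, and that exponentiating recovers the branch products:

```r
p <- c(0.5, 0.12, 0.7, 0.8, .02)
# one branch per column: o1 = (1,0,1,1,1), o2 = (1,1,0,1,1) from the question
branches <- cbind(o1 = c(1, 0, 1, 1, 1), o2 = c(1, 1, 0, 1, 1))
# log(p_i) where trial i succeeds, log(1 - p_i) where it fails; p recycles down each column
logp <- ifelse(branches == 1, log(p), log(1 - p))
# a row of ones times the matrix sums each column
sums <- rep(1, 5) %*% logp
stopifnot(isTRUE(all.equal(drop(sums), colSums(logp))))
exp(drop(sums))  # approximately 0.004928 and 0.000288, i.e. o1 and o2
```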

First, we construct an enumeration of all branches. For this, we use the intToBin function from the R.utils package:

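    library(R.utils)  # provides intToBin()
    # one zero-padded binary string per branch, split into single "0"/"1" characters
    enum.branches <- unlist(strsplit(intToBin(seq_len(2^n)-1),split=""))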

where n is the number of Bernoulli variables. For example, with n=5:

    matrix(enum.branches, nrow=n)
    ##     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17]
    ##[1,] "0"  "0"  "0"  "0"  "0"  "0"  "0"  "0"  "0"  "0"   "0"   "0"   "0"   "0"   "0"   "0"   "1"  
    ##[2,] "0"  "0"  "0"  "0"  "0"  "0"  "0"  "0"  "1"  "1"   "1"   "1"   "1"   "1"   "1"   "1"   "0"  
    ##[3,] "0"  "0"  "0"  "0"  "1"  "1"  "1"  "1"  "0"  "0"   "0"   "0"   "1"   "1"   "1"   "1"   "0"  
    ##[4,] "0"  "0"  "1"  "1"  "0"  "0"  "1"  "1"  "0"  "0"   "1"   "1"   "0"   "0"   "1"   "1"   "0"  
    ##[5,] "0"  "1"  "0"  "1"  "0"  "1"  "0"  "1"  "0"  "1"   "0"   "1"   "0"   "1"   "0"   "1"   "0"  
    ##     [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32]
    ##[1,] "1"   "1"   "1"   "1"   "1"   "1"   "1"   "1"   "1"   "1"   "1"   "1"   "1"   "1"   "1"  
    ##[2,] "0"   "0"   "0"   "0"   "0"   "0"   "0"   "1"   "1"   "1"   "1"   "1"   "1"   "1"   "1"  
    ##[3,] "0"   "0"   "0"   "1"   "1"   "1"   "1"   "0"   "0"   "0"   "0"   "1"   "1"   "1"   "1"  
    ##[4,] "0"   "1"   "1"   "0"   "0"   "1"   "1"   "0"   "0"   "1"   "1"   "0"   "0"   "1"   "1"  
    ##[5,] "1"   "0"   "1"   "0"   "1"   "0"   "1"   "0"   "1"   "0"   "1"   "0"   "1"   "0"   "1"  

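This gives a matrix in which each column is the outcome of one branch of the probability tree. Row 1 is trial 1, and "1" marks a success.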
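If R.utils is not available, one possible base-R way to build the same branch matrix is sketched below. It uses integer arithmetic instead of strings, so the entries are numeric 0/1 rather than "0"/"1" (the name enum.bits is hypothetical). Memory and speed for n = 19 have not been measured here. A 19 x 2^19 numeric matrix takes roughly 80 MB.

```r
n <- 5
# bit b of integer k is (k %/% 2^b) %% 2; rows run from bit n-1 (trial 1) down to bit 0 (trial n)
enum.bits <- outer((n - 1):0, 0:(2^n - 1), function(b, k) (k %/% 2^b) %% 2)
dim(enum.bits)   # 5 32
enum.bits[, 24]  # 1 0 1 1 1, the same as column 24 of matrix(enum.branches, nrow=n)
```

With this matrix, the later comparison enum.branches == "1" would become enum.bits == 1.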

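Now use it to build a matrix of log-probabilities with the same size as enum.branches. Each entry is log(p) where enum.branches=="1" and log(1-p) otherwise. For your data, with p <- c(0.5, 0.12, 0.7, 0.8, .02), this is: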

    # rep(log(p), 2^n) repeats p once per column, so each row lines up with its trial
    logp <- matrix(ifelse(enum.branches == "1", rep(log(p), 2^n), rep(log(1-p), 2^n)), nrow=n)

Then sum the log-probabilities in each column and exponentiate to get the products of probabilities:

    result <- exp(rep(1,n) %*% logp)
    ##         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]   [,10]
    ##[1,] 0.025872 0.000528 0.103488 0.002112 0.060368 0.001232 0.241472 0.004928 0.003528 7.2e-05
    ##        [,11]    [,12]    [,13]    [,14]    [,15]    [,16]    [,17]    [,18]    [,19]    [,20]
    ##[1,] 0.014112 0.000288 0.008232 0.000168 0.032928 0.000672 0.025872 0.000528 0.103488 0.002112
    ##        [,21]    [,22]    [,23]    [,24]    [,25]   [,26]    [,27]    [,28]    [,29]    [,30]
    ##[1,] 0.060368 0.001232 0.241472 0.004928 0.003528 7.2e-05 0.014112 0.000288 0.008232 0.000168
    ##        [,31]    [,32]
    ##[1,] 0.032928 0.000672

The branches in result are in the same order as the columns of enum.branches.
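The question also gives each trial a name (a), which neither answer uses. Here is a sketch of one way to label each branch with the trials that succeeded, so that individual branches can be looked up by name. It uses a base-R bit matrix in place of intToBin and colSums() in place of rep(1,n) %*%. The names bits and probs and the "none" label are illustrative choices.

```r
p <- c(0.5, 0.12, 0.7, 0.8, .02)
a <- c("A","B","C","D","E")
n <- length(p)
# 0/1 branch matrix in the same column order as enum.branches (trial 1 = top row)
bits <- outer((n - 1):0, 0:(2^n - 1), function(b, k) (k %/% 2^b) %% 2)
# colSums() does the same job as rep(1, n) %*% logp
probs <- exp(colSums(ifelse(bits == 1, log(p), log(1 - p))))
# label each branch by its successful trials; "none" means every trial failed
names(probs) <- apply(bits == 1, 2, function(s) if (any(s)) paste(a[s], collapse = "") else "none")
probs[c("ACDE", "ABDE", "ABCDE")]  # o1, o2 and the all-success branch from the question
```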

We can wrap the computation in a function:

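    enum.prob.product <- function(n, p) {
      # requires library(R.utils) for intToBin(); p must have length n
      enum.branches <- unlist(strsplit(intToBin(seq_len(2^n)-1),split=""))
      # log(p) for successes ("1"), log(1-p) for failures ("0"), one column per branch
      logp <- matrix(ifelse(enum.branches == "1", rep(log(p), 2^n), rep(log(1-p), 2^n)), nrow=n)
      # sum each column of log-probabilities and map back to probabilities
      exp(rep(1,n) %*% logp)
    }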

Timing this with 19 independent Bernoulli variables:

    n <- 19
    p <- runif(n)
    system.time(enum.prob.product(n,p))
    ##   user  system elapsed 
    ## 24.064   1.470  26.082 

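This was on my 2 GHz MacBook (circa 2009). Note that the computation itself is very fast. Most of the time goes into enumerating the branches of the probability tree (I suspect the unlist call here). Suggestions from the community for a different approach would be appreciated.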
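One possible alternative is sketched below. It does not come from either answer and has not been benchmarked against the timing above. It skips the string enumeration entirely and builds the probability vector directly, taking repeated Kronecker products of the per-trial vectors (1 - p_i, p_i) with base R's kronecker() and Reduce(). kronecker(x, y) on two vectors lets x vary slowest, so the left fold keeps trial 1 varying slowest. The result should therefore come out in the same order as enum.branches.

```r
p <- c(0.5, 0.12, 0.7, 0.8, .02)
# each trial contributes the 2-vector (P(fail), P(success))
factors <- lapply(p, function(q) c(1 - q, q))
# ((f1 %x% f2) %x% f3) %x% ...: trial 1 varies slowest, trial n fastest
probs <- as.vector(Reduce(kronecker, factors))
probs[c(1, 24, 28, 32)]  # all fail, o1, o2, all succeed
```

Each step only doubles the length of the vector, so the total work is about 2^(n+1) multiplications.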

Answer 1 (score: 1)

Try this in base R:

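    p <- c(0.5, 0.12, 0.7, 0.8, .02)
    a <- c("A","B","C","D","E")
    n <- length(p)
    # one branch per row; [n:1] reverses the columns so the first one (paired with p[1]) varies slowest
    apply(expand.grid(replicate(n,list(0:1)))[n:1], 1, 
                      function(x) prod(p[which(x==1)])*prod(1-p[which(x==0)]))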

    #[1] 0.025872 0.000528 0.103488 0.002112 0.060368 0.001232 0.241472 0.004928 0.003528 0.000072 0.014112 0.000288 0.008232 0.000168 0.032928 0.000672 0.025872
    #[18] 0.000528 0.103488 0.002112 0.060368 0.001232 0.241472 0.004928 0.003528 0.000072 0.014112 0.000288 0.008232 0.000168 0.032928 0.000672
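apply() calls an R function once per row, which may be slow for 2^19 rows. Below is a sketch of one possible vectorized variant that keeps this answer's expand.grid() grid but replaces apply() with two matrix-vector products on log-probabilities. It assumes every p is strictly between 0 and 1; otherwise 0 * log(0) produces NaN. The names g and probs are illustrative.

```r
p <- c(0.5, 0.12, 0.7, 0.8, .02)
n <- length(p)
# same grid as above: one branch per row, first column (paired with p[1]) varying slowest
g <- as.matrix(expand.grid(replicate(n, list(0:1)))[n:1])
# row-wise sum of log(p) over successes plus log(1-p) over failures (assumes 0 < p < 1)
probs <- exp(drop(g %*% log(p) + (1 - g) %*% log(1 - p)))
round(probs, 6)
```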