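I am trying to compute the Bayes factor for an A/B testing dataset that can be found here. However, I end up with NaN because the beta function evaluates to zero. When computing the likelihood, I assume it follows a binomial distribution, so I use the following formula:
likelihood = choose(n, k) * Beta(k + 1, n - k + 1)
The code is below: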
data <- read.csv(file="ab_data.csv", header=TRUE, sep=",")
control <- data[which(data$group == "control"),]
treatment <- data[which(data$group == "treatment"),]
#compute bayes factor
n1 = nrow(control)
r1 = sum(control$converted)
n2 = nrow(treatment)
r2 = sum(treatment$converted)
likelihood_control <- choose(n1,r1) * beta(r1+1, n1-r1+1)
likelihood_treatment <- choose(n2,r2) * beta(r2+1, n2-r2+1)
bayes_factor <- likelihood_control/ likelihood_treatment
beta(r1+1, n1-r1+1)
beta(r2+1, n2-r2+1)
bayes_factor
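Answer 0 (score: 0)
As you noticed, the problem is that the beta function returns 0. This is not because the likelihood is actually 0, but because it is so small that the computer stores it as 0. The second problem is that choose returns Inf. Again, this is not because the value is actually infinite; R simply cannot represent a number that large internally. Multiplying Inf by 0 is what produces the NaN. The solution is to work with logarithms, which grow far more slowly, and to exponentiate only at the end. The following should work (I tested the logchoose function and it seems to work):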
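# Log of the binomial coefficient: log(choose(n, k)),
# computed as a sum of logs so that it cannot overflow.
logchoose <- function(n, k){
  # seq_len(k) is empty when k == 0, so both sums are 0 and the result is log(1)
  num <- sum(log(n - k + seq_len(k)))
  denom <- sum(log(seq_len(k)))
  return(num - denom)
}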
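# Note: these two variables hold log-likelihoods, not likelihoods
likelihood_control <- logchoose(n1,r1) + lbeta(r1+1, n1-r1+1)
likelihood_treatment <- logchoose(n2,r2) + lbeta(r2+1, n2-r2+1)
# Subtract on the log scale, then exponentiate once at the end
bayes_factor <- exp(likelihood_control - likelihood_treatment)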
bayes_factor
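As a rough cross-check of the log-scale approach, here is a minimal sketch that uses base R's lchoose and lbeta instead of a hand-written helper. The counts are hypothetical and chosen only to be large enough to trigger the underflow; they are not taken from ab_data.csv, and log_marginal is just an illustrative name. It also compares the result against 1 / (n + 1), which is what choose(n, k) * Beta(k + 1, n - k + 1) simplifies to when the binomial likelihood is integrated over a uniform prior.

```r
# Hypothetical counts for illustration only (not read from ab_data.csv)
n1 <- 145000; r1 <- 17500
n2 <- 145300; r2 <- 17300

# Direct evaluation breaks down at this scale
beta(r1 + 1, n1 - r1 + 1)   # underflows to 0
choose(n1, r1)              # overflows to Inf

# Log of choose(n, k) * Beta(k + 1, n - k + 1), using base R's
# lchoose() and lbeta() so nothing overflows or underflows
log_marginal <- function(n, k) {
  lchoose(n, k) + lbeta(k + 1, n - k + 1)
}

log_ml_control   <- log_marginal(n1, r1)
log_ml_treatment <- log_marginal(n2, r2)

# Exponentiate only the difference of the two log values
bayes_factor <- exp(log_ml_control - log_ml_treatment)
bayes_factor

# Sanity check: under a uniform prior the product equals 1 / (n + 1),
# so the log value should match -log(n + 1)
all.equal(log_ml_control, -log(n1 + 1))
all.equal(bayes_factor, (n2 + 1) / (n1 + 1))
```

If both all.equal calls return TRUE, the log-scale computation agrees with the closed form. Note that with this particular formula the ratio depends only on the two sample sizes, not on the conversion counts, which may be worth keeping in mind when interpreting the result.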