我正在尝试使用PyMC作为输入四种不同的医学状况预测因子,并将它们结合起来产生一个总体后验概率,患者具有给定预测因子子集的条件,说是"是的,这病人有这种情况"。
这个想法是从β分布中为每个预测变量选择θ(条件的总体速率)和假阴性和假阳性率,并从中使用贝叶斯定理计算边际概率和后验概率。我有一个16个计数值的数组,每个可能的预测变量组合一个(因为有4个预测变量,有2 ** 4 = 16种不同的预测变量组合)。我将这最后一组计数和边际概率提供给多项分布,类似于在PyMC教程http://pymc-devs.github.io/pymc/tutorial.html的以下示例中,如何将disasters_array与泊松分布一起使用。
这是我编写的代码尝试并执行此操作:
from pymc import Multinomial, Beta, deterministic
from numpy import array
n = 4 # number of predictors
counts_array = array([2942808473, 17491655, 21576, 23189, 339805, 89159, 168214, 76044, 43138288, 530963, 22682, 22169, 462052, 129472, 2804257, 3454104]) # 16 counts - one count value for each possible permutation of predictors that detected medical condition
pred = array([[0,0,0,0],[1,0,0,0],[0,1,0,0],[1,1,0,0],[0,0,1,0],[1,0,1,0],[0,1,1,0],[1,1,1,0],[0,0,0,1],[1,0,0,1],[0,1,0,1],[1,1,0,1],[0,0,1,1],[1,0,1,1],[0,1,1,1],[1,1,1,1]]); # array of yes/no's from predictors for each value in counts_array above
theta = Beta('theta',1,2)
fn = fp = tn = tp = [0] * 4;
for i in range(0,n):
fn[i] = Beta('fn' + str(i),1,2)
fp[i] = Beta('fp' + str(i),1,2)
tn[i] = 1 - fp[i]
tp[i] = 1 - fn[i]
@deterministic(plot=False)
def margprobs():
mp = [0] * 2**n; # initialize with vector of 2**n zeros
for i in range(0,2**n):
mp[i] = theta *\
(fn[0]**(1-pred[i][0])) * (tp[0]**pred[i][0]) *\
(fn[1]**(1-pred[i][1])) * (tp[1]**pred[i][1]) *\
(fn[2]**(1-pred[i][2])) * (tp[2]**pred[i][2]) *\
(fn[3]**(1-pred[i][3])) * (tp[3]**pred[i][3])\
+ (1-theta) *\
(tn[0]**(1-pred[i][0])) * (fp[0]**pred[i][0]) *\
(tn[1]**(1-pred[i][1])) * (fp[1]**pred[i][1]) *\
(tn[2]**(1-pred[i][2])) * (fp[2]**pred[i][2]) *\
(tn[3]**(1-pred[i][3])) * (fp[3]**pred[i][3]);
return mp;
@deterministic(plot=False)
def postprobs():
pp = [0] * 2**n; # initialize with vector of 2**n zeros
for i in range(0,2**n):
pp[i] = theta *\
(fn[0]**(1-pred[i][0])) * (tp[0]**pred[i][0]) *\
(fn[1]**(1-pred[i][1])) * (tp[1]**pred[i][1]) *\
(fn[2]**(1-pred[i][2])) * (tp[2]**pred[i][2]) *\
(fn[3]**(1-pred[i][3])) * (tp[3]**pred[i][3])\
/ margprobs[i];
return pp;
counts = Multinomial(name="counts", value=counts_array, n=2**n, p=margprobs, observed=True)
当我运行它时,在计算计数时,我得到与最后一行有关的错误:
$ python test.py
Traceback (most recent call last):
File "test.py", line 46, in <module>
counts = Multinomial(name="counts", value=counts_array, n=2**n, p=margprobs, observed=True)
File "/Users/jtr4v/anaconda/lib/python2.7/site-packages/pymc/distributions.py", line 3269, in __init__
verbose=verbose, **kwds)
File "/Users/jtr4v/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.py", line 772, in __init__
if not isinstance(self.logp, float):
File "/Users/jtr4v/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.py", line 929, in get_logp
raise ZeroProbability(self.errmsg)
pymc.Node.ZeroProbability: Stochastic counts's value is outside its support,
or it forbids its parents' current values
显然这个错误与PyMC有关,不喜欢我给Multinomial()提供的一些价值,但我不确定哪一个是错的。我认为值应该是counts_array(我的观察值为count),n应该是16,因为我想选择16个项目的数组用于计数,一个用于每个可能的预测变量组合,p应该是我的边际概率,并且观察到应该是真的,因为我观察了价值观。
我做错了什么?
编辑:如果有帮助,我之前在R2jags中使用以下代码执行此操作:
model {
theta ~ dbeta(1,2); # draw theta from a beta distribution
for (i in 1:N) { # draw an false positive and false negative rate for each of N predictors
fp[i] ~ dbeta(1,2);
fn[i] ~ dbeta(1,2);
tp[i] <- 1-fn[i]; # true positive = 1 - false negative rate
tn[i] <- 1-fp[i]; # true negative rate = 1 - false positive rate
}
for (j in 1:M) {
# Bayes theorem, i.e.
# posterior probability =
# P(A) * P(B|A) /
# /
# P(A) * P(B|A) + P(-A) * P(B|-A) # <--- last line is marginal probability
#
# x is a vector of 1's and 0's indicating whether the ith predictor said yes or no
margprobs[j] <- (theta *
(fn[1]^(1-x[j,1])) * (tp[1]^x[j,1]) *
(fn[2]^(1-x[j,2])) * (tp[2]^x[j,2]) *
(fn[3]^(1-x[j,3])) * (tp[3]^x[j,3]) *
(fn[4]^(1-x[j,4])) * (tp[4]^x[j,4])
+ (1-theta) *
(tn[1]^(1-x[j,1])) * (fp[1]^x[j,1]) *
(tn[2]^(1-x[j,2])) * (fp[2]^x[j,2]) *
(tn[3]^(1-x[j,3])) * (fp[3]^x[j,3]) *
(tn[4]^(1-x[j,4])) * (fp[4]^x[j,4]));
postprobs[j] <- theta *
(fn[1]^(1-x[j,1])) * (tp[1]^x[j,1]) *
(fn[2]^(1-x[j,2])) * (tp[2]^x[j,2]) *
(fn[3]^(1-x[j,3])) * (tp[3]^x[j,3]) *
(fn[4]^(1-x[j,4])) * (tp[4]^x[j,4])
/ margprobs[j];
}
counts ~ dmulti(margprobs, total);
}