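I am not sure whether I have found a bug in pymc. It seems that fitting a Binomial with missing data can raise a ZeroProbability
error, depending on the fill_value chosen for masking the missing data. But maybe I am just using it incorrectly.
I tried the following example with the current master branch from GitHub. I am aware of the bug concerning Binomial distributions in pymc 2.3.4, but this seems to be a different issue.
I use pymc to fit a Binomial distribution, and everything works as expected: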
import scipy as sp
import pymc

def make_model(observed_values):
    p = pymc.Uniform('p', lower = 0.0, upper = 1.0, value = 0.1)
    # Vectorized form: n and p are expanded to the shape of the data.
    values = pymc.Binomial('values', n = 10* sp.ones_like(observed_values), p = p * sp.ones_like(observed_values),
                           value = observed_values, observed = True, plot = False)
    # Scalar form: this rebinds the name `values`, so locals() returns this node.
    values = pymc.Binomial('values', n = 10, p = p,
                           value = observed_values, observed = True, plot = False)
    return locals()

# Simulate 100 observations from Binomial(n=10, p=0.1) with a fixed seed.
sp.random.seed(0)
observed_values = sp.random.binomial(n = 10.0, p = 0.1, size = 100)

M1 = pymc.MCMC(make_model(observed_values))
M1.sample(iter=10000, burn=1000, thin=10)
pymc.Matplot.plot(M1)
M1.summary()
Output:
[-----------------100%-----------------] 10000 of 10000 complete in 0.7 sec
Plotting p

p:

    Mean             SD               MC Error        95% HPD interval
    ------------------------------------------------------------------
    0.093            0.007            0.0             [ 0.081  0.107]

    Posterior quantiles:

    2.5             25              50              75             97.5
     |---------------|===============|===============|---------------|
    0.08            0.088           0.093           0.097          0.106
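The summary above can be cross-checked without MCMC, because a Uniform(0, 1) prior is a Beta(1, 1) prior and is conjugate to the Binomial likelihood. The following is a minimal sketch, assuming `np.random.seed(0)` reproduces the same draws as `sp.random.seed(0)` in the original setup. The interval it prints is equal-tailed, not an HPD interval, so it should only roughly match the pymc summary.

```python
import numpy as np
from scipy import stats

# Regenerate the simulated data (assumes the numpy seed matches the scipy alias).
np.random.seed(0)
n_trials = 10
observed_values = np.random.binomial(n=n_trials, p=0.1, size=100)

# Beta(1, 1) prior + Binomial likelihood -> Beta(1 + successes, 1 + failures).
successes = observed_values.sum()
failures = n_trials * observed_values.size - successes
posterior = stats.beta(1 + successes, 1 + failures)

posterior_mean = posterior.mean()
lower, upper = posterior.interval(0.95)  # equal-tailed, not HPD

print("posterior mean: %.3f" % posterior_mean)
print("95%% equal-tailed interval: [%.3f, %.3f]" % (lower, upper))
```

If the sampled mean of `p` sits close to this analytic posterior mean, the unmasked model is behaving as intended.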
Next, I tried a very similar case; the only difference is that one observation is missing:
mask = sp.zeros_like(observed_values)
mask[0] = True
masked_values = sp.ma.masked_array(observed_values, mask = mask, fill_value = 999999)
M2 = pymc.MCMC(make_model(masked_values))
M2.sample(iter=10000, burn=1000, thin=10)
pymc.Matplot.plot(M2)
M2.summary()
Unexpectedly, I get a ZeroProbability error:
---------------------------------------------------------------------------
ZeroProbability Traceback (most recent call last)
<ipython-input-16-4f945f269628> in <module>()
----> 1 M2 = pymc.MCMC(make_model(masked_values))
2 M2.sample(iter=10000, burn=1000, thin=10)
3 pymc.Matplot.plot(M2)
4 M2.summary()
<ipython-input-12-cb8707bb911f> in make_model(observed_values)
4 def make_model(observed_values):
5 p = pymc.Uniform('p', lower = 0.0, upper = 1.0, value = 0.1)
----> 6 values = pymc.Binomial('values', n = 10* sp.ones_like(observed_values), p = p * sp.ones_like(observed_values), value = observed_values, observed = True, plot = False)
7 values = pymc.Binomial('values', n = 10, p = p, value = observed_values, observed = True, plot = False)
8 return locals()
/home/fabian/anaconda/lib/python2.7/site-packages/pymc/distributions.pyc in __init__(self, *args, **kwds)
318 logp_partial_gradients=logp_partial_gradients,
319 dtype=dtype,
--> 320 **arg_dict_out)
321
322 new_class.__name__ = name
/home/fabian/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.pyc in __init__(self, logp, doc, name, parents, random, trace, value, dtype, rseed, observed, cache_depth, plot, verbose, isdata, check_logp, logp_partial_gradients)
773 if check_logp:
774 # Check initial value
--> 775 if not isinstance(self.logp, float):
776 raise ValueError(
777 "Stochastic " +
/home/fabian/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.pyc in get_logp(self)
930 (self._value, self._parents.value))
931 else:
--> 932 raise ZeroProbability(self.errmsg)
933
934 return logp
ZeroProbability: Stochastic values's value is outside its support,
or it forbids its parents' current values.
However, if I change the fill value of the masked array to 1, the fit works again:
masked_values2 = sp.ma.masked_array(observed_values, mask = mask, fill_value = 1)
M3 = pymc.MCMC(make_model(masked_values2))
M3.sample(iter=10000, burn=1000, thin=10)
pymc.Matplot.plot(M3)
M3.summary()
Output:
[-----------------100%-----------------] 10000 of 10000 complete in 2.1 sec
Plotting p

p:

    Mean             SD               MC Error        95% HPD interval
    ------------------------------------------------------------------
    0.092            0.007            0.0             [ 0.079  0.105]

    Posterior quantiles:

    2.5             25              50              75             97.5
     |---------------|===============|===============|---------------|
    0.079           0.088           0.092           0.097          0.105

values:

    Mean             SD               MC Error        95% HPD interval
    ------------------------------------------------------------------
    1.15             0.886            0.029           [ 0.  3.]

    Posterior quantiles:

    2.5             25              50              75             97.5
     |---------------|===============|===============|---------------|
    0.0             1.0             1.0             2.0            3.0
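The contrast between the two fill values can be reproduced outside pymc. The sketch below rests on an assumption that has not been verified against the pymc source: that the filled array (with fill_value in place of the masked entry) is what gets evaluated when the node's initial log-probability is checked. The small `observed` array and the helper `filled_logp` are hypothetical and exist only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical small data set with the first observation marked as missing.
observed = np.array([1, 0, 2, 1])
mask = np.zeros(observed.shape, dtype=bool)
mask[0] = True

def filled_logp(fill_value, n=10, p=0.1):
    """Sum of Binomial(n, p) log-probabilities after filling the masked entry."""
    masked = np.ma.masked_array(observed, mask=mask, fill_value=fill_value)
    filled = masked.filled()  # masked entry replaced by fill_value
    return stats.binom.logpmf(filled, n, p).sum()

logp_large = filled_logp(999999)  # 999999 lies outside the support {0, ..., 10}
logp_small = filled_logp(1)       # 1 lies inside the support

print("fill_value=999999:", logp_large)
print("fill_value=1:     ", logp_small)
```

Under that assumption, a fill value above n = 10 gives a log-probability of negative infinity, which would be consistent with the ZeroProbability error, while a fill value inside the support gives a finite one.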
Is this a bug, or is there a problem with my model? Thanks for your help!