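For a very simple continuous toy model, NUTS in pymc3 gives me a posterior that looks incorrect. It disagrees with both the analytically computed posterior and the posterior sampled with Metropolis.
In the code below, I generate synthetic data with a fixed random seed (so the results are reproducible). I then define the same generative model in pymc3, observing only the final data. Finally, I compare the marginal distribution of one of the latent variables with the true analytic posterior and with the Metropolis posterior. The results do not agree.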
#!/usr/bin/env python2
from __future__ import division
import numpy as np
import pymc3 as mc
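import scipy as sci
import scipy.optimize  # ensure sci.optimize is loaded explicitly
import scipy.stats  # ensure sci.stats is loaded explicitly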
import theano.tensor as th
np.random.seed(13)
n = 10
tau_scale = 2
tau0 = sci.stats.expon.rvs() * tau_scale
mu0 = np.random.randn(n) / np.sqrt(tau0)
x0 = mu0 + np.random.randn(n)
# Model 1: centered parameterization, mu ~ Normal(0, variance 1 / tau).
with mc.Model() as model1:
    tau = mc.Exponential('tau', lam=1 / tau_scale)
    mu = mc.Normal('mu', tau=tau, shape=(n,))
    mc.Normal('x', mu=mu, observed=x0)
# Model 2: non-centered parameterization, mu = mu_z / sqrt(tau).
with mc.Model() as model2:
    tau = mc.Exponential('tau', lam=1 / tau_scale)
    mu_z = mc.Normal('mu_z', shape=(n,))
    mu = mc.Deterministic('mu', mu_z / th.sqrt(tau))
    mc.Normal('x', mu=mu, observed=x0)
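def infer(model):
    with model:
        # Short NUTS warm-up run, scaled by and started from the MAP estimate.
        map_ = mc.find_MAP(fmin=sci.optimize.fmin_l_bfgs_b)
        step = mc.NUTS(scaling=map_)
        trace = mc.sample(100, step=step, start=map_, progressbar=False)
        # Main NUTS run, rescaled by and restarted from the last warm-up sample.
        step = mc.NUTS(scaling=trace[-1])
        return mc.sample(11000, step=step, start=trace[-1], progressbar=False)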
trace1 = infer(model1)
trace2 = infer(model2)
with model2:
    trace3 = mc.sample(100000, step=mc.Metropolis(), progressbar=False,
                       start=mc.find_MAP(fmin=sci.optimize.fmin_l_bfgs_b))
samples_tau1 = trace1['tau'][1000:]
samples_tau2 = trace2['tau'][1000:]
samples_tau3 = trace3['tau'][10000:]
print
print 'pymc3 version: ' + mc.__version__
print
print 'Model 1 NUTS tau'
print 'Mean: {0:3.1f}'.format(samples_tau1.mean())
print 'Standard Deviation: {0:3.1f}'.format(samples_tau1.std())
print 'Median {0:3.1f}'.format(np.percentile(samples_tau1, 50))
print
print 'Model 2 NUTS tau'
print 'Mean: {0:3.1f}'.format(samples_tau2.mean())
print 'Standard Deviation: {0:3.1f}'.format(samples_tau2.std())
print 'Median {0:3.1f}'.format(np.percentile(samples_tau2, 50))
print
print 'Model 2 Metropolis tau'
print 'Mean: {0:3.1f}'.format(samples_tau3.mean())
print 'Standard Deviation: {0:3.1f}'.format(samples_tau3.std())
print 'Median {0:3.1f}'.format(np.percentile(samples_tau3, 50))
I actually defined the same generative model in two slightly different ways. The output of the above program is as follows:
deepee@entropy:~$ ./test_inference.py
Applied log-transform to tau and added transformed tau_log to model.
Applied log-transform to tau and added transformed tau_log to model.
pymc3 version 3.0
Model 1 tau
Mean: 2.5
Standard Deviation: 1.6
Median 2.1
Model 2 tau
Mean: 4.0
Standard Deviation: 2.5
Median 3.4
Model 2 Metropolis tau
Mean: 3.5
Standard Deviation: 2.3
Median 2.9
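The true posterior mean of tau is 3.5, with a standard deviation of 2.3 and a median of 3.0, which agrees with Metropolis. With Stan, the values also match more closely. I am using a relatively recent commit of pymc3 (ca40cd3b2).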
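For reference, here is one way the analytic comparison could be reproduced. The calculation itself is not shown above, so this is only a sketch under stated assumptions. Integrating out each mu_i gives x_i | tau ~ Normal(0, 1 + 1/tau), so the marginal posterior of tau is one-dimensional and can be normalized on a grid. The function name `tau_posterior_summary` and the parameters `grid_size` and `tau_max` are hypothetical choices made for illustration, not part of pymc3.

```python
from __future__ import division, print_function

import numpy as np


def tau_posterior_summary(x, tau_scale=2.0, grid_size=200000, tau_max=200.0):
    """Grid approximation of p(tau | x) with mu integrated out.

    Assumed model (as in the post): tau ~ Exponential(mean=tau_scale),
    mu_i ~ Normal(0, variance 1 / tau), x_i ~ Normal(mu_i, variance 1).
    Returns (mean, standard deviation, median) of tau.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Uniform grid over (0, tau_max]; tau_max is an assumed truncation point.
    tau = np.linspace(tau_max / grid_size, tau_max, grid_size)
    var = 1.0 + 1.0 / tau  # marginal variance of each x_i given tau
    # Log prior plus log marginal likelihood, up to an additive constant.
    log_post = (-tau / tau_scale
                - 0.5 * n * np.log(var)
                - 0.5 * np.sum(x ** 2) / var)
    w = np.exp(log_post - log_post.max())  # subtract max for stability
    w /= w.sum()                           # normalized grid weights
    mean = np.sum(w * tau)
    std = np.sqrt(np.sum(w * (tau - mean) ** 2))
    median = tau[np.searchsorted(np.cumsum(w), 0.5)]
    return mean, std, median


if __name__ == '__main__':
    # Stand-in data for demonstration only, NOT the x0 generated in the post.
    x_demo = np.random.RandomState(0).randn(10)
    print('Mean, SD, median: %.2f %.2f %.2f' % tau_posterior_summary(x_demo))
```

Passing the `x0` array from the script above in place of `x_demo` would give the reference values to compare against `samples_tau1`, `samples_tau2`, and `samples_tau3`.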