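Avoiding optimization pitfalls when modeling ordinal predicted variables in PyMC3

Posted: 2016-11-04 16:36:10

Tags: bayesian pymc pymc3

I am trying to model an ordinal predicted variable in PyMC3, following the approach in chapter 23 of Doing Bayesian Data Analysis. I would like to use find_MAP to find a good starting value, but I get an optimization error.

The model: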

import pymc3 as pm
import numpy as np
import theano
import theano.tensor as tt

# Some helper functions
def cdf(x, location=0, scale=1):
    epsilon = np.array(1e-32, dtype=theano.config.floatX)

    location = tt.cast(location, theano.config.floatX)
    scale = tt.cast(scale, theano.config.floatX)

    div = tt.sqrt(2 * scale ** 2 + epsilon)
    div = tt.cast(div, theano.config.floatX)

    erf_arg = (x - location) / div
    return .5 * (1 + tt.erf(erf_arg + epsilon))


def percent_to_thresh(idx, vect):
    return 5 * tt.sum(vect[:idx + 1]) + 1.5


def full_thresh(thresh):
    idxs = tt.arange(thresh.shape[0] - 1)
    thresh_mod, updates = theano.scan(fn=percent_to_thresh,
                                      sequences=[idxs],
                                      non_sequences=[thresh])
    return tt.concatenate([[-1 * np.inf, 1.5], thresh_mod, [6.5, np.inf]])


def compute_ps(thresh, location, scale):
    f_thresh = full_thresh(thresh)
    return cdf(f_thresh[1:], location, scale) - cdf(f_thresh[:-1], location, scale)

# Generate data
real_ps = [0.05, 0.05, 0.1, 0.1, 0.2, 0.3, 0.2]
data = np.random.choice(7, size=1000, p=real_ps)

# Run model
with pm.Model() as model:
    mu = pm.Normal('mu', mu=4, sd=3)
    sigma = pm.Uniform('sigma', lower=0.1, upper=70)
    thresh = pm.Dirichlet('thresh', a=np.ones(5))

    cat_p = compute_ps(thresh, mu, sigma)

    results = pm.Categorical('results', p=cat_p, observed=data)

with model:
    start = pm.find_MAP()
    trace = pm.sample(2000, start=start)
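The helper functions above implement the usual cumulative-normal likelihood for an ordinal outcome. Writing $\Phi$ for the standard normal CDF (which `cdf()` computes, up to its tiny `epsilon`) and $d_1, \dots, d_5$ for the Dirichlet weights in `thresh`, `full_thresh()` and `compute_ps()` amount to the following. Because the $d_j$ sum to 1, the last scaled sum lands on 6.5, the fixed upper threshold. A data value of $k-1$ has probability $p_k$.

```latex
\begin{aligned}
\theta_0 &= -\infty, \qquad \theta_1 = 1.5, \qquad
\theta_{i+1} = 1.5 + 5\sum_{j=1}^{i} d_j \quad (i = 1, \dots, 4), \qquad
\theta_6 = 6.5, \qquad \theta_7 = +\infty \\
p_k &= \Phi\!\left(\frac{\theta_k - \mu}{\sigma}\right)
     - \Phi\!\left(\frac{\theta_{k-1} - \mu}{\sigma}\right), \qquad k = 1, \dots, 7
\end{aligned}
```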

When I run it, I get the following error:

Applied interval-transform to sigma and added transformed sigma_interval_ to model.
Applied stickbreaking-transform to thresh and added transformed thresh_stickbreaking_ to model.
Traceback (most recent call last):
  File "cm_net_log.v1-for_so.py", line 53, in <module>
    start = pm.find_MAP()
  File "/usr/local/lib/python3.5/site-packages/pymc3/tuning/starting.py", line 133, in find_MAP
    specific_errors)
ValueError: Optimization error: max, logp or dlogp at max have non-finite values. Some values may be outside of distribution support. max: {'thresh_stickbreaking_': array([-1.04298465, -0.48661088, -0.84326554, -0.44833646]), 'sigma_interval_': array(-2.220446049250313e-16), 'mu': array(7.68422528308479)} logp: array(-3506.530143064723) dlogp: array([  1.61013190e-06,              nan,  -6.73994118e-06,
        -6.93873894e-06,   6.03358122e-06,   3.18954680e-06])Check that 1) you don't have hierarchical parameters, these will lead to points with infinite density. 2) your distribution logp's are properly specified. Specific issues:

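My questions:

  1. How can I determine why dlogp is nan at certain points?
  2. Can I express this model differently so that dlogp is not nan?

Also worth noting:

    • If I skip find_MAP and use the Metropolis sampler, the model runs fine. However, I would like the flexibility to use other samplers as this model becomes more complex.
    • I suspect the problem comes from the relationship between the thresholds and the normal distribution, but I don't know how to disentangle them for the optimization.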
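A minimal NumPy sketch of one plausible explanation for question 1. This is an assumption; the post does not confirm it. `full_thresh()` pads the thresholds with -inf and +inf, and an autodiff graph such as Theano's applies the chain rule literally. The derivative of Φ((x − μ)/σ) with respect to σ is φ(z) · (−(x − μ)/σ²). At x = ±inf that becomes 0 × ∞, which floating point evaluates to nan, even though the mathematical limit is 0.

The derivative with respect to μ, φ(z) · (−1/σ), has no infinite factor and stays finite. That pattern would give a single nan entry in dlogp, coming from sigma. The post does not say which variable the nan belongs to. The helper names and interior threshold values below are hypothetical.

```python
import numpy as np

def normal_pdf(z):
    # standard normal density
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def dcdf_dsigma(x, mu, sigma):
    # d/dsigma Phi((x - mu)/sigma), evaluated term by term as autodiff would:
    # pdf(z) * dz/dsigma, with dz/dsigma = -(x - mu) / sigma**2
    z = (x - mu) / sigma
    with np.errstate(invalid="ignore"):  # 0 * inf would otherwise warn
        return normal_pdf(z) * (-(x - mu) / sigma ** 2)

def dcdf_dmu(x, mu, sigma):
    # pdf(z) * dz/dmu, with dz/dmu = -1/sigma (no infinite factor)
    z = (x - mu) / sigma
    return normal_pdf(z) * (-1.0 / sigma)

# Threshold vector shaped like full_thresh()'s output (interior values hypothetical)
x = np.array([-np.inf, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, np.inf])
grads_sigma = dcdf_dsigma(x, mu=4.0, sigma=1.5)
grads_mu = dcdf_dmu(x, mu=4.0, sigma=1.5)
print(grads_sigma)  # nan at both infinite end points
print(grads_mu)     # finite everywhere (zero at the end points)
```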

1 answer:

Answer 0 (score: 0)

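Regarding question 2: I expressed the model for an ordinal predicted variable (single group) somewhat differently. I used the Theano @as_op decorator on the function that computes the outcome probabilities. This also explains why I cannot use find_MAP() or gradient-based samplers: Theano cannot compute gradients through such a custom function (http://pymc-devs.github.io/pymc3/notebooks/getting_started.html#Arbitrary-deterministics).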

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano.compile.ops import as_op
from scipy.stats import norm

# df is assumed to be a pandas DataFrame whose column Y is categorical
# (as in the linked notebook); it is not defined in this snippet.

# Number of outcomes
nYlevels = df.Y.cat.categories.size

# Thresholds at 1.5, 2.5, ..., nYlevels - 0.5: the first and last are fixed
# (observed), the interior ones are masked and therefore estimated
thresh = [k + .5 for k in range(1, nYlevels)]
thresh_obs = np.ma.asarray(thresh)
thresh_obs[1:-1] = np.ma.masked

@as_op(itypes=[tt.dvector, tt.dscalar, tt.dscalar], otypes=[tt.dvector])
def outcome_probabilities(theta, mu, sigma):
    # Evaluated in NumPy/SciPy, so Theano has no gradient for this op
    out = np.empty(nYlevels)
    n = norm(loc=mu, scale=sigma)
    out[0] = n.cdf(theta[0])
    for k in range(1, nYlevels - 1):
        out[k] = np.max([0, n.cdf(theta[k]) - n.cdf(theta[k - 1])])
    out[nYlevels - 1] = 1 - n.cdf(theta[-1])
    return out

with pm.Model() as ordinal_model_single:

    theta = pm.Normal('theta', mu=thresh, tau=np.repeat(.5**2, len(thresh)),
                      shape=len(thresh), observed=thresh_obs, testval=thresh[1:-1])

    mu = pm.Normal('mu', mu=nYlevels/2.0, tau=1.0/(nYlevels**2))
    sigma = pm.Uniform('sigma', nYlevels/1000.0, nYlevels*10.0)

    pr = outcome_probabilities(theta, mu, sigma)

    y = pm.Categorical('y', pr, observed=df.Y.cat.codes.values)

http://nbviewer.jupyter.org/github/JWarmenhoven/DBDA-python/blob/master/Notebooks/Chapter%2023.ipynb
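The answer's outcome_probabilities never uses infinite thresholds. The first category is cdf(theta[0]) and the last is 1 - cdf(theta[-1]). Below is a minimal NumPy sketch of that same finite-threshold computation; the function names and threshold values are hypothetical and not the answer's code. It adds a central-difference check that the derivative of every probability with respect to sigma stays finite.

Rewriting this with Theano's own elementwise operations (such as tt.erf) instead of SciPy inside @as_op should, in principle, let find_MAP and gradient-based samplers differentiate it. That is an assumption, not something the answer tests. Unlike the answer, the sketch does not clip differences at 0, so it assumes the thresholds are sorted.

```python
import math
import numpy as np

def normal_cdf(x, mu, sigma):
    # Phi((x - mu) / sigma) via math.erf, elementwise over a 1-D array
    return np.array([0.5 * (1.0 + math.erf((xi - mu) / (sigma * math.sqrt(2.0))))
                     for xi in np.atleast_1d(x)])

def ordinal_probs(theta, mu, sigma):
    """Category probabilities from K-1 sorted, finite thresholds (no +/-inf)."""
    c = normal_cdf(theta, mu, sigma)
    # first: P(y <= theta_1); middle: CDF differences; last: 1 - P(y <= theta_{K-1})
    return np.concatenate([[c[0]], np.diff(c), [1.0 - c[-1]]])

theta = np.array([1.5, 2.5, 3.5, 4.5, 5.5, 6.5])  # hypothetical, nYlevels = 7
mu, sigma = 4.0, 1.5
p = ordinal_probs(theta, mu, sigma)

# Central-difference derivative of each probability with respect to sigma
h = 1e-6
dp_dsigma = (ordinal_probs(theta, mu, sigma + h)
             - ordinal_probs(theta, mu, sigma - h)) / (2 * h)
print(p.round(4), p.sum())
print(np.all(np.isfinite(dp_dsigma)))
```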