Unable to obtain weighted posterior predictive samples with sample_ppc_w in PyMC3

Date: 2018-07-05 13:06:42

Tags: pymc3

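I want to use a weighted average of several models to make predictions on unseen data.
When the models being averaged contain Deterministics that depend on Theano shared variables, I cannot use sample_ppc_w.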

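Simplified example:
(a more complete example is in the notebook here)

I am modelling the difference between a and b in order to predict the HPD range of b when a new a is available.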

Data

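import numpy as np    # imports assumed; the post does not show them
import pandas as pd
import pymc3 as pm
df = pd.DataFrame(dict(a=np.random.normal(0.1, 1.5, 1000), 
                       b=np.random.normal(-0.05, 1.2, 1000)))
df['diff'] = df['a'] - df['b']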

The averaging approach I followed is shown in the notebook.
When I use the precomputed diff as the observed values, I can average the models successfully.

Model 1

with pm.Model() as model_t1:

    mu = pm.Uniform('mu', lower=-5, upper=5)
    sd = pm.Uniform('sd', lower=0, upper=10)
    nu = pm.Uniform('nu', lower=0, upper=10)

    y = pm.StudentT('y', mu=mu, nu=nu, sd=sd, observed=df['diff'])

    trace_t1 = pm.sample(20000, tune=1000, njobs=2)
    btrace_t1 = trace_t1[1000:]

Model 2

with pm.Model() as model_l1:

    mu = pm.Uniform('mu', lower=-5, upper=5)
    b = pm.Uniform('b', lower=0, upper=10)  

    y = pm.Laplace('y', mu=mu, b=b, observed=df['diff'])

    trace_l1 = pm.sample(20000, tune=1000, njobs=2)
    btrace_l1 = trace_l1[1000:]

Comparison

models_dict1 = {
    model_t1: btrace_t1,
    model_l1: btrace_l1,
}
comparison = pm.compare(models_dict1, method='stacking', ic='WAIC')

Sampling from the weighted average

traces1 = [btrace_t1, btrace_l1]
models1 = [model_t1, model_l1]
ppc_w1 = pm.sample_ppc_w(traces1, models=models1, weights=comparison.weight.sort_index(ascending=True))

Figure: plot of weighted samples vs. observed df['diff']
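The sort_index call in the weighted sampling step is easy to overlook. The sketch below shows why it matters, assuming the table returned by pm.compare is sorted by information criterion and indexed by the position of each model in the input. The table and its numbers here are made up for illustration.

```python
import pandas as pd

# Stand-in for the table returned by pm.compare: rows sorted best-first,
# index = position of each model in the input (hypothetical numbers).
comparison_demo = pd.DataFrame(
    {"WAIC": [3301.2, 3310.7], "weight": [0.8, 0.2]},
    index=[1, 0],
)

# sort_index puts the weights back in the order the models were passed in,
# so they line up with lists such as traces1 and models1.
weights_in_model_order = comparison_demo.weight.sort_index(ascending=True)
print(weights_in_model_order.tolist())  # [0.2, 0.8]
```

Without the sort, the first weight would belong to the best-ranked model rather than to the first model in the list.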

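However, I want to make predictions on unseen data.
The prediction approach I followed uses Theano shared variables for a and b and models diff with a Deterministic variable.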

With this approach I can successfully predict on unseen data using a single model:

Prediction using model 1

import theano.tensor as tt  # assumed alias; the post does not show its imports
a_shared_t = tt.shared(df['a'].values)
b_shared_t = tt.shared(df['b'].values)
with pm.Model() as model_t2:
    arv = pm.Normal('arv', mu=df['a'].mean(), sd=df['a'].std(), observed=a_shared_t)
    brv = pm.Normal('brv', mu=df['b'].mean(), sd=df['b'].std(), observed=b_shared_t)

    diff = pm.Deterministic('diff', arv - brv)

    mu = pm.Uniform('mu', lower=-5, upper=5)
    sd = pm.Uniform('sd', lower=0, upper=10)
    nu = pm.Uniform('nu', lower=0, upper=10)

    y = pm.StudentT('y', mu=mu, nu=nu, sd=sd, observed=diff)

    trace_t2 = pm.sample(20000, tune=1000, njobs=2)
    btrace_t2 = trace_t2[1000:]

observed_a = 1.8
a_shared_t.set_value(np.append(df['a'].values, observed_a))
b_shared_t.set_value(np.append(df['b'].values, 0.))  # dummy b for the new row
ppc_t2_updated = pm.sample_ppc(btrace_t2, 
                               samples=10000, 
                               model=model_t2, 
                               size=100)
sample = ppc_t2_updated['y'][:, -1]
hpd_80_diff = pm.stats.hpd(sample, alpha=0.2)
predicted_b_range = observed_a - hpd_80_diff[1], observed_a - hpd_80_diff[0]
# predicted_b_range == (-0.7764686082453378, 3.890428414268656)
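For readers unfamiliar with the last two lines of the prediction step, here is a minimal NumPy sketch of what they compute. hpd_interval is a hypothetical helper that finds the shortest interval holding 80% of the draws. It is a stand-in, not the pm.stats.hpd implementation, and the draws are made up.

```python
import numpy as np

def hpd_interval(samples, alpha=0.2):
    """Shortest interval covering about (1 - alpha) of the samples (hypothetical helper)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    span = int(np.floor((1.0 - alpha) * n))  # index distance between the endpoints
    widths = x[span:] - x[: n - span]        # width of every candidate window
    start = int(np.argmin(widths))           # the narrowest window wins
    return x[start], x[start + span]

rng = np.random.RandomState(0)
diff_samples = rng.standard_t(df=4, size=10000) * 1.9 + 0.15  # made-up draws

observed_a = 1.8
lo, hi = hpd_interval(diff_samples, alpha=0.2)
# diff = a - b, so b = a - diff; subtracting flips the interval endpoints.
predicted_b_range = (observed_a - hi, observed_a - lo)
```

The endpoint swap is the reason hpd_80_diff[1] appears first in predicted_b_range.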

Prediction using model 2

a_shared_l = tt.shared(df['a'].values)
b_shared_l = tt.shared(df['b'].values)
with pm.Model() as model_l2:
    arv = pm.Normal('arv', mu=df['a'].mean(), sd=df['a'].std(), observed=a_shared_l)
    brv = pm.Normal('brv', mu=df['b'].mean(), sd=df['b'].std(), observed=b_shared_l)

    diff = pm.Deterministic('diff', arv - brv)

    mu = pm.Uniform('mu', lower=-5, upper=5)
    b = pm.Uniform('b', lower=0, upper=10)  

    y = pm.Laplace('y', mu=mu, b=b, observed=diff)

    trace_l2 = pm.sample(20000, tune=1000, njobs=2)
    btrace_l2 = trace_l2[1000:]

observed_a = 1.8
dummy = 0.
a_shared_l.set_value(np.append(df['a'].values, observed_a))
b_shared_l.set_value(np.append(df['b'].values, dummy))
ppc_l2_updated = pm.sample_ppc(btrace_l2, 
                               samples=10000, 
                               model=model_l2, 
                               size=100)
sample = ppc_l2_updated['y'][:, -1]
hpd_80_diff = pm.stats.hpd(sample, alpha=0.2)
predicted_b_range = observed_a - hpd_80_diff[1], observed_a - hpd_80_diff[0]
# predicted_b_range == (-0.8231007149866174, 3.9518915820568434)
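The two single-model predictions above could in principle be combined by hand, by pooling draws from each model in proportion to its weight. The sketch below is one possible way to do that in plain NumPy. mix_ppc is a hypothetical helper, the weights are placeholders, and the input arrays are made-up stand-ins for the per-model samples. It is not what sample_ppc_w does internally.

```python
import numpy as np

def mix_ppc(samples_per_model, weights, n_draws, seed=None):
    """Pool posterior predictive draws from several models in proportion to weights.

    Hypothetical helper: samples_per_model is a list of 1-D arrays, one per model.
    """
    rng = np.random.RandomState(seed)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalise the weights
    counts = rng.multinomial(n_draws, w)  # number of draws taken from each model
    parts = [rng.choice(np.asarray(s), size=c, replace=True)
             for s, c in zip(samples_per_model, counts)]
    return np.concatenate(parts)

# Made-up stand-ins for ppc_t2_updated['y'][:, -1] and ppc_l2_updated['y'][:, -1].
rng = np.random.RandomState(1)
sample_t = rng.standard_t(df=4, size=5000)
sample_l = rng.laplace(loc=0.0, scale=1.0, size=5000)

# Placeholder weights; in practice they would come from the model comparison.
mixed = mix_ppc([sample_t, sample_l], weights=[0.3, 0.7], n_draws=10000, seed=2)
```

The pooled array can then go through the same HPD step as a single-model sample.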

However, when I try to sample from the averaged model with pm.sample_ppc_w:

traces2 = [btrace_t2, btrace_l2]
models2 = [model_t2, model_l2]
ppc_w2 = pm.sample_ppc_w(traces2, models=models2)

I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/.pyenv/versions/3.6.4/envs/ganesha/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     51     try:
---> 52         return getattr(obj, method)(*args, **kwds)
     53 

AttributeError: 'list' object has no attribute 'repeat'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-197-2cc68ceb6ac2> in <module>()
      1 traces2 = [btrace_t2, btrace_l2]
      2 models2 = [model_t2, model_l2]
----> 3 ppc_w2 = pm.sample_ppc_w(traces2, models=models2)

~/.pyenv/versions/3.6.4/envs/ganesha/lib/python3.6/site-packages/pymc3/sampling.py in sample_ppc_w(traces, samples, models, weights, random_seed, progressbar)
   1160 
   1161     obs = [x for m in models for x in m.observed_RVs]
-> 1162     variables = np.repeat(obs, n)
   1163 
   1164     lengths = list(set([np.atleast_1d(observed).shape for observed in obs]))

~/.pyenv/versions/3.6.4/envs/ganesha/lib/python3.6/site-packages/numpy/core/fromnumeric.py in repeat(a, repeats, axis)
    421 
    422     """
--> 423     return _wrapfunc(a, 'repeat', repeats, axis=axis)
    424 
    425 

~/.pyenv/versions/3.6.4/envs/ganesha/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     60     # a downstream library like 'pandas'.
     61     except (AttributeError, TypeError):
---> 62         return _wrapit(obj, method, *args, **kwds)
     63 
     64 

~/.pyenv/versions/3.6.4/envs/ganesha/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapit(obj, method, *args, **kwds)
     40     except AttributeError:
     41         wrap = None
---> 42     result = getattr(asarray(obj), method)(*args, **kwds)
     43     if wrap:
     44         if not isinstance(result, mu.ndarray):

ValueError: operands could not be broadcast together with shape (6,) (2,)

It looks like this may be caused by a mismatch in the number of variables, but for the life of me I cannot figure out how to get around it.
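The shapes in the final message can be reproduced with NumPy alone. In the sketch below, plain strings stand in for the observed random variables, and n is assumed to hold one sample count per model, which the (2,) in the message suggests. Each of model_t2 and model_l2 declares three variables with observed=..., which would account for the (6,).

```python
import numpy as np

# Stand-ins for the observed RVs of model_t2 and model_l2: each model
# declares three variables with observed=..., giving six entries in total.
obs = ["arv_t", "brv_t", "y_t", "arv_l", "brv_l", "y_l"]

# Assumption: n holds one sample count per model (two models -> length 2).
n = np.array([5000, 5000])

try:
    np.repeat(obs, n)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# With one observed variable per model, as in model_t1 and model_l1,
# the two lengths agree and the call succeeds.
variables = np.repeat(["y_t", "y_l"], np.array([2, 1]))
```

If this reading is right, the mismatch comes from having several observed variables per model rather than from the Deterministic itself.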

Is it possible to use sample_ppc_w when averaging models that contain Deterministics?

Many thanks

0 answers:

No answers yet