Saving MCMC results for later use in PyMC3

Time: 2017-10-09 21:54:24

Tags: pickle pymc3

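This question is about the wonderful pymc3 module. I want to save MCMC results to disk so that later, when new data comes in, I can use sample_ppc without having to train again. Here is some code borrowed from PyMC3's documentation on posterior checks: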

from theano import shared
import numpy as np
import pymc3 as pm

def learn():
    def invlogit(x):
        return np.exp(x) / (1 + np.exp(x))

    coeff = 1
    # size must be an integer; a float such as 1e6 raises a TypeError
    predictors = np.random.normal(size=int(1e6))

    predictors_shared = shared(predictors)
    outcomes = np.random.binomial(1, invlogit(coeff * predictors))

    def tinvlogit(x):
        import theano.tensor as t
        return t.exp(x) / (1 + t.exp(x))

    with pm.Model() as model:
        coeff = pm.Normal('coeff', mu=0, sd=1)
        p = tinvlogit(coeff * predictors_shared)
        o = pm.Bernoulli('o', p, observed=outcomes)
        trace = pm.sample(5000, n_init=5000)

    # reduce the shared variable memory requirement
    predictors_shared.set_value(np.zeros(1))

    return {'trace': trace, 'model': model, 'predictors_shared': predictors_shared}

def predict(trace, model, predictors_shared):
    predictors_oos = np.random.normal(size=50)
    predictors_shared.set_value(predictors_oos)
    return pm.sample_ppc(trace, model=model, samples=500)

First we learn:

import pickle
learned_result = learn()
with open('some/file.pkl', 'wb') as f:
    pickle.dump(learned_result, f)

Then we unpickle and make predictions on new data:

with open('some/file.pkl', 'rb') as f:
    learned_result = pickle.load(f)  # pickle.load takes only the file object
ppc = predict(**learned_result)

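This approach works well except for storage: the pickled learned_result is enormous. The culprit is model. Judging by the relative sizes, I think model stores the entire training dataset internally. Is there a way to strip the internally stored data from the model object? If I do, will sample_ppc still work? Is there a theoretical reason why model must hold on to the entire training dataset in order to do posterior predictive checks? Thanks in advance for your help.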

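1 Answer:

Answer 0: (score: 0)

Save the trace rather than the model. You can then build the same model again and use the results without rerunning the sampler.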
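On the theoretical part of the question: posterior predictive draws for this logistic model depend only on the posterior samples of coeff and on the new predictors, not on the training data. The following is a minimal sketch of that idea and not PyMC3's own implementation. It uses only the standard library so it runs without PyMC3, and the helper names (save_draws, load_draws, posterior_predictive), the JSON file and the stand-in draws are all hypothetical. In practice the draws would be the values stored in the trace, e.g. trace['coeff'].

```python
import json
import math
import os
import random
import tempfile


def invlogit(x):
    """Logistic function, equivalent to invlogit in the question."""
    return 1.0 / (1.0 + math.exp(-x))


def save_draws(draws, path):
    """Write posterior draws of the coefficient to a small JSON file."""
    with open(path, "w") as f:
        json.dump(list(draws), f)


def load_draws(path):
    """Read the posterior draws back as a list of floats."""
    with open(path) as f:
        return json.load(f)


def posterior_predictive(draws, predictors, samples, rng):
    """Simulate `samples` outcome vectors, each from one randomly chosen draw."""
    simulated = []
    for _ in range(samples):
        coeff = rng.choice(draws)
        simulated.append(
            [int(rng.random() < invlogit(coeff * x)) for x in predictors]
        )
    return simulated


rng = random.Random(0)

# Stand-in for trace['coeff']: made-up draws, NOT real sampler output.
stand_in_draws = [rng.gauss(1.0, 0.05) for _ in range(5000)]

path = os.path.join(tempfile.mkdtemp(), "coeff_draws.json")
save_draws(stand_in_draws, path)

# Later, possibly in another process: reload the draws and score new data.
draws = load_draws(path)
predictors_oos = [rng.gauss(0.0, 1.0) for _ in range(50)]
ppc = posterior_predictive(draws, predictors_oos, samples=500, rng=rng)
print(len(ppc), len(ppc[0]))
```

The saved file holds 5000 floats regardless of how many training rows were used, which is why keeping only the draws avoids the large pickle. The trade-off is that the likelihood is rewritten by hand here; rebuilding the model, as the answer suggests, lets sample_ppc do that step.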