Question

Suppose I have a set of random X, Y points:

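import numpy as np
import matplotlib.pyplot as plt

x = np.array(range(0,50))
y = np.random.uniform(low=0.0, high=40.0, size=50)  ## one noise value per x
y = x + y  ## element-wise sum; replaces the map/zip, which is lazy in Python 3
plt.scatter(x,y)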

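Suppose I use linear regression and model y as Gaussian for each value of x. How can I estimate the posterior predictive, i.e. p(y|x), for each (possible) value of x?

I would like to use pymc or scikit-learn.

Is there a direct way to do this?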

Answer 1

If I understand what you want, you can do this with the git version of PyMC (PyMC3) and its glm submodule. For example:

import numpy as np
import pymc as pm
import matplotlib.pyplot as plt 
from pymc import glm 

## Make some data
x = np.array(range(0,50))
y = np.random.uniform(low=0.0, high=40.0, size=50)
y = 2*x+y
## plt.scatter(x,y)

data = dict(x=x, y=y)
with pm.Model() as model:
    # specify glm and pass in data. The resulting linear model, its likelihood
    # and all its parameters are automatically added to our model.
    pm.glm.glm('y ~ x', data)
    step = pm.NUTS() # Instantiate MCMC sampling algorithm
    trace = pm.sample(2000, step)


##fig = pm.traceplot(trace, lines={'alpha': 1, 'beta': 2, 'sigma': .5});## traces
fig = plt.figure()
ax = fig.add_subplot(111)
plt.scatter(x, y, label='data')
glm.plot_posterior_predictive(trace, samples=50, eval=x,
                              label='posterior predictive regression lines')

This produces something like the following plot:

[Figure: posterior predictive]

You should find these blog posts interesting: 1 and 2, from which I took these ideas.

Edit: To get the y values for each x, try the following, which I got from digging into the glm source.

lm = lambda x, sample: sample['Intercept'] + sample['x'] * x ## linear model
samples=50 ## Choose to be the same as in plot call
trace_det = np.empty([samples, len(x)]) ## initialise
for i, rand_loc in enumerate(np.random.randint(0, len(trace), samples)):
    rand_sample = trace[rand_loc]
    trace_det[i] = lm(x, rand_sample)
y = trace_det.T
y[0]

Apologies if it's not the most elegant; hopefully you can follow the logic.
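Note that evaluating the linear model on posterior samples gives draws of the regression line (the mean of y at each x), not of y itself. To get draws from p(y|x) you would also add the Gaussian observation noise for each posterior sample. Below is a minimal NumPy sketch of that step, under some loud assumptions: the function name `posterior_predictive_draws` is hypothetical, and the arrays of posterior draws are synthetic stand-ins made up for illustration. With a real trace you would presumably take them from `trace['Intercept']`, `trace['x']` and the noise standard deviation, whose variable name depends on the PyMC version, so check your own trace.

```python
import numpy as np

def posterior_predictive_draws(x, intercepts, slopes, sigmas, rng):
    """Return an array of shape (n_draws, len(x)) of simulated y values.

    Each row uses one posterior draw (intercept, slope, sigma) and adds
    Gaussian observation noise, so the rows are draws from p(y | x).
    """
    x = np.asarray(x, dtype=float)
    # Regression-line value for every (draw, x) pair via broadcasting
    mu = intercepts[:, None] + slopes[:, None] * x[None, :]
    # Add observation noise using that draw's standard deviation
    return rng.normal(loc=mu, scale=sigmas[:, None])

rng = np.random.default_rng(0)
x = np.arange(50)

# ASSUMPTION: synthetic stand-ins for posterior draws, NOT real MCMC output.
# In practice these would come from the trace (variable names may differ).
n_draws = 1000
intercepts = rng.normal(20.0, 1.0, size=n_draws)
slopes = rng.normal(2.0, 0.05, size=n_draws)
sigmas = np.abs(rng.normal(11.5, 0.5, size=n_draws))

y_pred = posterior_predictive_draws(x, intercepts, slopes, sigmas, rng)

# Summarise p(y | x) at each x: mean and a central 95% interval
y_mean = y_pred.mean(axis=0)
y_lo, y_hi = np.percentile(y_pred, [2.5, 97.5], axis=0)
```

Each column of `y_pred` is then a set of samples from the predictive distribution at one x, so you can histogram a column or plot `y_lo` and `y_hi` against `x` as a band around the data.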

Estimating the posterior predictive in a regression
