Question

给定一些参数Θ的后验p（Θ| D），可以define以下：

最高后部密度区域：

最高后部密度区域是Θ的最可能值的集合，其总共构成后部质量的100（1-α）％。

换句话说，对于给定的α，我们寻找满足以下条件的 p *：

enter image description here

然后获得最高后部密度区域作为集合：

enter image description here

中央可信区域：

使用与上述相同的表示法，可信区域（或间隔）定义为：

enter image description here

根据分布情况，可能会有许多此类间隔。 中心可信区间被定义为可信区间，（1-α）/ 2 质量每条尾。

计算：

对于一般分布，从分布中给出样本，是否有任何内置函数用于获取Python中的两个数量或 PyMC ？
对于常见的参数分布（例如Beta，Gaussian等），是否有任何内置函数或库可以使用SciPy或statsmodels进行计算？

Answer 1

从我的理解＆＃34;中心可信区域＆＃34;与置信区间的计算方式没有任何不同;您只需要cdf和alpha/2处的1-alpha/2函数的反函数;在scipy中，这称为ppf（百分点函数）;因此对于高斯后验分布：

>>> from scipy.stats import norm
>>> alpha = .05
>>> l, u = norm.ppf(alpha / 2), norm.ppf(1 - alpha / 2)

验证[l, u]涵盖后密度(1-alpha)：

>>> norm.cdf(u) - norm.cdf(l)
0.94999999999999996

类似于Beta后验，例如a=1和b=3：

>>> from scipy.stats import beta
>>> l, u = beta.ppf(alpha / 2, a=1, b=3), beta.ppf(1 - alpha / 2, a=1, b=3)

再次：

>>> beta.cdf(u, a=1, b=3) - beta.cdf(l, a=1, b=3)
0.94999999999999996

here你可以看到scipy中包含的参数分布;我想他们都有ppf函数;

至于最高后部密度区域，它更棘手，因为pdf函数不一定是可逆的;一般来说，这样的地区甚至可能没有连接;例如，对于带有a = b = .5的Beta（可以看作here）;

但是，在高斯分布的情况下，很容易看出＆＃34;最高后验密度区域＆＃34;恰逢中央可信区域＆＃34 ;;我认为所有对称单模态分布都是如此（即如果pdf函数在分布模式周围是对称的）

对于一般情况，可能的数值方法是使用p* pdf对p*的值进行二元搜索;利用积分是def mix_norm_pdf(x, loc, scale, weight): from scipy.stats import norm return np.dot(weight, norm.pdf(x, loc, scale));

的单调函数的事实

以下是混合高斯的例子：

[1] 您需要的第一件事是分析pdf函数;混合高斯很容易：

loc    = np.array([-1, 3])   # mean values
scale  = np.array([.5, .8])  # standard deviations
weight = np.array([.4, .6])  # mixture probabilities

所以例如位置，比例和重量值，如

p*

你会得到两个漂亮的高斯分布：

enter image description here

[2] 现在，你需要一个错误函数，它给出了p*的测试值，将pdf函数集成在1 - alpha之上，并从期望值{{返回平方误差1}}：

def errfn( p, alpha, *args):
    from scipy import integrate
    def fn( x ):
        pdf = mix_norm_pdf(x, *args)
        return pdf if pdf > p else 0

    # ideally integration limits should not
    # be hard coded but inferred
    lb, ub = -3, 6 
    prob = integrate.quad(fn, lb, ub)[0]
    return (prob + alpha - 1.0)**2

[3] 现在，对于给定的alpha值，我们可以最小化错误函数以获取p*：

alpha = .05

from scipy.optimize import fmin
p = fmin(errfn, x0=0, args=(alpha, loc, scale, weight))[0]

导致p* = 0.0450，HPD如下;红色区域代表分布的1 - alpha，水平虚线代表p*。

enter image description here

Answer 2

PyMC具有用于计算hpd的内置函数。在v2.3中它是在utils中。查看来源here。作为线性模型的一个例子，它是HPD

import pymc as pc  
import numpy as np
import matplotlib.pyplot as plt 
## data
np.random.seed(1)
x = np.array(range(0,50))
y = np.random.uniform(low=0.0, high=40.0, size=50)
y = 2*x+y
## plt.scatter(x,y)

## priors
emm = pc.Uniform('m', -100.0, 100.0, value=0)
cee = pc.Uniform('c', -100.0, 100.0, value=0) 

#linear-model
@pc.deterministic(plot=False)
def lin_mod(x=x, cee=cee, emm=emm):
    return emm*x + cee 

#likelihood
llhy = pc.Normal('y', mu=lin_mod, tau=1.0/(10.0**2), value=y, observed=True)

linearModel = pc.Model( [llhy, lin_mod, emm, cee] )
MCMClinear = pc.MCMC( linearModel)
MCMClinear.sample(10000,burn=5000,thin=5)
linear_output=MCMClinear.stats()

## pc.Matplot.plot(MCMClinear)
## print HPD using the trace of each parameter 
print(pc.utils.hpd(MCMClinear.trace('m')[:] , 1.- 0.95))
print(pc.utils.hpd(MCMClinear.trace('c')[:] , 1.- 0.95))

您也可以考虑计算分位数

print(linear_output['m']['quantiles'])
print(linear_output['c']['quantiles'])

我认为如果你只需要2.5％到97.5％的值就可以获得95％的中心可信区间。

Answer 3

另一个选项（改编自R到Python）和摘自John K. Kruschke的“做贝叶斯数据分析”一书如下：

from scipy.optimize import fmin
from scipy.stats import *

def HDIofICDF(dist_name, credMass=0.95, **args):
    # freeze distribution with given arguments
    distri = dist_name(**args)
    # initial guess for HDIlowTailPr
    incredMass =  1.0 - credMass

    def intervalWidth(lowTailPr):
        return distri.ppf(credMass + lowTailPr) - distri.ppf(lowTailPr)

    # find lowTailPr that minimizes intervalWidth
    HDIlowTailPr = fmin(intervalWidth, incredMass, ftol=1e-8, disp=False)[0]
    # return interval as array([low, high])
    return distri.ppf([HDIlowTailPr, credMass + HDIlowTailPr])

我们的想法是创建一个函数 intervalWidth ，它返回间隔的宽度从lowTailPr开始并具有 credMass 质量。 intervalWidth函数的最小值是使用scipy中的fmin最小化器建立的。

例如：

的结果

print HDIofICDF(norm, credMass=0.95, loc=0, scale=1)

是

    [-1.95996398  1.95996398]

传递给HDIofICDF的分发参数的名称必须与scipy中使用的完全相同。

Answer 4

要计算HPD，你可以利用pymc3，这是一个例子

import pymc3
from scipy.stats import norm
a = norm.rvs(size=10000)
pymc3.stats.hpd(a)

Answer 5

我偶然发现了这篇文章，试图找到一种从MCMC样本中估算HDI的方法，但没有一个答案对我有用。像aloctavodia一样，我在“做贝叶斯数据分析”一书中改编了一个R例子。我需要从MCMC样本计算95％的HDI。这是我的解决方案：

import numpy as np
def HDI_from_MCMC(posterior_samples, credible_mass):
    # Computes highest density interval from a sample of representative values,
    # estimated as the shortest credible interval
    # Takes Arguments posterior_samples (samples from posterior) and credible mass (normally .95)
    sorted_points = sorted(posterior_samples)
    ciIdxInc = np.ceil(credible_mass * len(sorted_points)).astype('int')
    nCIs = len(sorted_points) - ciIdxInc
    ciWidth = [0]*nCIs
    for i in range(0, nCIs):
    ciWidth[i] = sorted_points[i + ciIdxInc] - sorted_points[i]
    HDImin = sorted_points[ciWidth.index(min(ciWidth))]
    HDImax = sorted_points[ciWidth.index(min(ciWidth))+ciIdxInc]
    return(HDImin, HDImax)

上面的方法根据我的数据给出了逻辑答案！

Answer 6

您可以通过两种方式获取中央可信区间：图形上，当您在模型中的变量上调用summary_plot时，默认情况下会有一个bpd标记设置为True 。将其更改为False将绘制中心间隔。您可以获得它的第二个位置是在模型或节点上调用summary方法时;它将为您提供后验分位数，默认情况下外部分位数为95％（您可以使用alpha参数进行更改）。

Answer 7

在`R`中，您可以使用`stat.extend`软件包

如果您正在处理标准参数分布，并且不介意使用R，则可以使用stat.extend package中的HDR函数。该软件包对所有基本发行版和扩展软件包中的某些发行版均具有HDR功能。它使用分位数功能为分布计算HDR，并自动调整分布形状（例如，单峰，双峰等）。这是使用此软件包为标准参数分布计算的HDR的一些示例。

#Load library
library(stat.extend)

#---------------------------------------------------------------
#Compute HDR for gamma distribution
HDR.gamma(cover.prob = 0.9, shape = 3, scale = 4)

        Highest Density Region (HDR) 
 
90.00% HDR for gamma distribution with shape = 3 and scale = 4 
Computed using nlm optimisation with 6 iterations (code = 1) 

[1.76530758147504, 21.9166988492762]

#---------------------------------------------------------------
#Compute HDR for (unimodal) beta distribution
HDR.beta(cover.prob = 0.9, shape1 = 3.2, shape2 = 3.0)

        Highest Density Region (HDR) 
 
90.00% HDR for beta distribution with shape1 = 3.2 and shape2 = 3 
Computed using nlm optimisation with 4 iterations (code = 1) 

[0.211049233508331, 0.823554556452285]

#---------------------------------------------------------------
#Compute HDR for (bimodal) beta distribution
HDR.beta(cover.prob = 0.9, shape1 = 0.3, shape2 = 0.4)

        Highest Density Region (HDR) 
 
90.00% HDR for beta distribution with shape1 = 0.3 and shape2 = 0.4 
Computed using nlm optimisation with 6 iterations (code = 1) 

[0, 0.434124342324438] U [0.640580807770818, 1]

最高后密度区和中心可信区

最高后部密度区域：

中央可信区域：

计算：

7 个答案:

在`R`中，您可以使用`stat.extend`软件包

最高后密度区和中心可信区

最高后部密度区域：

中央可信区域：

计算：

7 个答案:

在R中，您可以使用stat.extend软件包

在`R`中，您可以使用`stat.extend`软件包