
Question

Is there any Python package that can efficiently compute the multivariate normal PDF?

It doesn't seem to be included in NumPy/SciPy, and surprisingly a Google search didn't turn up anything useful.

Answer 1

The multivariate normal is now available in SciPy 0.14.0.dev-16fc0af:

from scipy.stats import multivariate_normal
var = multivariate_normal(mean=[0,0], cov=[[1,0],[0,1]])
var.pdf([1,0])
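The frozen distribution also accepts a batch of points (one point per row) and provides a logpdf method. A short sketch, assuming SciPy 0.14 or later, that checks the values against the closed form of the standard bivariate normal, exp(-|x|^2 / 2) / (2*pi):

```python
import numpy as np
from scipy.stats import multivariate_normal

var = multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]])

# Several points at once: one point per row
points = np.array([[1.0, 0.0],
                   [0.0, 0.0],
                   [1.0, 1.0]])
densities = var.pdf(points)         # shape (3,)
log_densities = var.logpdf(points)  # shape (3,)

# Closed form for the standard bivariate normal
expected = np.exp(-0.5 * np.sum(points**2, axis=1)) / (2 * np.pi)
print(np.allclose(densities, expected))              # True
print(np.allclose(log_densities, np.log(expected)))  # True
```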

Answer 2

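I just made one for my own purposes, so I thought I'd share it. It's built using "the powers" of numpy, follows the formula for the non-degenerate case from http://en.wikipedia.org/wiki/Multivariate_normal_distribution, and it also validates the input.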

Here is the code along with a sample run:

import math
import numpy as np

# covariance matrix
sigma = np.array([[2.3, 0, 0, 0],
                  [0, 1.5, 0, 0],
                  [0, 0, 1.7, 0],
                  [0, 0, 0, 2]])
# mean vector
mu = np.array([2, 3, 8, 10])

# input
x = np.array([2.1, 3.5, 8, 9.5])

def norm_pdf_multivariate(x, mu, sigma):
    size = len(x)
    if size == len(mu) and (size, size) == sigma.shape:
        det = np.linalg.det(sigma)
        if det == 0:
            raise ValueError("The covariance matrix can't be singular")

        # normalising constant 1 / ((2*pi)^(k/2) * |sigma|^(1/2))
        norm_const = 1.0 / (math.pow(2 * math.pi, size / 2) * math.sqrt(det))
        x_mu = x - mu
        inv = np.linalg.inv(sigma)
        # quadratic form (x - mu)^T sigma^-1 (x - mu) is a scalar for 1-D arrays
        result = math.exp(-0.5 * (x_mu @ inv @ x_mu))
        return norm_const * result
    else:
        raise ValueError("The dimensions of the input don't match")

print(norm_pdf_multivariate(x, mu, sigma))
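For reference, the non-degenerate density that norm_pdf_multivariate evaluates, with k = size, mu the mean vector and Sigma the covariance matrix, is the following; norm_const is the first factor and result is the exponential, and both are undefined when |Sigma| = 0, which is why the singular case is rejected:

```latex
f(\mathbf{x}) = \frac{1}{(2\pi)^{k/2}\,|\Sigma|^{1/2}}
  \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol\mu)^{\mathsf T}\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right)
```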

Answer 3

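In the common case of a diagonal covariance matrix, the multivariate PDF can be obtained by simply multiplying the univariate PDF values returned by scipy.stats.norm instances. If you need the general case, you will probably have to code it yourself (which shouldn't be hard).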
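A minimal sketch of the diagonal case, assuming the diagonal holds variances: scipy.stats.norm takes the standard deviation as scale, so the square root of each variance is passed. The example values are made up, and scipy.stats.multivariate_normal (SciPy 0.14 or later) is used only as a cross-check.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

mean = np.array([2.0, 3.0, 8.0])
variances = np.array([2.3, 1.5, 1.7])  # diagonal of the covariance matrix
x = np.array([2.1, 3.5, 8.0])

# With a diagonal covariance the components are independent, so the joint
# density is the product of the univariate densities.
# norm expects the standard deviation, hence sqrt(variance).
p_product = np.prod(norm.pdf(x, loc=mean, scale=np.sqrt(variances)))

# Cross-check against the general multivariate implementation
p_full = multivariate_normal(mean=mean, cov=np.diag(variances)).pdf(x)
print(np.isclose(p_product, p_full))  # True
```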

Answer 4

In case it's still needed, my implementation would be:

import numpy as np

def pdf_multivariate_gauss(x, mu, cov):
    '''
    Calculate the multivariate normal density (pdf)

    Keyword arguments:
        x = numpy array of a "d x 1" sample vector
        mu = numpy array of a "d x 1" mean vector
        cov = numpy array of a "d x d" covariance matrix
    '''
    assert mu.shape[1] == 1, 'mu must be a column vector'
    assert x.shape[1] == 1, 'x must be a column vector'
    assert cov.shape[0] == cov.shape[1], 'covariance matrix must be square'
    assert mu.shape[0] == cov.shape[0], 'cov and mu must have the same dimensions'
    assert mu.shape[0] == x.shape[0], 'mu and x must have the same dimensions'
    part1 = 1 / (((2 * np.pi) ** (len(mu) / 2)) * (np.linalg.det(cov) ** (1 / 2)))
    part2 = (-1 / 2) * ((x - mu).T.dot(np.linalg.inv(cov))).dot((x - mu))
    # part2 is a 1x1 array; .item() extracts the scalar
    return float(part1 * np.exp(part2).item())

def test_gauss_pdf():
    x = np.array([[0], [0]])
    mu = np.array([[0], [0]])
    cov = np.eye(2)

    print(pdf_multivariate_gauss(x, mu, cov))

    # prints 0.15915494309189535

if __name__ == '__main__':
    test_gauss_pdf()

The code is here on GitHub, in case I make changes in the future.

Answer 5

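I know of several Python packages that use it internally, with varying generality and for different purposes, but I don't know whether any of them are meant for end users.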

For example,

statsmodels has the following hidden functions and classes, but statsmodels itself doesn't use them:

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/miscmodels/try_mlecov.py#L36

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/mv_normal.py#L777

Basically, if you need fast evaluation, rewrite it for your use case.
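As one illustration of such a rewrite, here is a minimal sketch (not taken from statsmodels) that factorizes the covariance once with a Cholesky decomposition and then evaluates the log-density of many points using triangular solves instead of an explicit inverse. The class name MvNormalLogPdf is hypothetical, and the covariance is assumed to be symmetric positive definite.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular
from scipy.stats import multivariate_normal

class MvNormalLogPdf:
    """Hypothetical helper: factorize once, evaluate many times."""

    def __init__(self, mean, cov):
        self.mean = np.asarray(mean, dtype=float)
        cov = np.asarray(cov, dtype=float)
        # Lower-triangular L with cov = L @ L.T
        # (raises LinAlgError if cov is not positive definite)
        self.L = cholesky(cov, lower=True)
        d = self.mean.shape[0]
        # log|cov| = 2 * sum(log(diag(L)))
        log_det = 2.0 * np.sum(np.log(np.diag(self.L)))
        self.const = -0.5 * (d * np.log(2.0 * np.pi) + log_det)

    def __call__(self, x):
        # x: shape (n, d), one point per row (a single 1-D point also works)
        diff = np.atleast_2d(x) - self.mean
        # Solve L z = diff^T, so ||z||^2 = diff^T cov^-1 diff for each point
        z = solve_triangular(self.L, diff.T, lower=True)
        return self.const - 0.5 * np.sum(z**2, axis=0)

# Usage: random positive definite covariance, 1000 points in 4 dimensions
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
cov = A @ A.T + 4 * np.eye(4)
mean = rng.normal(size=4)
points = rng.normal(size=(1000, 4))

logpdf = MvNormalLogPdf(mean, cov)
print(np.allclose(logpdf(points), multivariate_normal(mean, cov).logpdf(points)))  # True
```

The factorization cost is paid once in the constructor, so repeated calls only do triangular solves; take np.exp of the result if plain density values are needed.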

Answer 6

I use the following code to calculate the logpdf value, which is preferable for larger dimensions. It also works with scipy.sparse matrices.

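import math
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spln

def lognormpdf(x, mu, S):
    """ Calculate the log of the gaussian probability density of x, when x ~ N(mu, S) """
    nx = S.shape[0]  # len() is not defined for scipy.sparse matrices
    err = np.asarray(x - mu, dtype=float)
    if sp.issparse(S):
        # np.linalg.slogdet does not accept sparse input; for a positive definite S,
        # log|S| is the sum of log|U_ii| from a sparse LU factorization (L has unit diagonal)
        lu = spln.splu(sp.csc_matrix(S))
        logdet = np.sum(np.log(np.abs(lu.U.diagonal())))
        numerator = lu.solve(err).T.dot(err)
    else:
        logdet = np.linalg.slogdet(S)[1]
        numerator = np.linalg.solve(S, err).T.dot(err)

    norm_coeff = nx * math.log(2 * math.pi) + logdet
    return -0.5 * (norm_coeff + numerator)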

The code comes from pyParticleEst. If you want the pdf value instead of the logpdf, just take math.exp() of the returned value.
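Why the log form matters in larger dimensions: the density of a d-dimensional standard normal at its mean is (2*pi)^(-d/2), which is already below the smallest representable double for d = 1000, so the pdf underflows to 0.0 while the logpdf stays finite. A small sketch using scipy.stats.multivariate_normal (SciPy 0.14 or later) rather than the function above:

```python
import numpy as np
from scipy.stats import multivariate_normal

d = 1000
dist = multivariate_normal(mean=np.zeros(d), cov=np.eye(d))
x = np.zeros(d)  # evaluate at the mean

print(dist.pdf(x))     # 0.0: (2*pi)**(-d/2) underflows in double precision
print(dist.logpdf(x))  # finite, equal to -d/2 * log(2*pi)
```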

Answer 7

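The density can be computed quite simply using numpy functions and the formula on this page: http://en.wikipedia.org/wiki/Multivariate_normal_distribution. You may also want to use the log-likelihood (log probability), which is less likely to underflow in high dimensions and is a bit simpler to compute. Both only require computing the determinant and the inverse of a matrix.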
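Taking the log of the density gives the log-likelihood below, with k the dimension; the normalising constant becomes an additive term, so no tiny factors are multiplied together. This is the quantity that lognormpdf in the previous answer returns, with norm_coeff as the first two terms and numerator as the quadratic form:

```latex
\ln f(\mathbf{x}) = -\tfrac{1}{2}\left[k\ln(2\pi) + \ln|\Sigma|
  + (\mathbf{x}-\boldsymbol\mu)^{\mathsf T}\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right]
```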

The CDF, on the other hand, is a completely different animal...

Answer 8

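You can easily do the computation with numpy. I implemented the following for a machine learning course and would like to share it; I hope it helps. Note that it assumes a diagonal covariance (independent features) built from the per-feature standard deviations.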

import numpy as np
X = np.array([[13.04681517, 14.74115241],[13.40852019, 13.7632696 ],[14.19591481, 15.85318113],[14.91470077, 16.17425987]])

def est_gaus_par(X):
    mu = np.mean(X,axis=0)
    sig = np.std(X,axis=0)
    return mu,sig

mu,sigma = est_gaus_par(X)

def est_mult_gaus(X,mu,sigma):
    m = len(mu)
    # diagonal covariance: the variances are the squared standard deviations
    sigma2 = np.diag(sigma**2)
    X = X-mu.T
    p = 1/((2*np.pi)**(m/2)*np.linalg.det(sigma2)**(0.5))*np.exp(-0.5*np.sum(X.dot(np.linalg.pinv(sigma2))*X,axis=1))

    return p

p = est_mult_gaus(X, mu, sigma)
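Since np.std returns standard deviations, the diagonal covariance has to hold their squares. A quick cross-check on the same data, assuming SciPy 0.14 or later, that the diagonal-covariance density written out by hand matches scipy.stats.multivariate_normal with cov=np.diag(sigma**2):

```python
import numpy as np
from scipy.stats import multivariate_normal

X = np.array([[13.04681517, 14.74115241], [13.40852019, 13.7632696],
              [14.19591481, 15.85318113], [14.91470077, 16.17425987]])
mu = np.mean(X, axis=0)
sigma = np.std(X, axis=0)  # standard deviations, not variances

# Diagonal-covariance density written out directly (variance = sigma**2):
# det(diag(sigma**2)) = prod(sigma**2), quadratic form = sum(diff**2 / sigma**2)
m = len(mu)
diff = X - mu
p_manual = np.exp(-0.5 * np.sum(diff**2 / sigma**2, axis=1)) / (
    (2 * np.pi) ** (m / 2) * np.sqrt(np.prod(sigma**2)))

p_scipy = multivariate_normal(mean=mu, cov=np.diag(sigma**2)).pdf(X)
print(np.allclose(p_manual, p_scipy))  # True
```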

Answer 9

The code below helped me answer the question: given a vector, how likely is it under a multivariate normal distribution?

import numpy as np
from scipy.stats import multivariate_normal

Data containing all the vectors (one per row)

d= np.array([[1,2,1],[2,1,3],[4,5,4],[2,2,1]])

The mean of the data as a vector, with the same length as the input vectors (3 here)

mean = np.sum(d, axis=0) / len(d)

or, equivalently:
mean = np.average(d, axis=0)
mean.shape

Find the covariance of the vectors, which has shape [input vector length x input vector length], here 3x3

cov = 0
for e in d:
  cov += np.dot((e-mean).reshape(len(e), 1), (e-mean).reshape(1, len(e)))
cov /= len(d)
cov.shape
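The loop divides by len(d), i.e. it computes the biased (maximum-likelihood) covariance. NumPy can produce the same matrix in one call with np.cov(..., rowvar=False, bias=True); with the default bias=False it would divide by n - 1 instead. A short check on the same data:

```python
import numpy as np

d = np.array([[1, 2, 1], [2, 1, 3], [4, 5, 4], [2, 2, 1]])
mean = np.mean(d, axis=0)

# Loop version: accumulate outer products of the centred rows
cov_loop = np.zeros((d.shape[1], d.shape[1]))
for e in d:
    cov_loop += np.outer(e - mean, e - mean)
cov_loop /= len(d)

# Rows are observations, so rowvar=False; bias=True divides by n
cov_np = np.cov(d, rowvar=False, bias=True)
print(np.allclose(cov_loop, cov_np))  # True
```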

Build the multivariate Gaussian distribution from the mean and covariance

dist = multivariate_normal(mean,cov)

Evaluate the probability density function:

print(dist.pdf([1,2,3]))

3.050863384798471e-05

The value above is the likelihood of [1, 2, 3], i.e. the density at that point; it is not a probability.

Answer 10

Here I explain in detail how to use multivariate_normal() from the scipy package:

# Import packages
import numpy as np
from scipy.stats import multivariate_normal

# Prepare your data
x = np.linspace(-10, 10, 500)
y = np.linspace(-10, 10, 500)
X, Y = np.meshgrid(x,y)

# Get the multivariate normal distribution
mu_x = np.mean(x)
sigma_x = np.std(x)
mu_y = np.mean(y)
sigma_y = np.std(y)
# the covariance matrix holds variances, i.e. squared standard deviations
rv = multivariate_normal([mu_x, mu_y], [[sigma_x**2, 0], [0, sigma_y**2]])

# Get the probability density
pos = np.empty(X.shape + (2,))
pos[:, :, 0] = X
pos[:, :, 1] = Y
pd = rv.pdf(pos)
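A rough sanity check for a grid like this, with an illustrative mean and a correlated covariance chosen so that the grid covers essentially all of the probability mass: np.dstack builds the same (..., 2) array of positions, and a Riemann sum of the density over the grid should come out very close to 1.

```python
import numpy as np
from scipy.stats import multivariate_normal

x = np.linspace(-10, 10, 500)
y = np.linspace(-10, 10, 500)
X, Y = np.meshgrid(x, y)
pos = np.dstack((X, Y))  # shape (500, 500, 2), same as filling pos[:, :, 0] and pos[:, :, 1]

# Illustrative parameters; cov entries are variances and covariances
rv = multivariate_normal(mean=[1.0, -0.5], cov=[[2.0, 0.8], [0.8, 1.5]])
density = rv.pdf(pos)    # shape (500, 500)

# Riemann sum over the grid cells; should be close to 1
dx = x[1] - x[0]
dy = y[1] - y[0]
total = density.sum() * dx * dy
print(total)
```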

Multivariate normal density in Python?
