Question

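I am trying to use numpy.random.multivariate_normal to generate multiple samples, where each sample is drawn from a multivariate normal distribution with a different mean and cov. For example, to draw 2 samples, I tried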

import numpy as np
from numpy import random as rand

means = np.array([[-1., 0.], [1., 0.]])
covs = np.array([np.identity(2) for k in range(2)])
rand.multivariate_normal(means, covs)

but this results in ValueError: mean must be 1 dimensional. Do I have to loop for this? I thought this kind of call was possible for functions like rand.binomial.

Answer 1

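As @hpaulj suggested, you can generate samples from the standard multivariate normal distribution and then transform them using einsum and/or broadcasting. The scaling is done by multiplying the standard sample points by the square root of the covariance matrix. The code below uses scipy.linalg.sqrtm to compute the matrix square roots and numpy.einsum for the matrix multiplication.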
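To see why scaling by the matrix square root should give the requested distribution, here is a short derivation. The symbols u, S, Σ and μ are notation introduced here for illustration only; u is treated as a row vector to match the array layout used in the code, and Σ is assumed to be symmetric positive semidefinite so that its principal square root S is symmetric.

```latex
\begin{aligned}
u &\sim \mathcal{N}(0, I), \qquad S = \Sigma^{1/2}, \quad S = S^{\mathsf T}, \quad S S = \Sigma, \\
x &= u S + \mu, \\
\mathbb{E}[x] &= \mathbb{E}[u]\, S + \mu = \mu, \\
\operatorname{Cov}(x) &= \mathbb{E}\big[(uS)^{\mathsf T}(uS)\big]
  = S^{\mathsf T}\, \mathbb{E}\big[u^{\mathsf T} u\big]\, S
  = S^{\mathsf T} S = \Sigma .
\end{aligned}
```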

import numpy as np
from scipy.linalg import sqrtm
import matplotlib.pyplot as plt


# Sequence of means
means = np.array([[-15., 0.], [15., 0.], [0., 0.]])
# Sequence of covariance matrices.  Must be the same length as means.
covs = np.array([[[ 3, -1],
                  [-1,  2]],
                 [[ 1,  2],
                  [ 2,  5]],
                 [[ 1,  0],
                  [ 0,  1]]])
# Number of samples to generate for each (mean, cov) pair.
nsamples = 4000

# Compute the matrix square root of each covariance matrix.
sqrtcovs = np.array([sqrtm(c) for c in covs])

# Generate samples from the standard multivariate normal distribution.
dim = len(means[0])
u = np.random.multivariate_normal(np.zeros(dim), np.eye(dim),
                                  size=(len(means), nsamples,))
# u has shape (len(means), nsamples, dim)

# Transform u.
v = np.einsum('ijk,ikl->ijl', u, sqrtcovs)
m = np.expand_dims(means, 1)
t = v + m

# t also has shape (len(means), nsamples, dim).
# t[i] holds the nsamples sampled from the distribution with mean means[i]
# and covariance covs[i].

plt.subplot(2, 1, 1)
plt.plot(t[...,0].ravel(), t[...,1].ravel(), '.', alpha=0.02)
plt.axis('equal')
plt.xlim(-25, 25)
plt.ylim(-8, 8)
plt.grid()

# Make another plot, where we generate the samples by passing the given
# means and covs to np.random.multivariate_normal.  This plot should look
# the same as the first plot.
plt.subplot(2, 1, 2)
p0 = np.random.multivariate_normal(means[0], covs[0], size=nsamples)
p1 = np.random.multivariate_normal(means[1], covs[1], size=nsamples)
p2 = np.random.multivariate_normal(means[2], covs[2], size=nsamples)

plt.plot(p0[:,0], p0[:,1], 'b.', alpha=0.02)
plt.plot(p1[:,0], p1[:,1], 'g.', alpha=0.02)
plt.plot(p2[:,0], p2[:,1], 'r.', alpha=0.02)
plt.axis('equal')
plt.xlim(-25, 25)
plt.ylim(-8, 8)
plt.grid()

plt.show()

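This method might not be any faster than looping over the means and covs arrays and calling multivariate_normal once for each (mean, cov) pair. It should help most when you have many different means and covariances and generate only a small number of samples per pair. Even then it might not be faster, because the script uses a Python loop over the covs array to call sqrtm for each covariance matrix. If performance is critical, test with your actual data.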
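One possible way to avoid the Python loop over sqrtm is to swap in a Cholesky factor for the symmetric square root. This is a different technique from the one in the answer above: it is a minimal sketch that assumes every covariance matrix is strictly positive definite, and it relies on np.linalg.cholesky accepting a stack of matrices. The names rng, chol, z and samples, and the seed value, are illustrative choices rather than part of the original answer.

```python
import numpy as np

# Seeded generator, used here only to make the sketch reproducible.
rng = np.random.default_rng(0)

means = np.array([[-15., 0.], [15., 0.], [0., 0.]])
covs = np.array([[[3., -1.], [-1., 2.]],
                 [[1., 2.], [2., 5.]],
                 [[1., 0.], [0., 1.]]])
nsamples = 4000

# cholesky works on the whole stack of shape (n, dim, dim) at once and
# returns lower-triangular factors L with L @ L.T equal to each cov.
# It raises LinAlgError if a matrix is not positive definite.
chol = np.linalg.cholesky(covs)

# Independent standard normal draws, shape (n, nsamples, dim).
z = rng.standard_normal((len(means), nsamples, means.shape[1]))

# Row-vector convention: x = z @ L.T + mean, so Cov(x) = L @ L.T = cov.
samples = np.einsum('ijk,ilk->ijl', z, chol) + means[:, None, :]

# samples[i] holds nsamples draws for (means[i], covs[i]).
print(samples.shape)
```

Whether this is actually faster than the loop still depends on the number of (mean, cov) pairs and the samples per pair, so it is worth timing against real data.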
