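I'm trying to use numpy.random.multivariate_normal to generate multiple samples, where each sample is drawn from a multivariate normal distribution with a different mean and cov. For example, to draw 2 samples, I tried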
import numpy as np
from numpy import random as rand
means = np.array([[-1., 0.], [1., 0.]])
covs = np.array([np.identity(2) for k in range(2)])
rand.multivariate_normal(means, covs)
but this results in ValueError: mean must be 1 dimensional. Do I have to loop for this? I thought this kind of thing was possible for functions like rand.binomial.
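For reference, the broadcasting behaviour referred to here looks roughly like this. The n and p values are arbitrary illustrations, not taken from the question:

```python
import numpy as np

# n and p are broadcast against each other, so a single call draws one
# value from Binomial(10, 0.5) and one from Binomial(100, 0.1).
draws = np.random.binomial(n=[10, 100], p=[0.5, 0.1])
print(draws.shape)  # (2,)
```

multivariate_normal does not broadcast over its mean and cov arguments in this way, which is why the call above fails.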
Answer 0 (score: 5)
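As @hpaulj suggested, you can generate samples from the standard multivariate normal distribution and then transform them using einsum and/or broadcasting. The scaling is done by multiplying the standard sample points by the square root of the covariance matrix, and the shift by adding the mean. In the following, I use scipy.linalg.sqrtm to compute the matrix square roots and numpy.einsum to do the matrix multiplication.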
import numpy as np
from scipy.linalg import sqrtm
import matplotlib.pyplot as plt
# Sequence of means
means = np.array([[-15., 0.], [15., 0.], [0., 0.]])
# Sequence of covariance matrices. Must be the same length as means.
covs = np.array([[[ 3, -1],
                  [-1,  2]],
                 [[ 1,  2],
                  [ 2,  5]],
                 [[ 1,  0],
                  [ 0,  1]]])
# Number of samples to generate for each (mean, cov) pair.
nsamples = 4000
# Compute the matrix square root of each covariance matrix.
sqrtcovs = np.array([sqrtm(c) for c in covs])
# Generate samples from the standard multivariate normal distribution.
dim = len(means[0])
u = np.random.multivariate_normal(np.zeros(dim), np.eye(dim),
                                  size=(len(means), nsamples))
# u has shape (len(means), nsamples, dim)
# Transform u.
v = np.einsum('ijk,ikl->ijl', u, sqrtcovs)
m = np.expand_dims(means, 1)
t = v + m
# t also has shape (len(means), nsamples, dim).
# t[i] holds the nsamples samples drawn from the distribution with mean
# means[i] and covariance covs[i].
plt.subplot(2, 1, 1)
plt.plot(t[...,0].ravel(), t[...,1].ravel(), '.', alpha=0.02)
plt.axis('equal')
plt.xlim(-25, 25)
plt.ylim(-8, 8)
plt.grid()
# Make another plot, where we generate the samples by passing the given
# means and covs to np.random.multivariate_normal. This plot should look
# the same as the first plot.
plt.subplot(2, 1, 2)
p0 = np.random.multivariate_normal(means[0], covs[0], size=nsamples)
p1 = np.random.multivariate_normal(means[1], covs[1], size=nsamples)
p2 = np.random.multivariate_normal(means[2], covs[2], size=nsamples)
plt.plot(p0[:,0], p0[:,1], 'b.', alpha=0.02)
plt.plot(p1[:,0], p1[:,1], 'g.', alpha=0.02)
plt.plot(p2[:,0], p2[:,1], 'r.', alpha=0.02)
plt.axis('equal')
plt.xlim(-25, 25)
plt.ylim(-8, 8)
plt.grid()
plt.show()
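As a sketch of why the transform in the script yields the requested covariance, here is the calculation in the row-vector convention that the einsum call uses. The symbols are notation introduced here, not from the script: $u$ is one row of u, $S_i$ is sqrtcovs[i], $\mu_i$ is means[i] and $\Sigma_i$ is covs[i]. It assumes each $\Sigma_i$ is symmetric positive definite, so that its principal square root $S_i$ is symmetric.

```latex
\begin{aligned}
u &\sim \mathcal{N}(0, I), \qquad S_i S_i = \Sigma_i, \qquad S_i^{\top} = S_i, \\
t &= u S_i + \mu_i, \\
\mathbb{E}[t] &= \mathbb{E}[u]\, S_i + \mu_i = \mu_i, \\
\operatorname{Cov}(t) &= \mathbb{E}\!\left[(u S_i)^{\top} (u S_i)\right]
  = S_i^{\top}\, \mathbb{E}\!\left[u^{\top} u\right] S_i
  = S_i^{\top} S_i = \Sigma_i .
\end{aligned}
```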
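This method might not be any faster than looping over the means and covs arrays and calling multivariate_normal once for each (mean, cov) pair. The case where it would give the most benefit is when you have many different means and covariances and generate only a small number of samples per pair. Even then, it might not be faster, because the script uses a Python loop over the covs array to call sqrtm for each covariance matrix. If performance is critical, test with your actual data.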
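One possible way to avoid the Python loop over the covariance matrices is to swap sqrtm for a Cholesky factor, since np.linalg.cholesky accepts a whole stack of matrices in one call. This is a different factorization from the matrix square root used above, but any factor L with L @ L.T equal to the covariance works for sampling. The following is a minimal sketch under that assumption. sample_mvn_batch is a hypothetical helper name, not a NumPy function, and it assumes every covariance matrix is positive definite:

```python
import numpy as np

def sample_mvn_batch(means, covs, nsamples, rng=None):
    """Draw nsamples points from N(means[i], covs[i]) for every i.

    Hypothetical helper (not part of NumPy). Returns an array of shape
    (len(means), nsamples, dim).
    """
    rng = np.random.default_rng() if rng is None else rng
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    npairs, dim = means.shape
    # cholesky works on the whole stack at once: chol[i] @ chol[i].T == covs[i].
    # It raises LinAlgError if a matrix is not positive definite.
    chol = np.linalg.cholesky(covs)                  # (npairs, dim, dim)
    # Standard normal draws, one block of nsamples rows per (mean, cov) pair.
    u = rng.standard_normal((npairs, nsamples, dim))
    # Row-vector convention: v = u @ chol[i].T has covariance covs[i].
    v = np.einsum('ijk,ilk->ijl', u, chol)
    return v + means[:, np.newaxis, :]

# Usage with the means and covs from the script above.
means = np.array([[-15., 0.], [15., 0.], [0., 0.]])
covs = np.array([[[3., -1.], [-1., 2.]],
                 [[1., 2.], [2., 5.]],
                 [[1., 0.], [0., 1.]]])
t = sample_mvn_batch(means, covs, nsamples=5)
print(t.shape)  # (3, 5, 2)
```

Whether this is actually faster than the plain loop depends on the number of pairs and samples, so it is still worth timing both with your own data.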