Question

The SciPy documentation contains an example of creating correlated random samples. The complete code is at the end of the question.

The covariance matrix:

# The desired covariance matrix.
r = np.array([
        [  3.40, -2.75, -2.00],
        [ -2.75,  5.50,  1.50],
        [ -2.00,  1.50,  1.25]
    ])

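My question is: how does each value in the covariance matrix affect the output? That is, if I want to build a sample dataset with only 2 variables, or with more than 3 variables, how do I determine which values I can use in the covariance matrix?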

"""Example of generating correlated normally distributed random samples."""

import numpy as np
from scipy.linalg import eigh, cholesky
from scipy.stats import norm

from pylab import plot, show, axis, subplot, xlabel, ylabel, grid


# Choice of cholesky or eigenvector method.
method = 'cholesky'
#method = 'eigenvectors'

num_samples = 400

# The desired covariance matrix.
r = np.array([
        [  3.40, -2.75, -2.00],
        [ -2.75,  5.50,  1.50],
        [ -2.00,  1.50,  1.25]
    ])

# Generate samples from three independent normally distributed random
# variables (with mean 0 and std. dev. 1).
x = norm.rvs(size=(3, num_samples))

# We need a matrix `c` for which `c*c^T = r`.  We can use, for example,
# the Cholesky decomposition, or we can construct `c` from the
# eigenvectors and eigenvalues.

if method == 'cholesky':
    # Compute the Cholesky decomposition.
    c = cholesky(r, lower=True)
else:
    # Compute the eigenvalues and eigenvectors.
    evals, evecs = eigh(r)
    # Construct c, so c*c^T = r.
    c = np.dot(evecs, np.diag(np.sqrt(evals)))

# Convert the data to correlated random variables. 
y = np.dot(c, x)

#
# Plot various projections of the samples.
#
subplot(2,2,1)
plot(y[0], y[1], 'b.')
ylabel('y[1]')
axis('equal')
grid(True)

subplot(2,2,3)
plot(y[0], y[2], 'b.')
xlabel('y[0]')
ylabel('y[2]')
axis('equal')
grid(True)

subplot(2,2,4)
plot(y[1], y[2], 'b.')
xlabel('y[1]')
axis('equal')
grid(True)

show()

Answer 1

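How do I determine which values I can use in the covariance matrix?

You don't "determine" the values; they are your choice, as long as the result is a valid covariance matrix. If you want 2 variables, the covariance matrix has shape (2, 2); for n variables it has shape (n, n). The diagonal entries are the variances of the individual variables, and each off-diagonal entry is the covariance between a pair of variables. If you want the first variable to be positively correlated with the second, put a positive value in the [1,2] entry and the same value in the [2,1] entry, because the matrix must be symmetric. It must also be positive semi-definite (positive definite if you use the Cholesky method), which limits how large the off-diagonal entries can be relative to the diagonal.

I think you may need to read up on covariance matrices in general to understand how the values affect the output distribution. In essence, this isn't a tricky problem: you are in charge of the values in the covariance matrix, and they depend on how strongly you want the random variables to be correlated.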
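
One way to pick the values, sketched below, is to start from the standard deviation you want for each variable and the correlation you want between each pair, then check that the resulting matrix is usable. The helper names `covariance_from_corr` and `is_valid_covariance`, the tolerance, and the numbers (standard deviations 2.0 and 0.5, correlation 0.8) are assumptions chosen for illustration; they do not come from the SciPy example.

```python
import numpy as np


def covariance_from_corr(std_devs, corr):
    """Build a covariance matrix from standard deviations and a correlation matrix.

    Hypothetical helper: cov[i, j] = corr[i, j] * std_devs[i] * std_devs[j].
    """
    std_devs = np.asarray(std_devs, dtype=float)
    corr = np.asarray(corr, dtype=float)
    return corr * np.outer(std_devs, std_devs)


def is_valid_covariance(cov, tol=1e-10):
    """Return True if `cov` is symmetric with strictly positive eigenvalues.

    Hypothetical helper: this is the condition the Cholesky method needs.
    """
    cov = np.asarray(cov, dtype=float)
    if not np.allclose(cov, cov.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(cov) > tol))


# Two variables with illustrative standard deviations and correlation.
cov2 = covariance_from_corr([2.0, 0.5], [[1.0, 0.8], [0.8, 1.0]])
# cov2 is [[4.0, 0.8], [0.8, 0.25]]: variances on the diagonal,
# covariance 0.8 * 2.0 * 0.5 = 0.8 off the diagonal.

# Same recipe as the example in the question: independent standard normal
# samples, transformed by a matrix c with c @ c.T == cov2.
rng = np.random.default_rng(0)
x = rng.standard_normal(size=(2, 100_000))
c = np.linalg.cholesky(cov2)  # lower-triangular factor
y = c @ x

# The sample covariance of y should be close to cov2.
sample_cov = np.cov(y)
print(sample_cov)
```

The same two helpers work unchanged for more than 3 variables: pass n standard deviations and an n-by-n correlation matrix. A matrix such as `[[1.0, 2.0], [2.0, 1.0]]` fails the check, because a covariance of 2.0 is larger than two variances of 1.0 allow.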
