Question

How can I compute histogram-based probability density estimates of the marginal distributions p(x1) and p(x2) of this data set:

import numpy as np
import matplotlib.pyplot as plt
linalg = np.linalg

N = 100
mean = [1,1]
cov = [[0.3, 0.2],[0.2, 0.2]]
data = np.random.multivariate_normal(mean, cov, N)
L = linalg.cholesky(cov)
# print(L.shape)
# (2, 2)
uncorrelated = np.random.standard_normal((2,N))
data2 = np.dot(L,uncorrelated) + np.array(mean).reshape(2,1)
# print(data2.shape)
# (2, 100)
plt.scatter(data2[0,:], data2[1,:], c='green')    
plt.scatter(data[:,0], data[:,1], c='yellow')
plt.show()

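For this, you could use the hist function in Matlab or R. How does changing the bin width (or, equivalently, the number of bins) affect the plot and the estimates of p(x1) and p(x2)?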

I am using Python. Is there something similar to Matlab's hist function, and how do I use it?

Answer 1

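Matlab's hist function is available in matplotlib as (you guessed it) matplotlib.pyplot.hist. It plots the histogram and takes the number of bins as a parameter. To compute a histogram without plotting it, use NumPy's numpy.histogram function.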
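As a minimal sketch of the non-plotting route, the snippet below calls numpy.histogram on the first column of data generated the same way as in the question. The fixed seed and the choice of 20 bins are my own assumptions for illustration. Passing density=True rescales the counts so that the bar areas sum to 1, which is what turns the histogram into an estimate of p(x1).

```python
import numpy as np

# Fixed seed: an assumption, used only to make the sketch reproducible
rng = np.random.default_rng(0)
mean = [1, 1]
cov = [[0.3, 0.2], [0.2, 0.2]]
data = rng.multivariate_normal(mean, cov, 100)

# density=True normalizes the counts so the histogram integrates to 1
density, edges = np.histogram(data[:, 0], bins=20, density=True)
widths = np.diff(edges)             # width of each bin
centers = edges[:-1] + widths / 2   # midpoint of each bin

print(density.shape, edges.shape)   # one height per bin, one more edge than bins
print(np.sum(density * widths))     # total area under the histogram
```

The same call on data[:, 1] gives the estimate of p(x2). The value of the estimate at any x inside a bin is simply that bin's height.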

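To estimate a probability distribution, you can use the distributions in scipy.stats. You generated the data above from a normal distribution, so to fit a normal distribution to it you can use scipy.stats.norm.fit. Below is a code example that plots histograms of the data and fits a normal distribution to each marginal: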

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
linalg = np.linalg

N = 100
mean = [1,1]
cov = [[0.3, 0.2],[0.2, 0.2]]
data = np.random.multivariate_normal(mean, cov, N)
L = linalg.cholesky(cov)
# print(L.shape)
# (2, 2)
uncorrelated = np.random.standard_normal((2,N))
data2 = np.dot(L,uncorrelated) + np.array(mean).reshape(2,1)
# print(data2.shape)
# (2, 100)
plt.figure()
plt.scatter(data2[0,:], data2[1,:], c='green')    
plt.scatter(data[:,0], data[:,1], c='yellow')
plt.show()

# Plot normalized histograms and overlay the fitted normal densities.
# density=True replaces the deprecated normed=1 argument.
plt.figure()
plt.subplot(211)
plt.hist(data[:,0], bins=20, density=True, alpha=0.5, color='green')
plt.hist(data2[0,:], bins=20, density=True, alpha=0.5, color='yellow')
x = np.arange(-1, 3, 0.001)
# norm.fit returns the estimated (mean, standard deviation)
plt.plot(x, norm.pdf(x, *norm.fit(data[:,0])), color='green')
plt.plot(x, norm.pdf(x, *norm.fit(data2[0,:])), color='yellow')
plt.title('Var 1')

plt.subplot(212)
plt.hist(data[:,1], bins=20, density=True, alpha=0.5, color='green')
plt.hist(data2[1,:], bins=20, density=True, alpha=0.5, color='yellow')
x = np.arange(-1, 3, 0.001)
plt.plot(x, norm.pdf(x, *norm.fit(data[:,1])), color='green')
plt.plot(x, norm.pdf(x, *norm.fit(data2[1,:])), color='yellow')
plt.title('Var 2')

plt.tight_layout()
plt.show()
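On the bin-width part of the question, here is a rough sketch rather than a definitive analysis. It recomputes the histogram estimate of p(x1) for a few bin counts and compares each bin height with the true marginal density at the bin center. The true marginal is known here only because the data was generated from a normal distribution with mean 1 and variance cov[0][0]. The seed, the bin counts (3, 10, 50), and the mean absolute error at bin centers are all my own assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

# Fixed seed: an assumption, used only to make the sketch reproducible
rng = np.random.default_rng(0)
mean = [1, 1]
cov = [[0.3, 0.2], [0.2, 0.2]]
data = rng.multivariate_normal(mean, cov, 100)
x1 = data[:, 0]

# True marginal of x1: normal with mean mean[0] and variance cov[0][0]
true_pdf = norm(loc=mean[0], scale=np.sqrt(cov[0][0])).pdf

results = {}
for bins in (3, 10, 50):  # illustrative bin counts, chosen arbitrarily
    density, edges = np.histogram(x1, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    width = edges[1] - edges[0]  # bins are equally wide by default
    # Mean absolute gap between histogram height and true density
    err = np.mean(np.abs(density - true_pdf(centers)))
    results[bins] = (width, err)

for bins, (width, err) in results.items():
    print(f"bins={bins:3d}  width={width:.3f}  mean |error|={err:.3f}")
```

With only 100 samples, very few bins tend to smooth away the shape of the density, while very many bins leave only a handful of points per bin, so the heights become noisy. Passing the same bins value to plt.hist shows the effect visually.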
