Empirical probability distribution does not match the true distribution

Time: 2019-07-24 07:12:34

Tags: python matplotlib

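I am trying to simulate the sampling distribution of the sample mean for a normal population with mean 170 and standard deviation 2. According to the math, the mean of a sample of size 20 is normally distributed with mean 170 and standard deviation 2/(20^0.5).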
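As a quick sanity check on that claim, here is a minimal sketch that compares the standard deviation of simulated sample means with 2/sqrt(20). The fixed seed, the reduced experiment count of 5000, and the constant names are assumptions made for this illustration and are not part of the original code.

```python
import math
import random
import statistics

random.seed(0)  # assumption: fixed seed so the sketch is repeatable

MU, SIGMA = 170, 2   # population mean and standard deviation from the question
N = 20               # sample size from the question
N_EXP = 5000         # assumption: fewer experiments than the question's 50000

# One sample mean per experiment
sample_means = [
    statistics.mean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(N_EXP)
]

theoretical_sd = SIGMA / math.sqrt(N)          # sigma / sqrt(n)
empirical_sd = statistics.stdev(sample_means)  # spread of the simulated means
empirical_mean = statistics.mean(sample_means)

print("theoretical sd:", theoretical_sd)
print("empirical sd:  ", empirical_sd)
print("empirical mean:", empirical_mean)
```

The two standard deviations should agree to within sampling noise, which suggests the simulation itself is fine and any mismatch in the plot comes from how the histogram is scaled.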

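I am plotting the empirical sampling distribution of the mean for n = 20 over 50000 experiments. I call a.hist(sample_means1, bins = 100), divide the bar heights by 50000, and then plot them with ax.plot to get the empirical sampling distribution of the mean. But the result does not match the theoretical curve. Here is the code: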

import math
import statistics
import random

import matplotlib.pyplot as plt


def normal_pdf(m, s, x):
  coeff = 1/(s*math.sqrt(2*math.pi))
  expn = math.exp( -0.5*((x - m)/s)**2 );
  return coeff*expn

n_exp = 50000

fig, ax = plt.subplots()

sample_means1 = []
for i in range(n_exp):
  sample = [random.gauss(170, 2) for i in range(20)] 
  smean = statistics.mean(sample)
  sample_means1.append(smean)

f,a = plt.subplots()
h = a.hist(sample_means1, bins = 100)
probs = [i/n_exp for i in h[0]]
xl = min(h[1])
xr = max(h[1])
x = [xl + (xr-xl)*i/1000 for i in range(1001)]

ax.plot(h[1][0: 100], probs, '-', color = "black")
ax.plot(x, [normal_pdf(170, 2/math.sqrt(20), i) for i in x], '-', color = "blue")
fig.savefig("tes.png")

Resulting plot:

(image not preserved)

1 answer:

Answer 0: (score: 2)

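To get a probability density, divide the bar heights by the number of experiments multiplied by the bin width. Dividing by the number of experiments alone gives the probability of landing in each bin, not the density that normal_pdf returns. It also helps to plot against the bin midpoints rather than the left edges, i.e.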

widths = (h[1][1:]-h[1][:-1])
probs = h[0]/(widths*n_exp)
mid_points = (h[1][1:]+h[1][:-1])/2

ax.plot(mid_points, probs, '-', color = "black")
ax.plot(x, [normal_pdf(170, 2/math.sqrt(20), i) for i in x], '-', color = "blue")

fig.savefig("tes.png")

[Figure: comparison of theoretical and empirical distribution]
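To see the difference between the two scalings without any plotting, here is a minimal standard-library sketch. It bins simulated sample means by hand, then compares "count / n_exp" (what the question plotted) with "count / (n_exp * bin width)" (the density from the answer). The seed, the 20000 experiments, the 50 bins, and the hand-rolled binning loop are assumptions for this illustration; the original code gets its counts and edges from a.hist instead.

```python
import math
import random

random.seed(1)  # assumption: fixed seed so the sketch is repeatable

MU, SIGMA, N = 170, 2, 20
N_EXP = 20000   # assumption: fewer experiments than the question's 50000
N_BINS = 50     # assumption: fewer bins than the question's 100


def normal_pdf(m, s, x):
    """Density of a normal distribution with mean m and standard deviation s."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))


# Simulate the sample means
means = [sum(random.gauss(MU, SIGMA) for _ in range(N)) / N for _ in range(N_EXP)]

# Equal-width bins spanning the observed range
lo, hi = min(means), max(means)
width = (hi - lo) / N_BINS
counts = [0] * N_BINS
for m in means:
    idx = min(int((m - lo) / width), N_BINS - 1)  # the maximum goes in the last bin
    counts[idx] += 1

fractions = [c / N_EXP for c in counts]          # per-bin probability (sums to 1)
density = [c / (N_EXP * width) for c in counts]  # probability density (integrates to 1)

area = sum(d * width for d in density)
peak_pdf = normal_pdf(MU, SIGMA / math.sqrt(N), MU)

print("sum of fractions:     ", sum(fractions))
print("area under density:   ", area)
print("max fraction:         ", max(fractions))
print("max density:          ", max(density))
print("theoretical peak pdf: ", peak_pdf)
```

The per-bin fractions sum to 1 but sit far below the theoretical curve, while the density values integrate to 1 and peak near the theoretical maximum. As far as I know, passing density=True to matplotlib's hist applies the same normalization, which would avoid the manual division.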