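I am trying to simulate the "sampling distribution of a sample proportion" using Python. I tried it with a Bernoulli variable, as in the example here. The setup is that, in a large jar of gumballs, the true proportion of yellow balls is 0.6. If we take a sample (of some size, e.g. 10), compute its mean, repeat this many times and plot the means, we should get an approximately normal distribution.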
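A minimal, self-contained sketch of the experiment described above, using only the standard library. It assumes the population is a list of N balls coded 1 (yellow) and 0 (other); the helper name `simulate_sample_proportions` is hypothetical and is not part of the SDSP module used in the code.

```python
import random
import statistics

def simulate_sample_proportions(N=10_000, p=0.6, n_pickups=10, n_experiments=2000, seed=0):
    """Hypothetical helper: return one sample proportion per experiment."""
    rng = random.Random(seed)
    n_yellow = round(N * p)
    # 1 = yellow ball, 0 = any other colour (assumed coding)
    population = [1] * n_yellow + [0] * (N - n_yellow)
    rng.shuffle(population)
    proportions = []
    for _ in range(n_experiments):
        sample = rng.choices(population, k=n_pickups)  # sampling with replacement
        proportions.append(sum(sample) / n_pickups)    # sample proportion of yellow
    return proportions

props = simulate_sample_proportions()
# Empirical mean and standard deviation of the sample proportions
print(round(statistics.mean(props), 3), round(statistics.pstdev(props), 3))
```

With these settings the mean should land close to 0.6 and the standard deviation close to 0.155.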
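I managed to get a sampling distribution that looks normal. However, the continuous normal curve with the same mu and sigma does not fit it at all; it looks scaled up by some factor. I'm not sure what is causing this. Ideally, shouldn't it fit almost perfectly? Below are my code and output. I tried changing the amplitude and sigma (dividing by sqrt(samplesize)), but nothing helped. Please help.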
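For reference, the textbook mean and standard deviation of the sample proportion \(\hat p\) for samples of size \(n\) drawn with replacement, evaluated with the values used in the code (\(p = 0.6\), \(n = 10\)):

```latex
\mu_{\hat p} = p = 0.6, \qquad
\sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.6 \times 0.4}{10}} = \sqrt{0.024} \approx 0.155
```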
Code:
```python
from random import shuffle, choices
import matplotlib.pyplot as plt
from SDSP import create_bernoulli_population, get_frequency_df, plot_pdf
from bi_to_nor_demo import get_metrics, bare_minimal_plot

N = 10000  # 10000 balls
p = 0.6  # probability of a yellow ball is 0.6, others (1 - 0.6) => 0.4
n_pickups = 10  # sample size
n_experiments = 2000  # number of repeated samples (I don't know the proper name for this)

# NOTE: `population` (a list of N balls, 1 = yellow, 0 = other) must already exist here;
# the post does not show how it is built (presumably with create_bernoulli_population).

# STATISTICAL PDF
# Choose a sample, take its mean and append it to X_mean_list. Repeat n_experiments times.
X_mean_list = []
for each_experiment in range(n_experiments):
    X_hat = choices(population, k=n_pickups)  # choose, say, 10 samples from the population (with replacement)
    X_mean = sum(X_hat) / len(X_hat)  # sample proportion of yellow balls
    X_mean_list.append(X_mean)

stats_df = get_frequency_df(X_mean_list)

# Plot both theoretical and statistical outcomes
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
mu, var, sigma = get_metrics(stats_df)
plot_pdf(stats_df, ax, n_pickups, mu, sigma, p=mu, bar_width=round(0.5 / n_pickups, 3),
         title='Sampling Distribution of\n a Sample Proportion')
plt.tight_layout()
plt.show()
```
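One possible explanation, assuming `plot_pdf` draws the normal density directly over bars whose heights are relative frequencies (its internals are not shown in the post): the sample proportion can only take the values 0, 0.1, ..., 1, so each bar holds the probability of a slice of width 1/n of the axis, while a density is probability *per unit* of x. To compare the two, the density would have to be multiplied by the spacing 1/n. The sketch below is a hypothetical stand-in for `get_frequency_df`/`plot_pdf` that prints the comparison as numbers instead of drawing it; `compare_bars_to_curve` and `normal_pdf` are illustrative names, and drawing Bernoulli(p) values directly is assumed equivalent to sampling with replacement from a population with proportion p.

```python
import math
import random
import statistics
from collections import Counter

def normal_pdf(x, mu, sigma):
    """Normal density at x (probability per unit of x, not a probability)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def compare_bars_to_curve(p=0.6, n_pickups=10, n_experiments=2000, seed=1):
    """Hypothetical stand-in: relative frequency of each sample proportion vs. the normal curve."""
    rng = random.Random(seed)
    # Each draw is Bernoulli(p): True (yellow) with probability p
    props = [sum(rng.random() < p for _ in range(n_pickups)) / n_pickups
             for _ in range(n_experiments)]
    mu, sigma = statistics.mean(props), statistics.pstdev(props)
    bin_width = 1 / n_pickups  # spacing between possible proportions 0, 0.1, ..., 1
    rows = []
    for x, count in sorted(Counter(props).items()):
        rel_freq = count / n_experiments   # height of a blue bar
        raw = normal_pdf(x, mu, sigma)     # height of an unscaled normal curve
        scaled = raw * bin_width           # density converted to probability per bar
        rows.append((x, rel_freq, raw, scaled))
    return rows

for x, rel_freq, raw, scaled in compare_bars_to_curve():
    print(f"{x:.1f}  bar={rel_freq:.3f}  pdf={raw:.3f}  pdf*width={scaled:.3f}")
```

If this is the cause, the `pdf` column should be roughly 10 times the `bar` column near the centre, while `pdf*width` should track it closely. Equivalently, the bars could be rescaled to densities (divided by 1/n_pickups, or drawn with matplotlib's `hist(..., density=True)` using bins of width 1/n_pickups centred on the possible values) and compared to the unscaled curve.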
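Output:
The red curve is the incorrect normal approximation. mu and sigma are taken from the empirical discrete distribution (the small blue bars) and fed into the normal density formula to draw the curve, but the curve looks scaled up somehow.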
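The size of the mismatch described above can be estimated without simulation by using exact binomial probabilities (assuming sampling with replacement, so the number of yellow balls per sample is Binomial(n, p)). The tallest bar should be close to P(p̂ = 0.6), about 0.25, while the peak of the normal density with σ ≈ 0.155 is about 2.58, a ratio of roughly n = 10:

```python
import math

p, n = 0.6, 10
sigma = math.sqrt(p * (1 - p) / n)                  # SD of the sample proportion
peak_pdf = 1 / (sigma * math.sqrt(2 * math.pi))     # normal density at its mean
bar_at_mean = math.comb(n, 6) * p**6 * (1 - p)**4   # exact P(p_hat = 0.6) = P(6 yellow out of 10)

print(round(peak_pdf, 3), round(bar_at_mean, 3), round(peak_pdf / bar_at_mean, 2))
```

The ratio is not exactly 10 because the normal curve is only an approximation to the binomial probabilities.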
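Update:
Skipping the division when taking the mean fixes the plot, but then mu is no longer a proportion. So the problem is still not fully solved. :(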
```python
X_mean = sum(X_hat)  # removed the division / len(X_hat)
```
Output after removing the above division (but is the division needed?):
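A likely reason the count version fits: without the division each value is a count S of yellow balls, whose possible values are spaced 1 apart, so the normal density over S already equals (approximately) the probability per bar. The proportion can be kept by rescaling instead: since p̂ = S/n, the parameters satisfy μ_p̂ = μ_S/n and σ_p̂ = σ_S/n, and the density of p̂ at x = k/n, multiplied by the bar spacing 1/n, equals the density of S at k. A minimal sketch checking these relations numerically (`normal_pdf` is an illustrative helper, not from the post):

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

p, n = 0.6, 10
# Count scale: S = number of yellow balls in a sample of n
mu_count, sigma_count = n * p, math.sqrt(n * p * (1 - p))
# Proportion scale: p_hat = S / n, so both parameters are divided by n
mu_prop, sigma_prop = mu_count / n, sigma_count / n

for k in range(n + 1):
    x = k / n
    per_bar_from_counts = normal_pdf(k, mu_count, sigma_count)        # spacing 1
    per_bar_from_props = normal_pdf(x, mu_prop, sigma_prop) * (1 / n)  # spacing 1/n
    print(f"{x:.1f}  {per_bar_from_counts:.4f}  {per_bar_from_props:.4f}")
```

Both columns should be identical, which suggests keeping the division (so mu stays 0.6) and scaling only the plotted curve by 1/n.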