Question

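I am trying to simulate the "sampling distribution of a sample proportion" using Python. I tried it with a Bernoulli variable, as in the example here.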

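The setup: in a large population of gumballs, the true proportion of yellow balls is 0.6. If we repeatedly draw samples of a fixed size (say 10), take the mean of each sample, and plot those means, we should get an approximately normal distribution.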
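The experiment described above can be sketched with only the standard library. This is a minimal illustration, not the code from the SDSP module: the function name `simulate_sample_proportions` and the fixed seed are assumptions, and it draws directly from a Bernoulli(0.6) variable, which is equivalent to sampling with replacement from a population that is exactly 60% yellow.

```python
import random
from statistics import mean, pstdev


def simulate_sample_proportions(p=0.6, n_pickups=10, n_experiments=2000, seed=0):
    """Return the sample proportion of 'yellow' for each of n_experiments samples.

    Hypothetical helper: each sample consists of n_pickups independent
    Bernoulli(p) draws (1 = yellow, 0 = any other colour).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(n_experiments):
        sample = [1 if rng.random() < p else 0 for _ in range(n_pickups)]
        proportions.append(sum(sample) / n_pickups)
    return proportions


proportions = simulate_sample_proportions()
print("mean of sample proportions:", round(mean(proportions), 3))
print("std of sample proportions: ", round(pstdev(proportions), 3))
print("theoretical sqrt(p(1-p)/n):", round((0.6 * 0.4 / 10) ** 0.5, 3))
```

The empirical mean should land near p and the empirical standard deviation near sqrt(p(1-p)/n), which are the mu and sigma of the normal approximation.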

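I managed to get a roughly normal sampling distribution. However, the continuous normal curve with the same mu and sigma does not fit it at all; it is scaled up by some factor. I am not sure what causes this, since ideally the curve should fit closely. My code and output are below. I tried changing the amplitude and sigma (dividing by sqrt(samplesize)), but nothing helped. Please help.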

Code:

from SDSP import create_bernoulli_population, get_frequency_df
from random import shuffle, choices
from bi_to_nor_demo import get_metrics, bare_minimal_plot
import matplotlib.pyplot as plt


N = 10000  # 10000 balls
p = 0.6    # probability of yellow ball is 0.6, and others (1-0.6)=>0.4
n_pickups = 10       # sample size
n_experiments = 2000  # number of samples drawn (I don't know the proper name for this)


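# STATISTICAL PDF
# NOTE: `population` must already be defined at this point (a list of 0/1 values);
# its construction, presumably via create_bernoulli_population, is not shown here.
# Choose a sample, take its mean, and append it to X_mean_list. Repeat n_experiments times.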
X_hat = []
X_mean_list = []
for each_experiment in range(n_experiments):
    X_hat = choices(population, k=n_pickups)  # choose, say 10 samples from population (with replacement)
    X_mean = sum(X_hat)/len(X_hat)
    X_mean_list.append(X_mean)
stats_df = get_frequency_df(X_mean_list)


# plot both theoretical and statistical outcomes
fig, ax = plt.subplots(1,1, figsize=(5,5))
from SDSP import plot_pdf
mu,var,sigma = get_metrics(stats_df)
plot_pdf(stats_df, ax, n_pickups, mu, sigma, p=mu, bar_width=round(0.5/n_pickups,3),
         title='Sampling Distribution of\n a Sample Proportion')
plt.tight_layout()
plt.show()

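Output:
The red curve is the poorly fitting normal approximation. mu and sigma are derived from the empirical discrete distribution (the small blue bars) and fed into the formula for the normal curve. But the normal curve looks scaled up somehow.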
output image
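One possible explanation for the mismatch, offered as a sketch rather than a confirmed diagnosis: a normal curve is a probability density, while bars of relative frequency are probabilities. For sample proportions the possible values are spaced 1/n apart, so the density would need to be multiplied by that spacing before it is comparable to the bar heights. The code below assumes the bars show relative frequencies (the source of `plot_pdf` is not shown, so this is a guess), and `normal_pdf` and `spacing` are illustrative names.

```python
import math
import random
from collections import Counter


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


p, n_pickups, n_experiments = 0.6, 10, 2000
rng = random.Random(0)  # fixed seed so the run is repeatable

# Number of yellow balls in each sample of size n_pickups.
counts = Counter(
    sum(rng.random() < p for _ in range(n_pickups)) for _ in range(n_experiments)
)

mu = p
sigma = math.sqrt(p * (1 - p) / n_pickups)
spacing = 1 / n_pickups  # gap between adjacent possible sample proportions

rows = []
print("x     rel_freq  density  density*spacing")
for k in range(n_pickups + 1):
    x = k / n_pickups
    rel_freq = counts[k] / n_experiments  # height of a relative-frequency bar
    density = normal_pdf(x, mu, sigma)    # raw normal curve, much taller
    rows.append((x, rel_freq, density, density * spacing))
    print(f"{x:.1f}   {rel_freq:.4f}    {density:.4f}   {density * spacing:.4f}")
```

With n = 10 the raw density is about ten times taller than the bars, while the rescaled column tracks them closely. On the count scale (no division by the sample size) the spacing is 1, so no rescaling would be needed, but mu and sigma are then in units of counts rather than proportions.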

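Update:
Skipping the division when taking the mean fixes the plot, but mu is then scaled. So the problem is still not completely solved. :(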

X_mean = sum(X_hat) # removed the division /len(X_hat)

Output after removing the division above (but is removing it really necessary?)
output

sampling distribution normal approximation fitting

0 answers: