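I am trying to use bootstrapping to resample the Son column with replacement for 1000 replications (np.random.choice), so that I can compute the mean of each replication. I then want to compare the standard deviation of those means with the standard error.
But I am not getting the bootstrapping part right. How can I fix that part?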
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from scipy import stats
df = pd.read_csv('http://www.math.uah.edu/stat/data/Pearson.txt',
delim_whitespace=True)
df.head()
y = df['Son'].values
Replications = np.random.choice(y, 1000, replace = True)
print("Replications: " , Replications)
print("")
Mean = np.mean(Replications)
print("Mean: " , Mean)
sem = stats.sem(y)
print ("The SEM : ", sem)
Answer 0 (score: 2)
You can create 1000 replications, each of length len(df), as follows:
Replications = np.array([np.random.choice(df.Son, len(df), replace = True) for _ in range(1000)])
Mean = np.mean(Replications, axis=1)
print("Mean: " , Mean)
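To finish the comparison the question asks for, the standard deviation of the 1000 replicate means can be set next to the analytic standard error of the mean. The following is only a minimal sketch under stated assumptions: `y` is a synthetic stand-in for `df['Son'].values` (its location, scale, and size are arbitrary placeholders, not the real data), and the names `rng`, `n_reps`, `boot_means`, `boot_se`, and `analytic_sem` are introduced here for illustration. The analytic value is written out as s / sqrt(n), which is the quantity `stats.sem(y)` returns with its default `ddof=1`.

```python
import numpy as np

# Seeded generator so the sketch is reproducible.
rng = np.random.default_rng(0)

# Assumption: synthetic stand-in for df['Son'].values.
# The mean, spread, and sample size are arbitrary placeholders.
y = rng.normal(loc=68.7, scale=2.8, size=1078)

n_reps = 1000

# Each row is one bootstrap replicate: len(y) draws from y with replacement.
replications = rng.choice(y, size=(n_reps, len(y)), replace=True)

# One mean per replicate: the bootstrap distribution of the sample mean.
boot_means = replications.mean(axis=1)

# Bootstrap estimate of the standard error: spread of the replicate means.
boot_se = boot_means.std(ddof=1)

# Analytic standard error of the mean, s / sqrt(n).
analytic_sem = y.std(ddof=1) / np.sqrt(len(y))

print("Bootstrap SE:", boot_se)
print("Analytic SEM:", analytic_sem)
```

The two printed numbers should be close to each other but not identical, since the bootstrap value carries resampling noise. The original attempt drew a single sample of 1000 values and took one mean, so there was no distribution of means whose spread could be measured.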
Thanks!