First, import pandas and create a Series with a perfectly normal distribution:
import pandas as pd
lst = [[5 for x in range(5)], [4 for x in range(4)], [3 for x in range(3)],
[2 for x in range(2)], [1 for x in range(1)], [2 for x in range(2)],
[3 for x in range(3)], [4 for x in range(4)], [5 for x in range(5)]]
lst = [item for sublists in lst for item in sublists]
series = pd.Series(lst)
Let's check that this distribution is normal:
print(round(sum(series - series.mean()) / series.count(), 1) == 0)
# if distribution is normal we'll see True
Now let's print sem() for the population:
print(series.sem(ddof=0))
# 0.21619987017
And now for a sample:
print(series.sem()) # ddof=1
# 0.220026713637
But I can't understand how pandas calculates the standard error of the mean when it works with the population. Does it use
se_x = sd_x / sqrt(len(x))
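or does it create samples? If it creates samples, how many does it create, and how can I set that number?
How does pandas calculate the sem of a sample if the count is < 30?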
Answer 0 (score: 1)
Pandas generates the sem method dynamically:
cls.sem = _make_stat_function_ddof(
    cls, 'sem', name, name2, axis_descr,
    "Return unbiased standard error of the mean over requested "
    "axis.\n\nNormalized by N-1 by default. This can be changed "
    "using the ddof argument",
    nanops.nansem)
@disallow('M8', 'm8')
def nansem(values, axis=None, skipna=True, ddof=1):
    var = nanvar(values, axis, skipna, ddof=ddof)
    mask = isnull(values)
    if not is_float_dtype(values.dtype):
        values = values.astype('f8')
    count, _ = _get_counts_nanvar(mask, axis, ddof, values.dtype)
    var = nanvar(values, axis, skipna, ddof=ddof)
    return np.sqrt(var) / np.sqrt(count)
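Reading the last line of nansem, the result is simply sqrt(var) / sqrt(count), where var is computed with the requested ddof. No resampling takes place, and nothing changes when the count is below 30. A minimal sketch that reproduces both numbers from the question by hand follows; the names counts, manual_population and manual_sample are only illustrative, and the data is rebuilt to match the question's list.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data: five 5s, four 4s, ..., one 1, ..., five 5s
counts = [5, 4, 3, 2, 1, 2, 3, 4, 5]
series = pd.Series([v for v in counts for _ in range(v)], dtype="float64")

n = series.count()  # number of non-NaN values

# Population form: standard deviation with ddof=0, divided by sqrt(n)
manual_population = series.std(ddof=0) / np.sqrt(n)

# Sample form (the pandas default): standard deviation with ddof=1, divided by sqrt(n)
manual_sample = series.std(ddof=1) / np.sqrt(n)

# Both comparisons should print True
print(np.isclose(manual_population, series.sem(ddof=0)))
print(np.isclose(manual_sample, series.sem()))
```

In other words, ddof only changes the divisor used for the variance (N - ddof); the final division always uses the square root of the full count.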
You may also want to look at the methods available in the scipy.stats module.
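As one example from that module, scipy.stats.sem takes the same ddof argument, so it can serve as a cross-check. The sketch below assumes SciPy is installed and uses a small made-up list (values) rather than the question's data.

```python
import pandas as pd
from scipy import stats

# Small illustrative data set (not the one from the question)
values = [5.0, 5.0, 4.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0]
series = pd.Series(values)

# scipy.stats.sem also defaults to ddof=1
scipy_sample = stats.sem(values)
scipy_population = stats.sem(values, ddof=0)

# Both differences should be zero up to floating-point rounding
print(abs(scipy_sample - series.sem()))
print(abs(scipy_population - series.sem(ddof=0)))
```

One difference to keep in mind: the pandas method skips missing values by default (skipna=True), so the two results may diverge on data that contains NaN.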