Problem calculating the standard error with pandas

Time: 2017-09-09 10:38:23

Tags: python pandas

I'm having trouble getting the correct standard error values for some data with pandas. Here is how to reproduce the problem.


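This runs fine, but when I checked the results in Excel I found a discrepancy. The mean and the standard deviation match, but sem computes the uncorrected value (std / sqrt(n)) rather than the sample-corrected value (std / sqrt(n-1)). Here is the output from Excel: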

[Image: Excel output]

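Could the problem be that the conditions do not have equal n? As the dataset shows, n is 4 for condition 1 and 2 for condition 2. [Sorry, the dictionary assignment scrambles the order of the pandas df columns...]

Can someone help explain what is going on here?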
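
One way to check whether unequal group sizes matter is to compare the built-in groupby `sem()` with a manual `std / sqrt(n)` that uses each group's own n. The sketch below uses made-up scores (the values and the `dataset` frame are assumptions for illustration, not the original data), with 4 rows in condition 1 and 2 rows in condition 2.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not the original dataset): n = 4 and n = 2 per condition
dataset = pd.DataFrame({
    "condition_number": [1, 1, 1, 1, 2, 2],
    "pre-score": [23.0, 45.0, 12.0, 60.0, 30.0, 80.0],
    "post-score": [40.0, 55.0, 20.0, 65.0, 50.0, 70.0],
})

grouped = dataset.groupby("condition_number")

# Number of rows in each condition
n = grouped.size()

# Built-in standard error of the mean, computed per group
sem_builtin = grouped.sem()

# Manual version: sample std (ddof=1) divided by sqrt of that group's n
sem_manual = grouped.std().div(np.sqrt(n), axis=0)

print(n)
print(sem_builtin)
print(sem_manual)
```

If the two tables agree, each group is being divided by its own n, so unequal group sizes alone would not explain a discrepancy.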

1 Answer:

Answer 0: (score: 2)

You need to modify your function to use:
In [846]: (dataset.groupby('condition_number')
                  .agg(lambda x: x.std()/x.count().add(-1).pow(0.5)))
Out[846]:
                  post-score  pre-score  subject_number
condition_number
1                  10.405407  11.743628        2.357023
2                   8.775549  35.703943        4.242641
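
To see how this formula relates to the built-in `sem()`, here is a minimal sketch on made-up data (the values are assumptions for illustration, not the original dataset). It uses `np.sqrt(x.count() - 1)` in place of `.add(-1).pow(0.5)` so the lambda works whether `x` arrives as a Series or a DataFrame. It also computes a third variant, the population standard deviation (`ddof=0`) divided by `sqrt(n-1)`, which is algebraically the same quantity as the sample standard deviation divided by `sqrt(n)`.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not the original dataset): n = 4 and n = 2 per condition
dataset = pd.DataFrame({
    "condition_number": [1, 1, 1, 1, 2, 2],
    "pre-score": [23.0, 45.0, 12.0, 60.0, 30.0, 80.0],
    "post-score": [40.0, 55.0, 20.0, 65.0, 50.0, 70.0],
})

grouped = dataset.groupby("condition_number")

# pandas default: sample std (ddof=1) / sqrt(n)
builtin = grouped.sem()

# Formula from the answer: sample std (ddof=1) / sqrt(n - 1)
answer = grouped.agg(lambda x: x.std() / np.sqrt(x.count() - 1))

# Population std (ddof=0) / sqrt(n - 1)
population = grouped.agg(lambda x: x.std(ddof=0) / np.sqrt(x.count() - 1))

# Ratio of the answer's formula to the built-in: sqrt(n / (n - 1)) per group
ratio = answer / builtin

print(builtin)
print(answer)
print(population)
print(ratio)
```

On this data `population` matches `builtin`, while `answer` is larger by a factor of `sqrt(n / (n - 1))` in each group. So it may be worth checking which standard deviation the spreadsheet formula uses before deciding which result is the "corrected" one.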