I'm having trouble getting the correct standard error values for some data using pandas. Here is how to reproduce the problem.
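This works fine, but when I checked the results in Excel I found a discrepancy. The mean and standard deviation match, but `sem` computes the uncorrected value (std / sqrt(n)) rather than the corrected sample value (std / sqrt(n-1)). Here is the output in Excel:
I think the problem may be related to the conditions having unequal sizes? As the dataset shows, n is 4 for condition 1, while n = 2 for condition 2. [Sorry, the dictionary assignment got mixed up with the order of the pandas df columns...]
Can someone help explain what is going on here?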
Answer 0 (score: 2)
You need to modify your function to use:

In [846]: (dataset.groupby('condition_number')
                  .agg(lambda x: x.std()/x.count().add(-1).pow(0.5)))
Out[846]:
                  post-score  pre-score  subject_number
condition_number
1                  10.405407  11.743628        2.357023
2                   8.775549  35.703943        4.242641
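To see how the variants relate, here is a minimal sketch that compares the built-in `sem` with the std / sqrt(n-1) calculation from the answer. The original dataset is not shown, so the values below are made up purely for illustration, and the variable names (`builtin_sem`, `adjusted_sem`, `population_based`) are hypothetical. The lambda uses plain arithmetic rather than `.add(-1).pow(0.5)`, because `x.count()` is a plain integer when pandas passes a single column to the function.

```python
import numpy as np
import pandas as pd

# Hypothetical data: made-up scores with n = 4 for condition 1 and n = 2 for condition 2.
dataset = pd.DataFrame({
    "condition_number": [1, 1, 1, 1, 2, 2],
    "pre-score": [10.0, 20.0, 35.0, 55.0, 40.0, 90.0],
    "post-score": [12.0, 30.0, 41.0, 60.0, 50.0, 62.0],
})

grouped = dataset.groupby("condition_number")

# Built-in standard error: sample std (ddof=1) divided by sqrt(n).
builtin_sem = grouped.sem()

# The answer's variant: sample std (ddof=1) divided by sqrt(n - 1).
adjusted_sem = grouped.agg(lambda x: x.std() / np.sqrt(x.count() - 1))

# Population std (ddof=0) divided by sqrt(n - 1), for comparison with builtin_sem.
population_based = grouped.agg(lambda x: x.std(ddof=0) / np.sqrt(x.count() - 1))

print(builtin_sem)
print(adjusted_sem)
print(population_based)
```

On this data `population_based` matches `builtin_sem`, since dividing the population standard deviation by sqrt(n-1) is algebraically the same as dividing the sample standard deviation by sqrt(n). If the Excel sheet used the sample standard deviation together with sqrt(n-1), that would explain the larger values, and the answer's lambda would be the pandas equivalent.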