pandas: calculating a combined mean and standard deviation

Asked: 2018-03-27 21:17:41

Tags: python pandas mean standard-deviation

My goal is to calculate the combined mean and standard deviation of several samples. For example, I have a df:

Start   End   N     CombinedMean   CombinedDev
abc     x     99    44.7           5.2
abc     y     30    39.3           19
ijk     x     50    20             5
ijk     z     7     10             2

Output:

CombinedMean = sum(n*mean) / sum(n)

CombinedDev = sqrt( sum(n*(dev^2 + mean^2)) / sum(n) - CombinedMean^2 )
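The two formulas above can be sketched as a small plain-Python function. This is only an illustration, not code from the question: the name `combine` and its list arguments are hypothetical, and the formulas are taken as written (dividing by sum(n), i.e. a population-style deviation).

```python
import math

def combine(ns, means, devs):
    """Combine per-sample sizes, means and standard deviations.

    Hypothetical helper: implements the two formulas as stated,
    dividing by sum(n) rather than sum(n) - 1.
    """
    total_n = sum(ns)
    # CombinedMean = sum(n * mean) / sum(n)
    combined_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # sum(n * (dev^2 + mean^2)) / sum(n)
    mean_of_squares = sum(
        n * (d ** 2 + m ** 2) for n, m, d in zip(ns, means, devs)
    ) / total_n
    # Clamp at 0 so floating-point round-off cannot produce sqrt of a negative.
    combined_dev = math.sqrt(max(mean_of_squares - combined_mean ** 2, 0.0))
    return total_n, combined_mean, combined_dev

# Sanity check: two samples with identical mean and deviation
# should combine to that same mean and deviation.
total_n, combined_mean, combined_dev = combine([10, 30], [5.0, 5.0], [2.0, 2.0])
print(total_n, combined_mean, combined_dev)
```

Two samples that share a mean and deviation combine to that same mean and deviation, which makes a quick sanity check for the formulas.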

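Normally I would use groupby with agg({'N':'sum','Mean':'mean','Dev':'mean'}), but that is mathematically incorrect, because it ignores the sample sizes. To compute the combined mean and deviation in this case, I have to use: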

N = n1+n2 = 54+45 = 99

Sum = n1*mean1 + n2*mean2 = 54*47+45*42 = 4,428

The combined mean is Sum / N = 4,428 / 99 = 44.7

Sum of Squares = n1*(sd1^2 + mean1^2) + n2*(sd2^2 + mean2^2) = 54*(5^2 + 47^2) + 45*(4^2 + 42^2) = 200,736

So the combined standard deviation is SQRT(Sum of Squares / N - Mean^2) = SQRT(200,736/99 - (4,428/99)^2) = 5.2 (using the unrounded mean; plugging in the rounded 44.7 gives about 5.4)
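One possible way to apply the calculation above per group in pandas is sketched below. The input layout is an assumption: the question's input frame is not shown, so the column names `Start`, `End`, `N`, `Mean` and `Dev` are guessed from the `agg` call and the output table, and `combine_stats` is a hypothetical helper name. The sample rows reuse the numbers from the worked example.

```python
import pandas as pd

# Assumed input: one row per sample with its size, mean and standard deviation.
df = pd.DataFrame({
    "Start": ["abc", "abc"],
    "End": ["x", "x"],
    "N": [54, 45],
    "Mean": [47.0, 42.0],
    "Dev": [5.0, 4.0],
})

def combine_stats(frame, keys=("Start", "End")):
    """Pool per-sample N/Mean/Dev into one mean and deviation per group."""
    tmp = frame.assign(
        _sum=frame["N"] * frame["Mean"],                              # n * mean
        _sumsq=frame["N"] * (frame["Dev"] ** 2 + frame["Mean"] ** 2),  # n * (dev^2 + mean^2)
    )
    # Plain sums per group are all the formulas need.
    out = tmp.groupby(list(keys), as_index=False)[["N", "_sum", "_sumsq"]].sum()
    out["CombinedMean"] = out["_sum"] / out["N"]
    variance = out["_sumsq"] / out["N"] - out["CombinedMean"] ** 2
    # Clip at 0 to guard against tiny negative values from round-off.
    out["CombinedDev"] = variance.clip(lower=0) ** 0.5
    return out.drop(columns=["_sum", "_sumsq"])

result = combine_stats(df)
print(result.round(1))
```

Precomputing the weighted columns first means the groupby step needs only `sum`, which avoids a slower row-wise `apply`. For the two samples above, the rounded result should match the hand calculation (N = 99, mean 44.7, deviation 5.2).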

For example,

{{1}}

Long story short: how can I implement these two formulas in my code? Thanks a lot! Sorry for the long question :)

0 answers:

No answers yet