I have this DataFrame:
                      startTime     endTime  emails_received
index
2014-01-24 14:00:00  1390568400  1390569600              684
2014-01-24 14:00:00  1390568400  1390569300              700
2014-01-24 14:05:00  1390568700  1390569300              438
2014-01-24 14:05:00  1390568700  1390569900              586
2014-01-24 16:00:00  1390575600  1390576500              752
2014-01-24 16:00:00  1390575600  1390576500              743
2014-01-24 16:00:00  1390575600  1390576500              672
2014-01-24 16:00:00  1390575600  1390576200              712
2014-01-24 16:00:00  1390575600  1390576800              708
I run resample("10min", how="median").dropna() and get:
                      startTime     endTime  emails_received
start
2014-01-24 14:00:00  1390568550  1390569450              635
2014-01-24 16:00:00  1390575600  1390576500              712
This is correct. Is there an easy way to get the standard deviation from the mean with pandas?
Answer 0 (score: 7)
You just need to call .std() on the DataFrame. Here is an illustrative example.
Create a DatetimeIndex:
In [38]: index = pd.date_range(start='2000-1-1', freq='1T', periods=1000)
Create a DataFrame with 2 columns:
In [45]: df = pd.DataFrame({'a':range(1000), 'b':range(1000,3000,2)}, index=index)
The head, standard deviation, and mean of the DataFrame:
In [47]: df.head()
Out[47]:
                     a     b
2000-01-01 00:00:00  0  1000
2000-01-01 00:01:00  1  1002
2000-01-01 00:02:00  2  1004
2000-01-01 00:03:00  3  1006
2000-01-01 00:04:00  4  1008

In [48]: df.std()
Out[48]:
a    288.819436
b    577.638872
dtype: float64

In [49]: df.mean()
Out[49]:
a     499.5
b    1999.0
dtype: float64
Downsample and compute the same statistics:
In [54]: df = df.resample(rule="10T", how="median")

In [55]: df
Out[55]:
DatetimeIndex: 100 entries, 2000-01-01 00:00:00 to 2000-01-01 16:30:00
Freq: 10T
Data columns (total 2 columns):
a    100 non-null values
b    100 non-null values
dtypes: float64(1), int64(1)

In [56]: df.head()
Out[56]:
                        a     b
2000-01-01 00:00:00   4.5  1009
2000-01-01 00:10:00  14.5  1029
2000-01-01 00:20:00  24.5  1049
2000-01-01 00:30:00  34.5  1069
2000-01-01 00:40:00  44.5  1089

In [57]: df.std()
Out[57]:
a    290.11492
b    580.22984
dtype: float64

In [58]: df.mean()
Out[58]:
a     499.5
b    1999.0
dtype: float64
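Newer pandas releases no longer accept the how= keyword; the aggregation is called as a method on the resampler instead. The following is a minimal sketch of the same downsampling step under that assumption. The name `medians` is introduced here for illustration, and the `"min"` alias is used in place of `"T"`.

```python
import pandas as pd

# Same synthetic data as above: 1000 one-minute samples, two columns.
index = pd.date_range(start="2000-01-01", freq="min", periods=1000)
df = pd.DataFrame({"a": range(1000), "b": range(1000, 3000, 2)}, index=index)

# Downsample to 10-minute bins, taking the median of each bin.
# A separate name keeps the original frame available for later steps.
medians = df.resample("10min").median()

# Standard deviation and mean across the 100 per-bin medians.
print(medians.std())
print(medians.mean())
```

Assigning the result to a new name rather than back to `df` means later resampling calls still see the one-minute data.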
Resample using std() as the aggregation function:
In [62]: df2 = df.resample(rule="10T", how=np.std)

In [63]: df2
Out[63]:
DatetimeIndex: 100 entries, 2000-01-01 00:00:00 to 2000-01-01 16:30:00
Freq: 10T
Data columns (total 2 columns):
a    100 non-null values
b    100 non-null values
dtypes: float64(2)

In [64]: df2.head()
Out[64]:
                           a         b
2000-01-01 00:00:00  3.02765  6.055301
2000-01-01 00:10:00  3.02765  6.055301
2000-01-01 00:20:00  3.02765  6.055301
2000-01-01 00:30:00  3.02765  6.055301
2000-01-01 00:40:00  3.02765  6.055301
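Applied to data shaped like the question's, the per-bin median and standard deviation can be computed in one pass. This is only a sketch, assuming a pandas version where resampler methods such as `.agg` replace how=. The frame below retypes just the emails_received column from the question, and the names `emails` and `stats` are made up for illustration.

```python
import pandas as pd

# emails_received values from the question, indexed by their timestamps.
index = pd.to_datetime(
    ["2014-01-24 14:00:00"] * 2
    + ["2014-01-24 14:05:00"] * 2
    + ["2014-01-24 16:00:00"] * 5
)
emails = pd.DataFrame(
    {"emails_received": [684, 700, 438, 586, 752, 743, 672, 712, 708]},
    index=index,
)

# Median and sample standard deviation (N-1) of each 10-minute bin.
# Bins that contain no rows come out as NaN and are dropped.
stats = (
    emails["emails_received"]
    .resample("10min")
    .agg(["median", "std"])
    .dropna()
)
print(stats)
```

The `median` column should match the emails_received values in the question's resampled output, with the spread of each bin next to it in `std`.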
Here is the information from the docstring of the .std() method.
Return standard deviation over requested axis.
NA/null values are excluded

Parameters
----------
axis : {0, 1}
    0 for row-wise, 1 for column-wise
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA
level : int, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a DataFrame

Returns
-------
std : Series (or DataFrame if level specified)

Normalized by N-1 (unbiased estimator).
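The N-1 normalization matters when comparing against NumPy: the pandas .std() method divides by N-1 by default, while np.std called on a plain array divides by N unless ddof=1 is passed. Here is a small sketch of the difference on a single 10-sample bin (the values 0 through 9, as in the first bin of column a). The variable names are illustrative.

```python
import numpy as np
import pandas as pd

values = pd.Series(range(10))  # one 10-minute bin of column 'a': 0..9

sample_std = values.std()            # pandas default, ddof=1 (divide by N-1)
population_std = values.std(ddof=0)  # divide by N instead

numpy_default = np.std(values.to_numpy())         # NumPy default, ddof=0
numpy_sample = np.std(values.to_numpy(), ddof=1)  # matches the pandas default

print(sample_std, population_std, numpy_default, numpy_sample)
```

The N-1 figure is the one that appears as 3.02765 in the resampled std() output above.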