Calculating the mean from a pandas DataFrame

Time: 2018-01-05 14:10:38

Tags: python pandas dataframe

I have a DataFrame with a datetime index. The data comes from December, January, and February only. I am trying to compute the mean of Dec, Jan, and Feb. When I do:

df.resample('a').mean()

it gives me the mean of Jan, Feb, and Dec, grouped by calendar year.

Is there a way to do this in a pandas DataFrame?

My data looks like this:

2000-02-29    0.046871
2000-03-31         NaN
2000-04-30         NaN
2000-05-31         NaN
2000-06-30         NaN
2000-07-31         NaN
2000-08-31         NaN
2000-09-30         NaN
2000-10-31         NaN
2000-11-30         NaN
2000-12-31    0.015948
2001-01-31    0.020552
2001-02-28    0.033409
2001-03-31         NaN
2001-04-30         NaN
2001-05-31         NaN
2001-06-30         NaN
2001-07-31         NaN
2001-08-31         NaN
2001-09-30         NaN
2001-10-31         NaN
2001-11-30         NaN
2001-12-31    0.013204
2002-01-31    0.017093
2002-02-28    0.019723
2002-03-31         NaN
2002-04-30         NaN

1 answer:

Answer 0 (score: 4)

You need groupby with strftime:

df = df.groupby(df.index.strftime('%b')).mean()
print (df)
          col
Dec  0.014576
Feb  0.033334
Jan  0.018822
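
For reference, a minimal, self-contained sketch of the same grouping. It assumes only the non-NaN rows from the question are kept and that the column is named `col` as in the output above; the names `values` and `monthly_mean` are illustrative. Note that `%b` labels depend on the locale.

```python
import pandas as pd

# Non-NaN rows copied from the question's sample data.
values = {
    "2000-02-29": 0.046871,
    "2000-12-31": 0.015948,
    "2001-01-31": 0.020552,
    "2001-02-28": 0.033409,
    "2001-12-31": 0.013204,
    "2002-01-31": 0.017093,
    "2002-02-28": 0.019723,
}
df = pd.DataFrame(
    {"col": list(values.values())},
    index=pd.to_datetime(list(values)),
)

# Group by abbreviated month name, pooling the same month across all years.
monthly_mean = df.groupby(df.index.strftime("%b"))["col"].mean()
print(monthly_mean)
```

Because the group keys are strings, the result is sorted alphabetically (Dec, Feb, Jan) rather than chronologically.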

If you also need years:

df = df.groupby(df.index.strftime('%Y-%b')).mean()
print (df)
               col
2000-Dec  0.015948
2000-Feb  0.046871
2001-Dec  0.013204
2001-Feb  0.033409
2001-Jan  0.020552
2002-Feb  0.019723
2002-Jan  0.017093

Another solution is to convert with to_period:

df = df.groupby(df.index.to_period('M')).mean()
print (df)
              col
2000-02  0.046871
2000-12  0.015948
2001-01  0.020552
2001-02  0.033409
2001-12  0.013204
2002-01  0.017093
2002-02  0.019723
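
One practical difference between the two year-and-month groupings is sort order. The sketch below is only an illustration on the non-NaN rows of the sample data; `by_label` and `by_period` are made-up names.

```python
import pandas as pd

# Non-NaN rows copied from the question's sample data.
values = {
    "2000-02-29": 0.046871,
    "2000-12-31": 0.015948,
    "2001-01-31": 0.020552,
    "2001-02-28": 0.033409,
    "2001-12-31": 0.013204,
    "2002-01-31": 0.017093,
    "2002-02-28": 0.019723,
}
df = pd.DataFrame(
    {"col": list(values.values())},
    index=pd.to_datetime(list(values)),
)

# String keys sort alphabetically: 2000-Dec comes before 2000-Feb.
by_label = df.groupby(df.index.strftime("%Y-%b"))["col"].mean()

# Period keys sort chronologically: 2000-02 comes before 2000-12.
by_period = df.groupby(df.index.to_period("M"))["col"].mean()

print(list(by_label.index))
print(list(by_period.index.astype(str)))
```

If the result feeds a plot or a time-ordered report, the period-based keys are usually the more convenient choice.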

EDIT:

You need to shift the index by one month, so that December falls into the following year, and then group by year:

year = df.shift(freq='m').index.year
print (year)
Int64Index([2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2001,
            2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001,
            2002, 2002, 2002, 2002, 2002],
           dtype='int64')


df = df.groupby(year).mean()
print (df)
           col
2000  0.046871
2001  0.023303
2002  0.016673
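
A self-contained sketch of the same idea, under a few assumptions: it keeps only the non-NaN rows, and instead of a `freq` alias string (whose accepted spelling for month end has varied between pandas versions) it adds an explicit `pd.offsets.MonthEnd(1)` to the index. This is a variation on the shift above, not the exact call from the answer; `season_year` and `season_mean` are illustrative names.

```python
import pandas as pd

# Non-NaN rows copied from the question's sample data.
values = {
    "2000-02-29": 0.046871,
    "2000-12-31": 0.015948,
    "2001-01-31": 0.020552,
    "2001-02-28": 0.033409,
    "2001-12-31": 0.013204,
    "2002-01-31": 0.017093,
    "2002-02-28": 0.019723,
}
df = pd.DataFrame(
    {"col": list(values.values())},
    index=pd.to_datetime(list(values)),
)

# Move every month-end date forward by one month end, so December lands
# in January of the next year while January and February stay in theirs.
season_year = (df.index + pd.offsets.MonthEnd(1)).year

# Each group now holds one Dec-Jan-Feb season, labelled by the year of
# its January and February.
season_mean = df.groupby(season_year)["col"].mean()
print(season_mean)
```

The first group (2000) contains only February 2000, because the sample data starts mid-season.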