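I have a DataFrame with a datetime index. The data comes only from December, January, and February. I am trying to compute the mean over Dec, Jan, and Feb. When I do:
df.resample('a').mean()
it gives me the mean of Jan, Feb, and Dec of the same calendar year, rather than of each consecutive Dec-Jan-Feb winter.
Is there a way to do this in a pandas DataFrame?
My data looks like this: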
2000-02-29 0.046871
2000-03-31 NaN
2000-04-30 NaN
2000-05-31 NaN
2000-06-30 NaN
2000-07-31 NaN
2000-08-31 NaN
2000-09-30 NaN
2000-10-31 NaN
2000-11-30 NaN
2000-12-31 0.015948
2001-01-31 0.020552
2001-02-28 0.033409
2001-03-31 NaN
2001-04-30 NaN
2001-05-31 NaN
2001-06-30 NaN
2001-07-31 NaN
2001-08-31 NaN
2001-09-30 NaN
2001-10-31 NaN
2001-11-30 NaN
2001-12-31 0.013204
2002-01-31 0.017093
2002-02-28 0.019723
2002-03-31 NaN
2002-04-30 NaN
Answer 0 (score: 4)
# Mean per month name, pooled across all years
by_month = df.groupby(df.index.strftime('%b')).mean()
print(by_month)
          col
Dec  0.014576
Feb  0.033334
Jan  0.018822
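The output above lists only Dec, Feb, and Jan, although the sample data also contains all-NaN rows for the other months, which would normally appear as NaN groups. A minimal sketch, assuming those empty rows are dropped first with dropna(). The column name col is taken from the printed output, the names clean and by_month are illustrative, and the values are a subset of the question's data:

```python
import pandas as pd

nan = float("nan")

# Subset of the question's data: month-end dates, NaN outside Dec-Feb.
idx = pd.to_datetime(["2000-02-29", "2000-03-31", "2000-12-31",
                      "2001-01-31", "2001-02-28", "2001-03-31"])
df = pd.DataFrame({"col": [0.046871, nan, 0.015948,
                           0.020552, 0.033409, nan]}, index=idx)

# Drop the all-NaN months so they do not show up as empty groups.
clean = df.dropna()

# Group by abbreviated month name (labels depend on the locale
# and sort alphabetically, not chronologically).
by_month = clean.groupby(clean.index.strftime("%b")).mean()
print(by_month)
```

Note that this pools every year together, so it answers "what is the typical February" rather than "what is the mean of one particular winter".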
If you also need the year:
# Mean per year and month name
by_year_month = df.groupby(df.index.strftime('%Y-%b')).mean()
print(by_year_month)
               col
2000-Dec  0.015948
2000-Feb  0.046871
2001-Dec  0.013204
2001-Feb  0.033409
2001-Jan  0.020552
2002-Feb  0.019723
2002-Jan  0.017093
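Another solution is to convert the index with to_period:
# Mean per monthly period; 'M' is the monthly period alias
by_period = df.groupby(df.index.to_period('M')).mean()
print(by_period)
              col
2000-02  0.046871
2000-12  0.015948
2001-01  0.020552
2001-02  0.033409
2001-12  0.013204
2002-01  0.017093
2002-02  0.019723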
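One practical difference between the two approaches is visible in the outputs: strftime labels are plain strings and sort alphabetically (Dec before Feb within a year), while Period labels sort chronologically. A small sketch with made-up values (not the question's data) that puts the two side by side. The names by_string and by_period are illustrative:

```python
import pandas as pd

# Three month-end dates with placeholder values.
idx = pd.to_datetime(["2000-02-29", "2000-12-31", "2001-01-31"])
df = pd.DataFrame({"col": [1.0, 2.0, 3.0]}, index=idx)

# String labels: group keys are ordered as text.
by_string = df.groupby(df.index.strftime("%Y-%b")).mean()

# Period labels: group keys are ordered in time.
by_period = df.groupby(df.index.to_period("M")).mean()

print(by_string)
print(by_period)
```

If the result feeds a plot or a further time-based operation, the Period version is usually the easier one to work with.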
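Edit:
You need to shift the index forward by one month, so that each December lands in the same year as the January and February that follow it, and then group by year: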
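# Shift the month-end dates forward by one month, then read off the year.
# Depending on the pandas version, the month-end alias may need to be 'M' or 'ME'.
year = df.shift(freq='m').index.year
print(year)
Int64Index([2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2001,
            2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001,
            2002, 2002, 2002, 2002, 2002],
           dtype='int64')
# Mean per winter: Dec of the previous year plus Jan and Feb of the labelled year
winter_mean = df.groupby(year).mean()
print(winter_mean)
           col
2000  0.046871
2001  0.023303
2002  0.016673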
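The same December-to-next-year assignment can be sketched without frequency aliases, whose spelling differs between pandas versions. This is an alternative formulation, not the answer's code: it adds 1 to the year of every December row and groups on that label. The column name col and the names winter_year and winter_mean are illustrative, and the values are the non-NaN rows from the question:

```python
import pandas as pd

# Non-NaN rows from the question's data.
idx = pd.to_datetime(["2000-02-29", "2000-12-31", "2001-01-31", "2001-02-28",
                      "2001-12-31", "2002-01-31", "2002-02-28"])
df = pd.DataFrame({"col": [0.046871, 0.015948, 0.020552, 0.033409,
                           0.013204, 0.017093, 0.019723]}, index=idx)

# December belongs to the winter that ends in the following year,
# so its label is year + 1; January and February keep their own year.
winter_year = df.index.year + (df.index.month == 12).astype(int)

# One mean per winter, labelled by the year of its January and February.
winter_mean = df.groupby(winter_year).mean()
print(winter_mean)
```

A winter with missing months is still averaged over whatever rows exist, so the 2000 group here is just the single February value.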