Using pandas, how do I find the mean of certain elements in a column based on date?

Time: 2020-05-28 07:53:22

Tags: python pandas

I have a DataFrame df:

index  Heads
as     4
as     3
as     2
as     5
as     3
cd     4
cd     5
cd     6

With the following code I can get the per-index mean of Heads:

avg = df['Heads'].groupby(df.index).mean()
df.reset_index().pivot_table(columns=["index"]).T

index  Heads
as     3.4   
cd     5
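
For reference, a minimal self-contained sketch that rebuilds this frame and reproduces the per-label mean. It assumes that "index" is the actual row index of df (not an ordinary column) and that the values are exactly those listed above.

```python
import pandas as pd

# Rebuild the sample frame; "index" is assumed to be the row index.
df = pd.DataFrame(
    {"Heads": [4, 3, 2, 5, 3, 4, 5, 6]},
    index=pd.Index(["as"] * 5 + ["cd"] * 3, name="index"),
)

# Group rows by their index label and average Heads within each group.
avg = df["Heads"].groupby(level=0).mean()
print(avg)
```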

But I also have another DataFrame, df2, with an extra date column, for example:

index  date         Heads
as     01-02-2000   4
as     04-03-2002   3
as     09-01-2003   2
as     23-12-2010   5
as     14-04-2006   3
cd     04-01-2004   4
cd     04-05-2007   5
cd     04-05-2001   6

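Here I want to take the mean of Heads as in the case above, but only over the rows whose date falls between the years 2000 and 2005. So the expected output is: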

index  Heads
as     3   
cd     5  

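1 answer:

Answer 0: (score: 1)

Use Series.dt.year together with Series.between and boolean indexing to keep only the rows from 2000 to 2005 (both ends inclusive), then use mean with the level parameter. Here df is the frame with the date column (df2 in the question), and "index" is assumed to be its row index: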

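import pandas as pd

# Parse the day-first date strings (e.g. 23-12-2010) into datetimes
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# Keep rows from 2000 to 2005, then average per index label.
# Note: mean(level=...) was deprecated in pandas 1.3 and removed in 2.0,
# so this form only runs on older versions.
df = (df[df['date'].dt.year.between(2000, 2005)]
         .mean(level=0)
         .reset_index())
print (df)
  index  Heads
0    as    3.0
1    cd    5.0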

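Or, using groupby (this form also works on pandas 2.0 and later, where the level argument of mean is no longer available):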

# Select only Heads so the datetime column is not averaged as well
df = (df.loc[df['date'].dt.year.between(2000, 2005), ['Heads']]
         .groupby(level=0).mean()
         .reset_index())
print (df)
  index  Heads
0    as    3.0
1    cd    5.0
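
To try the groupby approach end to end, here is a minimal sketch that builds the question's df2 from scratch. It rests on several assumptions: the dates are day-first strings, "index" is the row index, and the names `in_range` and `result` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question; "index" is assumed to be the row index.
df2 = pd.DataFrame(
    {
        "date": ["01-02-2000", "04-03-2002", "09-01-2003", "23-12-2010",
                 "14-04-2006", "04-01-2004", "04-05-2007", "04-05-2001"],
        "Heads": [4, 3, 2, 5, 3, 4, 5, 6],
    },
    index=pd.Index(["as"] * 5 + ["cd"] * 3, name="index"),
)

# Parse the day-first strings (assumed format: DD-MM-YYYY).
df2["date"] = pd.to_datetime(df2["date"], format="%d-%m-%Y")

# Boolean mask: True where the year is in 2000..2005 (both ends inclusive).
in_range = df2["date"].dt.year.between(2000, 2005)

# Filter rows, keep only Heads, then average per index label.
result = (df2.loc[in_range, ["Heads"]]
             .groupby(level=0).mean()
             .reset_index())
print(result)
```

Selecting only the Heads column before grouping keeps the result independent of how a given pandas version treats datetime columns in mean.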