I have a DataFrame df:
index Heads
as 4
as 3
as 2
as 5
as 3
cd 4
cd 5
cd 6
With the following code I can get this output:
avg = df['Heads'].groupby(df.index).mean()
df.reset_index().pivot_table(columns=["index"]).T
index Heads
as 3.4
cd 5
But I have another DataFrame, df2, with an additional date column, for example:
index date Heads
as 01-02-2000 4
as 04-03-2002 3
as 09-01-2003 2
as 23-12-2010 5
as 14-04-2006 3
cd 04-01-2004 4
cd 04-05-2007 5
cd 04-05-2001 6
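Here I want to take the mean of Heads as in the case above, but only over the rows whose date falls between the years 2000 and 2005. The expected output is therefore: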
index Heads
as 3
cd 5
Answer 0 (score: 1)
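Use Series.dt.year with Series.between and boolean indexing to keep only the rows from 2000 to 2005, then take the mean with the level parameter. Note that the level parameter of mean was deprecated in pandas 1.3 and removed in pandas 2.0, so this form only runs on older versions: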
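import pandas as pd

# Parse the day-first date strings (e.g. 23-12-2010) into datetimes
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
# mean(level=0) requires pandas < 2.0; the level argument was removed in 2.0
df = (df[df['date'].dt.year.between(2000, 2005)]
        .mean(level=0)
        .reset_index())
print(df)
  index  Heads
0    as    3.0
1    cd    5.0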
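Or, grouping on the index instead, which also works in current pandas. Selecting Heads explicitly keeps the date column out of the result:
# Assumes df['date'] has already been converted with pd.to_datetime as above
df = (df[df['date'].dt.year.between(2000, 2005)]
        .groupby(level=0)['Heads'].mean()
        .reset_index())
print(df)
  index  Heads
0    as    3.0
1    cd    5.0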
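For anyone who wants to try this end to end, here is a minimal self-contained sketch of the groupby variant. It rebuilds the sample data from the question as df2; the names in_range and result are my own and not part of the original code, and it assumes the as/cd labels are the DataFrame index, as in the question.

```python
import pandas as pd

# Sample data copied from the question; the as/cd labels form the index.
df2 = pd.DataFrame(
    {
        "date": ["01-02-2000", "04-03-2002", "09-01-2003", "23-12-2010",
                 "14-04-2006", "04-01-2004", "04-05-2007", "04-05-2001"],
        "Heads": [4, 3, 2, 5, 3, 4, 5, 6],
    },
    index=["as", "as", "as", "as", "as", "cd", "cd", "cd"],
)

# The dates are day-first strings, so state the format explicitly.
df2["date"] = pd.to_datetime(df2["date"], format="%d-%m-%Y")

# Boolean mask: True for rows whose year is in 2000..2005 (both ends inclusive).
in_range = df2["date"].dt.year.between(2000, 2005)

# Average only Heads per index label, then turn the index back into a column.
result = (
    df2.loc[in_range, "Heads"]
    .groupby(level=0)
    .mean()
    .reset_index()
)
print(result)
```

On the sample data this keeps three as rows (4, 3, 2) and two cd rows (4, 6), so the means are 3.0 and 5.0, which matches the expected output in the question. Series.between includes both endpoints by default, so rows dated in 2000 and in 2005 are both counted.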