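I am trying to group a DataFrame by month and, within each month, by hour of the day, to get the mean value for each hour of each month. So far I have run the following line, but it does not work: df=df.groupby([pd.Grouper(freq='M'),pd.Grouper(freq='h')]).mean()
Any ideas on how to do this efficiently?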
date = ['2015-02-03 23:00:00','2015-02-03 23:30:00','2015-02-04 00:00:00','2015-02-04 00:30:00','2015-02-04 01:00:00','2015-02-04 01:30:00','2015-02-04 02:00:00','2015-02-04 02:30:00','2015-02-04 03:00:00','2015-02-04 03:30:00','2015-02-04 04:00:00','2015-02-04 04:30:00','2015-02-04 05:00:00','2015-02-04 05:30:00','2015-02-04 06:00:00','2015-02-04 06:30:00','2015-02-04 07:00:00','2015-02-04 07:30:00','2015-02-04 08:00:00','2015-02-04 08:30:00','2015-02-04 09:00:00','2015-02-04 09:30:00','2015-02-04 10:00:00','2015-02-04 10:30:00','2015-02-04 11:00:00','2015-02-04 11:30:00','2015-02-04 12:00:00','2015-02-04 12:30:00','2015-02-04 13:00:00','2015-02-04 13:30:00','2015-02-04 14:00:00','2015-02-04 14:30:00','2015-02-04 15:00:00','2015-02-04 15:30:00','2015-02-04 16:00:00','2015-02-04 16:30:00','2015-02-04 17:00:00','2015-02-04 17:30:00','2015-02-04 18:00:00','2015-02-04 18:30:00','2015-02-04 19:00:00','2015-02-04 19:30:00','2015-02-04 20:00:00','2015-02-04 20:30:00','2015-02-04 21:00:00','2015-02-04 21:30:00','2015-02-04 22:00:00','2015-02-04 22:30:00','2015-02-04 23:00:00','2015-02-04 23:30:00']
value = [33.24 , 31.71 , 34.39 , 34.49 , 34.67 , 34.46 , 34.59 , 34.83 , 35.78 , 33.03 , 35.49 , 33.79 , 36.12 , 37.09 , 39.54 , 41.19 , 45.99 , 50.23 , 46.72 , 47.47 , 48.46 , 48.38 , 48.40 , 48.13 , 38.35 , 38.19 , 38.12 , 38.05 , 38.06 , 37.83 , 37.49 , 37.41 , 41.84 , 42.26 , 44.09 , 48.85 , 50.07 , 50.94 , 51.09 , 50.60 , 47.39 , 45.57 , 45.03 , 44.98 , 41.32 , 40.37 , 41.12 , 39.33 , 35.38 , 33.44 ]
import pandas as pd
df = pd.DataFrame({'value': value, 'index': date})
# the date strings include seconds, so the format needs %S
df.index = pd.to_datetime(df['index'], format='%Y-%m-%d %H:%M:%S')
df.drop(['index'], axis=1, inplace=True)
print(df)
                     value
index
2015-02-03 23:00:00  33.24
2015-02-03 23:30:00  31.71
2015-02-04 00:00:00  34.39
2015-02-04 00:30:00  34.49
2015-02-04 01:00:00  34.67
2015-02-04 01:30:00  34.46
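A likely reason the original line does not give hour-of-day groups: pd.Grouper(freq='h') bins rows into consecutive hourly time periods (much like resample), so 23:00 on one day and 23:00 on the next fall into different bins. A minimal sketch, using a hypothetical three-row frame rather than the question's data, contrasting that with grouping on the index's hour attribute:

```python
import pandas as pd

# Hypothetical frame: 23:00 on two consecutive days, plus one midnight reading
idx = pd.to_datetime(['2015-02-03 23:00:00', '2015-02-04 00:00:00',
                      '2015-02-04 23:00:00'])
df = pd.DataFrame({'value': [1.0, 2.0, 3.0]}, index=idx)

# Grouper(freq='h') creates one bin per calendar hour (a time period),
# so the two 23:00 rows land in different bins
by_period = df.groupby(pd.Grouper(freq='h'))['value'].mean()

# Grouping on the hour-of-day attribute pools rows with the same clock hour
by_hour_of_day = df.groupby(df.index.hour)['value'].mean()

print(by_period.dropna())
print(by_hour_of_day)
```

The answers below build on the second pattern, adding the month (and optionally the year) as extra keys.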
Answer 0 (score: 1)
Use DataFrame.reset_index + DataFrame.groupby with Series.dt:
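df2 = df.reset_index()  # the DatetimeIndex becomes a column named 'index'
# group by year, month and hour of day; average only the 'value' column
df3 = df2.groupby([df2['index'].dt.year.rename('year'),
                   df2['index'].dt.month.rename('month'),
                   df2['index'].dt.hour.rename('hour')])[['value']].mean()
print(df3)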
                  value
year month hour
2015 2     0     34.4400
           1     34.5650
           2     34.7100
           3     34.4050
           4     34.6400
           5     36.6050
           6     40.3650
           7     48.1100
           8     47.0950
           9     48.4200
           10    48.2650
           11    38.2700
           12    38.0850
           13    37.9450
           14    37.4500
           15    42.0500
           16    46.4700
           17    50.5050
           18    50.8450
           19    46.4800
           20    45.0050
           21    40.8450
           22    40.2250
           23    33.4425
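Because the question's sample covers only February 2015, the output above has a single (year, month) pair. A self-contained sketch on hypothetical two-month data shows the per-month split; it groups on the DatetimeIndex attributes directly, which skips the reset_index step (the readings are made up for illustration):

```python
import pandas as pd

# Hypothetical readings spanning February and March 2015
idx = pd.to_datetime(['2015-02-01 00:00', '2015-02-01 00:30',
                      '2015-03-01 00:00', '2015-03-02 00:15',
                      '2015-03-01 01:00'])
df = pd.DataFrame({'value': [1.0, 3.0, 10.0, 20.0, 5.0]}, index=idx)

# DatetimeIndex exposes .year/.month/.hour directly, so no reset_index is needed;
# rename() gives each level of the result a readable name
out = df.groupby([df.index.year.rename('year'),
                  df.index.month.rename('month'),
                  df.index.hour.rename('hour')])[['value']].mean()
print(out)
```

Here both March midnight readings (from different days) share the key (2015, 3, 0) and are averaged.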
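If you don't want to take the year into account, just leave it out of the grouping (the same month from different years is then averaged together):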
df3 = df2.groupby([df2['index'].dt.month.rename('month'),
                   df2['index'].dt.hour.rename('hour')])[['value']].mean()
print(df3)
             value
month hour
2     0     34.4400
      1     34.5650
      2     34.7100
      3     34.4050
      4     34.6400
      5     36.6050
      6     40.3650
      7     48.1100
      8     47.0950
      9     48.4200
      10    48.2650
      11    38.2700
      12    38.0850
      13    37.9450
      14    37.4500
      15    42.0500
      16    46.4700
      17    50.5050
      18    50.8450
      19    46.4800
      20    45.0050
      21    40.8450
      22    40.2250
      23    33.4425
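With the question's single-month sample both variants print the same numbers, but they diverge once the data spans several years. A minimal sketch with hypothetical readings from February 2015 and February 2016:

```python
import pandas as pd

# Hypothetical readings: midnight on 1 February in two different years
idx = pd.to_datetime(['2015-02-01 00:00', '2016-02-01 00:00'])
df = pd.DataFrame({'value': [1.0, 3.0]}, index=idx)

# with the year as a key, each year keeps its own (month, hour) group
with_year = df.groupby([df.index.year.rename('year'),
                        df.index.month.rename('month'),
                        df.index.hour.rename('hour')])[['value']].mean()
# without the year both rows share the key (month=2, hour=0) and are averaged
without_year = df.groupby([df.index.month.rename('month'),
                           df.index.hour.rename('hour')])[['value']].mean()
print(with_year)
print(without_year)
```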
Answer 1 (score: 1)
The idea is to set every day to 1 (the first of the month) and add the hours, building a helper DatetimeIndex that is passed to groupby:
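# month start (from a monthly period) plus the hour of day as a timedelta
idx = df.index.to_period('M').to_timestamp() + pd.to_timedelta(df.index.hour, unit='h')
Or:
# set the day to 1 and clear every field below the hour
idx = df.index.map(lambda x: x.replace(day=1, minute=0, second=0, microsecond=0))
df = df.groupby(idx).mean()
print(df)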
                       value
index
2015-02-01 00:00:00  34.4400
2015-02-01 01:00:00  34.5650
2015-02-01 02:00:00  34.7100
2015-02-01 03:00:00  34.4050
2015-02-01 04:00:00  34.6400
2015-02-01 05:00:00  36.6050
2015-02-01 06:00:00  40.3650
2015-02-01 07:00:00  48.1100
2015-02-01 08:00:00  47.0950
2015-02-01 09:00:00  48.4200
2015-02-01 10:00:00  48.2650
2015-02-01 11:00:00  38.2700
2015-02-01 12:00:00  38.0850
2015-02-01 13:00:00  37.9450
2015-02-01 14:00:00  37.4500
2015-02-01 15:00:00  42.0500
2015-02-01 16:00:00  46.4700
2015-02-01 17:00:00  50.5050
2015-02-01 18:00:00  50.8450
2015-02-01 19:00:00  46.4800
2015-02-01 20:00:00  45.0050
2015-02-01 21:00:00  40.8450
2015-02-01 22:00:00  40.2250
2015-02-01 23:00:00  33.4425
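A self-contained sketch, on hypothetical data spanning two months, that builds the helper index both ways, checks that they agree, and groups on it. Each group label is the first of the month at the given hour, so month and hour of day are encoded in one timestamp. Clearing seconds and microseconds in the lambda variant is an addition that only matters if the timestamps carry them.

```python
import pandas as pd

# Hypothetical readings spanning February and March 2015
idx_in = pd.to_datetime(['2015-02-03 23:00', '2015-02-04 23:30',
                         '2015-03-10 23:15', '2015-03-11 07:00'])
df = pd.DataFrame({'value': [1.0, 3.0, 10.0, 4.0]}, index=idx_in)

# Variant 1: first day of the month (via a monthly period) plus hour of day
idx1 = (df.index.to_period('M').to_timestamp()
        + pd.to_timedelta(df.index.hour, unit='h'))
# Variant 2: overwrite the day and every field below the hour
idx2 = df.index.map(lambda x: x.replace(day=1, minute=0, second=0, microsecond=0))

# Both helpers map each row to "month start + hour", so they should agree
assert list(idx1) == list(idx2)

out = df.groupby(idx1).mean()
print(out)
```

The two February 23:xx readings collapse onto 2015-02-01 23:00, while the March rows keep separate labels for 07:00 and 23:00.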