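I have a DataFrame "ncData", shown below, and I am trying to group the data by season (winter, spring, summer, autumn) and take the mean of the wind_speed and power columns over the months in each season, for each windfarm_name and each year. Here are the first few rows of ncData: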
ncData.head(2)
Out[432]:
site_name windfarm_name region_name time \
4055 REDCK Red Creek Northeast 2019-12-28 20:00:00+00:00
4056 REDCK Red Creek Northeast 2019-12-28 19:00:00+00:00
wind_speed power Dates Hours year month day Season
4055 5.89692 23.9702 2019-12-28 20:00:00 2019 12 28 Winter
4056 4.75525 13.8225 2019-03-28 19:00:00 2019 3 28 Spring
I tried something like this:
ncData.groupby([pd.Grouper(key='Season', freq='1Y'),pd.Grouper(key='windfarm_name')]).mean()
which raises this error:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of
'Index'
I have also tried this:
ncData.groupby(['Season','windfarm_name'],freq='1Y')['wind_speed'].mean()
I need the output to look like this:
time windfarm_name season wind_speed power
0 1991 Red Creek winter 3.917762 8.276560
1 1991 Red Creek spring 3.046854 0.132271
2 1991 Red Creek summer 3.737426 6.799836
3 1991 Red Creek autumn 3.870350 4.010200
4 1991 Oasis Wind winter 2.955412 2.898962
5 1991 Oasis Wind spring 2.707168 0.076643
Thanks!
Answer 0 (score: 1)
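You almost had it. pd.Grouper(freq=...) only works on datetime-like keys, and Season is a column of strings, which is why you get the TypeError. Group by the plain columns instead, and select the value columns with a list (double brackets):
ncData.groupby(['year', 'windfarm_name', 'Season'])[['wind_speed', 'power']].mean()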
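To get a flat table like the one in the question rather than a MultiIndex result, passing as_index=False is one option. The following is a minimal sketch on made-up data: only the column names come from the question, the values are invented for illustration.

```python
import pandas as pd

# Toy data invented for illustration; only the column names match the question.
ncData = pd.DataFrame({
    "windfarm_name": ["Red Creek", "Red Creek", "Red Creek", "Oasis Wind"],
    "year": [2019, 2019, 2019, 2019],
    "Season": ["Winter", "Winter", "Spring", "Winter"],
    "wind_speed": [4.0, 6.0, 3.0, 2.0],
    "power": [10.0, 20.0, 5.0, 1.0],
})

# as_index=False keeps the group keys as ordinary columns in the result.
seasonal = (
    ncData.groupby(["year", "windfarm_name", "Season"], as_index=False)[["wind_speed", "power"]]
    .mean()
)
print(seasonal)
```

Each row of seasonal is then one (year, windfarm_name, Season) combination with the mean wind_speed and power for that group.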
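Note that you don't need to split the time column into separate year, month and day columns. Just make sure its dtype is datetime, and then use the .dt accessor to pull out the year while grouping:
ncData.groupby([ncData['time'].dt.year, 'windfarm_name', 'Season'])[['wind_speed', 'power']].mean()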
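If the time column is still stored as strings, it can be converted first. Here is a small sketch of that variant, again on invented data (the frame name df and all values are assumptions for illustration); renaming the extracted year gives the output column a readable name.

```python
import pandas as pd

# Toy data invented for illustration; "time" starts out as plain strings.
df = pd.DataFrame({
    "windfarm_name": ["Red Creek", "Red Creek", "Red Creek"],
    "time": [
        "2019-12-28 20:00:00+00:00",
        "2019-12-28 19:00:00+00:00",
        "2020-03-28 19:00:00+00:00",
    ],
    "Season": ["Winter", "Winter", "Spring"],
    "wind_speed": [4.0, 6.0, 3.0],
    "power": [10.0, 20.0, 5.0],
})

# Convert to a timezone-aware datetime column so the .dt accessor is available.
df["time"] = pd.to_datetime(df["time"], utc=True)

# Group by the year extracted on the fly, plus the two existing columns.
by_year = (
    df.groupby([df["time"].dt.year.rename("year"), "windfarm_name", "Season"])[["wind_speed", "power"]]
    .mean()
    .reset_index()
)
print(by_year)
```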