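Pandas groupby: average columns by season and year

Asked: 2020-07-30 17:37:55

Tags: pandas pandas-groupby average

I have a df "ncData", shown below. I am trying to group the data by season (Winter, Spring, Summer, Autumn) and take the mean of the wind_speed and power columns over the months in each season, for each year and each windfarm_name. Here are the first few rows of ncData: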

ncData.head(2)
Out[432]: 
     site_name windfarm_name region_name                      time  \
4055     REDCK    Red Creek   Northeast 2019-12-28 20:00:00+00:00   
4056     REDCK    Red Creek   Northeast 2019-12-28 19:00:00+00:00   

      wind_speed    power       Dates     Hours  year month day  Season  
4055     5.89692  23.9702  2019-12-28  20:00:00  2019    12  28  Winter  
4056     4.75525  13.8225  2019-03-28  19:00:00  2019     3  28  Spring 

I tried something like this:

ncData.groupby([pd.Grouper(key='Season', freq='1Y'),pd.Grouper(key='windfarm_name')]).mean()

which raises this error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

I have also tried this:

ncData.groupby(['Season','windfarm_name'],freq='1Y')['wind_speed'].mean()

I need the output to look like this:

         time       windfarm_name  season         wind_speed power
0    1991          Red Creek      winter         3.917762   8.276560
1    1991          Red Creek      spring         3.046854   0.132271
2    1991          Red Creek      summer         3.737426   6.799836
3    1991          Red Creek      autumn         3.870350   4.010200
4    1991         Oasis Wind      winter         2.955412   2.898962
5    1991         Oasis Wind      spring         2.707168   0.076643

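Thanks!

1 Answer:

Answer 0: (score: 1)

You almost had it. Group by the existing year, windfarm_name, and Season columns, then select the two columns to average: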

ncData.groupby(['year', 'windfarm_name', 'Season'])[['wind_speed', 'power']].mean()
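As a minimal sketch of that call, the snippet below builds a tiny stand-in for ncData and averages the two columns per year, wind farm, and season. The sample values are made up for illustration and are not taken from the question's data; `seasonal` is a hypothetical variable name. Calling `reset_index()` turns the group keys back into ordinary columns, which should give a flat table similar to the requested output.

```python
import pandas as pd

# Hypothetical sample rows shaped like ncData (values invented for illustration).
ncData = pd.DataFrame({
    "windfarm_name": ["Red Creek", "Red Creek", "Red Creek", "Oasis Wind"],
    "year": [2019, 2019, 2019, 2019],
    "Season": ["Winter", "Winter", "Spring", "Winter"],
    "wind_speed": [4.0, 6.0, 3.0, 2.0],
    "power": [10.0, 20.0, 5.0, 1.0],
})

# Select the columns with a list (double brackets), average them within each
# (year, windfarm_name, Season) group, then flatten the group keys into columns.
seasonal = (
    ncData.groupby(["year", "windfarm_name", "Season"])[["wind_speed", "power"]]
    .mean()
    .reset_index()
)
print(seasonal)
```

The double brackets matter: selecting several columns from a groupby with a bare tuple of names, as in `['wind_speed', 'power']`, is rejected by recent pandas versions.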

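Note that you do not need to split the time column into separate year, month, and day columns. Just make sure its dtype is datetime, then take the year through the .dt accessor: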

ncData.groupby([ncData['time'].dt.year, 'windfarm_name', 'Season'])[['wind_speed', 'power']].mean()
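The following sketch illustrates grouping on a year derived from the time column, assuming time may arrive as strings and needs converting first. The frame `df`, its values, and the name `by_year` are hypothetical and exist only for this example. Because the grouping Series keeps the name `time`, the flattened result should carry the year in a column called `time`, much like the desired output in the question.

```python
import pandas as pd

# Hypothetical sample with a string time column (values invented for illustration).
df = pd.DataFrame({
    "windfarm_name": ["Red Creek", "Red Creek", "Red Creek", "Oasis Wind"],
    "time": [
        "2019-12-28 20:00:00+00:00",
        "2019-12-28 19:00:00+00:00",
        "2018-12-28 19:00:00+00:00",
        "2019-12-28 19:00:00+00:00",
    ],
    "Season": ["Winter", "Winter", "Winter", "Winter"],
    "wind_speed": [4.0, 6.0, 3.0, 2.0],
    "power": [10.0, 20.0, 5.0, 1.0],
})

# Make sure the column is a datetime dtype; .dt only works on datetime-like Series.
df["time"] = pd.to_datetime(df["time"], utc=True)

# Group by the year extracted from time, plus the wind farm and season columns.
by_year = (
    df.groupby([df["time"].dt.year, "windfarm_name", "Season"])[["wind_speed", "power"]]
    .mean()
    .reset_index()
)
print(by_year)
```

Swapping `.dt.year` for `.dt.month` would group by calendar month instead, if a finer breakdown is wanted.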