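I have a DataFrame with a datetime index at hourly frequency. I want to make a groupby object that groups by season. By season I mean the meteorological seasons: spring is months 3, 4 and 5, summer is 6, 7 and 8, and so on. I want a separate group for each season/year combination. Is there a way to do this with a custom DateOffset? Would that require subclassing? Or am I better off just making a season column and then doing:
grouper = df.groupby([df['season'], df.index.year])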
My current code is ugly:
import pandas as pd

def group_season(df):
    """
    This uses the meteorological seasons
    """
    df['month'] = df.index.month
    spring = df['month'].isin([3, 4, 5])
    spring[spring] = 'spring'
    summer = df['month'].isin([6, 7, 8])
    summer[summer] = 'summer'
    fall = df['month'].isin([9, 10, 11])
    fall[fall] = 'fall'
    winter = df['month'].isin([12, 1, 2])
    winter[winter] = 'winter'
    df['season'] = pd.concat([winter[winter != False], spring[spring != False],
                              fall[fall != False], summer[summer != False]], axis=0)
    return df.groupby([df['season'], df.index.year])
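A more compact way to build the season column the question describes is to map each month number through a dictionary. The sketch below is a hypothetical rewrite of group_season, not the asker's code: MONTH_TO_SEASON is an illustrative name and the hourly sample data is made up. Like the original, it keys on the calendar year, so December is grouped with January and February of the same year rather than with the following winter. Unlike the original, it does not add columns to df.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping: month number -> meteorological season
MONTH_TO_SEASON = {
    12: 'winter', 1: 'winter', 2: 'winter',
    3: 'spring', 4: 'spring', 5: 'spring',
    6: 'summer', 7: 'summer', 8: 'summer',
    9: 'fall', 10: 'fall', 11: 'fall',
}


def group_season(df):
    """Group by (meteorological season, calendar year) without mutating df."""
    # Index.map accepts a dict and returns an Index of season names
    season = df.index.month.map(MONTH_TO_SEASON)
    return df.groupby([season, df.index.year])


# Made-up hourly data covering the leap year 2016
idx = pd.date_range('2016-01-01', '2016-12-31 23:00', freq='h')
df = pd.DataFrame({'num': np.ones(len(idx))}, index=idx)

# Number of hourly rows in each (season, year) group
counts = group_season(df).size()
print(counts)
```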
Answer:
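For the grouping you want, use anchored quarterly offsets. 'QS-MAR' means quarter-start bins anchored so that one quarter begins in March, giving bins that start in March, June, September and December, which matches the meteorological seasons.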
import numpy as np
import pandas as pd
dates = pd.date_range('2016-01', freq='MS', periods=12)
df = pd.DataFrame({'num': np.arange(12)}, index=dates)
print(df)
# num
# 2016-01-01 0
# 2016-02-01 1
# 2016-03-01 2
# 2016-04-01 3
# 2016-05-01 4
# 2016-06-01 5
# 2016-07-01 6
# 2016-08-01 7
# 2016-09-01 8
# 2016-10-01 9
# 2016-11-01 10
# 2016-12-01 11
by_season = df.resample('QS-MAR').sum()
print(by_season)
# num
# 2015-12-01 1
# 2016-03-01 9
# 2016-06-01 18
# 2016-09-01 27
# 2016-12-01 11
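Since the question asks for a groupby object rather than a resampled result, the same anchored offset can also be passed to pd.Grouper. This is a minimal sketch on the same sample frame; pd.Grouper bins the DatetimeIndex the way resample does, but you get an ordinary groupby object that you can iterate over or aggregate. Nothing here depends on the monthly sample, so it should apply equally to an hourly index.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2016-01', freq='MS', periods=12)
df = pd.DataFrame({'num': np.arange(12)}, index=dates)

# pd.Grouper with the anchored offset yields a groupby object
grouper = df.groupby(pd.Grouper(freq='QS-MAR'))

# Aggregating it gives the same bins as df.resample('QS-MAR').sum()
by_season = grouper.sum()
print(by_season)
```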
You can also give the index nicer, more descriptive labels:
SEASONS = {
'winter': [12, 1, 2],
'spring': [3, 4, 5],
'summer': [6, 7, 8],
'fall': [9, 10, 11]
}
MONTHS = {month: season for season, months in SEASONS.items()
          for month in months}
by_season.index = (pd.Series(by_season.index.month).map(MONTHS) +
' ' + by_season.index.year.astype(str))
print(by_season)
# num
# winter 2015 1
# spring 2016 9
# summer 2016 18
# fall 2016 27
# winter 2016 11
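The labels above name each winter by the year in which it starts (December 2016 becomes "winter 2016"). If you would rather name a winter by the year in which it ends, one alternative, not part of the original answer, is to use pandas quarterly periods with a fiscal year ending in November ('Q-NOV'). Their quarters run Dec-Feb, Mar-May, Jun-Aug and Sep-Nov, and Period.qyear gives the fiscal year. This is a sketch under that assumption; QUARTER_TO_SEASON is an illustrative name.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2016-01', freq='MS', periods=12)
df = pd.DataFrame({'num': np.arange(12)}, index=dates)

# Fiscal quarters of a year ending in November line up with the seasons:
# Q1 = Dec-Feb, Q2 = Mar-May, Q3 = Jun-Aug, Q4 = Sep-Nov
QUARTER_TO_SEASON = {1: 'winter', 2: 'spring', 3: 'summer', 4: 'fall'}
periods = df.index.to_period('Q-NOV')

# Label each row by season and the fiscal year (the year the season ends in)
labels = pd.Index(
    [f"{QUARTER_TO_SEASON[p.quarter]} {p.qyear}" for p in periods],
    name='season',
)

# sort=False keeps groups in chronological order instead of alphabetical
by_season = df.groupby(labels, sort=False).sum()
print(by_season)
```

With this convention, January and February 2016 fall under "winter 2016" and December 2016 under "winter 2017".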