My dataset looks like this:
time Value
2006-09-15 00:00:00 1.27
2006-09-16 00:00:00 0
2006-09-17 00:00:00 0
2006-09-18 00:00:00 1.016
2006-09-19 00:00:00 5.08
2006-09-20 00:00:00 0.16
2006-09-21 00:00:00 3.81
I would like to know the best way to do a groupby.sum over November through June instead of over the calendar year.
Answer 0 (score: 0)
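Assuming your time column is datetime64[ns], you can simply filter the dataframe by month and sum the result. Based on your question, you do not need to group by year.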
import pandas as pd

# create sample dataframe
df = pd.DataFrame({'Value': {0: 1.27, 1: 0.0, 2: 0.0, 3: 1.016, 4: 5.08, 5: 0.16, 6: 3.81},
'time': {0: '2006-09-15 00:00:00',
1: '2006-11-16 00:00:00',
2: '2006-11-17 00:00:00',
3: '2006-12-18 00:00:00',
4: '2006-01-19 00:00:00',
5: '2006-02-20 00:00:00',
6: '2006-09-21 00:00:00'}})
# make sure column is datetime
df['time'] = pd.to_datetime(df['time'])
# filter by month using .isin()
df.loc[df['time'].dt.month.isin([11, 12, 1, 2, 3, 4, 5, 6]), 'Value'].sum()
Alternatively, if you want to group by year and month within that range, you need to add Year and Month columns and group on them:
# create sample dataframe
df = pd.DataFrame({'Value': {0: 1.27, 1: 1.0, 2: 1.06, 3: 1.016, 4: 5.08, 5: 0.16, 6: 3.81},
'time': {0: '2006-09-15 00:00:00',
1: '2006-11-16 00:00:00',
2: '2006-11-17 00:00:00',
3: '2006-12-18 00:00:00',
4: '2006-01-19 00:00:00',
5: '2006-02-20 00:00:00',
6: '2007-02-21 00:00:00'}})
# make sure time is in fact datetime
df['time'] = pd.to_datetime(df['time'])
# create month and year columns to group on
df['Month'] = df['time'].dt.month
df['Year'] = df['time'].dt.year
# filter dataframe for your month range
df2 = df[df['Month'].isin([11,12,1,2,3,4,5,6])]
# groupby and sum value
df2.groupby(['Year','Month'])['Value'].sum()
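
If the goal is one total per November to June season rather than per calendar month, grouping on the calendar year would split each season in two, since November and December fall in a different year than January through June. A minimal sketch of one way around this, assuming each season should be labeled by the year in which it ends: the sample values and the names `in_season`, `season_year`, and `SeasonEndYear` are illustrative and not part of the original answer.

```python
import pandas as pd

# illustrative sample data spanning two Nov-Jun seasons (not the asker's data)
df = pd.DataFrame({
    'time': pd.to_datetime(['2006-09-15', '2006-11-16', '2006-12-18',
                            '2007-01-19', '2007-06-20', '2007-11-21',
                            '2008-02-22']),
    'Value': [1.27, 1.0, 1.016, 5.08, 0.16, 3.81, 2.0],
})

# keep only rows that fall in November through June
in_season = df['time'].dt.month.isin([11, 12, 1, 2, 3, 4, 5, 6])
season_df = df[in_season]

# assumption: label each season by the year it ends in, so November and
# December rows are shifted into the following year
season_year = season_df['time'].dt.year + (season_df['time'].dt.month >= 11).astype(int)

# one sum per season
season_totals = season_df.groupby(season_year.rename('SeasonEndYear'))['Value'].sum()
print(season_totals)
```

With this labeling, November 2006 through June 2007 is summed under 2007, and the September row is dropped by the month filter. If you would rather label a season by its starting year, subtract 1 for months up to June instead of adding 1 for November and December.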