groupby and sum over a specific month range (November through June)

Time: 2018-08-16 17:29:16

Tags: python pandas datetime

My dataset looks like this:

       time             Value
2006-09-15 00:00:00      1.27
2006-09-16 00:00:00        0
2006-09-17 00:00:00        0
2006-09-18 00:00:00     1.016
2006-09-19 00:00:00      5.08
2006-09-20 00:00:00      0.16
2006-09-21 00:00:00      3.81

What is the best way to do a groupby.sum over November through June instead of over the calendar year?

1 answer:

Answer 0: (score: 0)

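Assuming your time column is datetime64[ns], you can simply filter the dataframe by month and sum the result. As I read your question, you don't need to group by year.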

import pandas as pd

# create sample dataframe
df = pd.DataFrame({'Value': {0: 1.27, 1: 0.0, 2: 0.0, 3: 1.016, 4: 5.08, 5: 0.16, 6: 3.81},
 'time': {0: '2006-09-15 00:00:00',
  1: '2006-11-16 00:00:00',
  2: '2006-11-17 00:00:00',
  3: '2006-12-18 00:00:00',
  4: '2006-01-19 00:00:00',
  5: '2006-02-20 00:00:00',
  6: '2006-09-21 00:00:00'}})

# make sure column is datetime
df['time'] = pd.to_datetime(df['time'])

# filter by month using .isin()
df.loc[df['time'].dt.month.isin([11, 12, 1, 2, 3, 4, 5, 6]), 'Value'].sum()
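
Because the November-to-June range wraps past December, the month list can also be written as a comparison. A minimal sketch, assuming a datetime column; the helper name wraparound_month_mask and the sample values are made up for illustration and are not part of the original answer:

```python
import pandas as pd

def wraparound_month_mask(times: pd.Series, start_month: int, end_month: int) -> pd.Series:
    """Boolean mask for rows whose month lies between start_month and end_month,
    wrapping past December when start_month > end_month (hypothetical helper)."""
    month = times.dt.month
    if start_month <= end_month:
        # ordinary range inside one calendar year, e.g. March to August
        return (month >= start_month) & (month <= end_month)
    # wrapped range, e.g. November to June
    return (month >= start_month) | (month <= end_month)

# illustrative data, not the asker's dataset
sample = pd.DataFrame({
    'time': pd.to_datetime(['2006-09-15', '2006-11-16', '2006-12-18',
                            '2006-01-19', '2006-07-01']),
    'Value': [1.27, 2.0, 1.0, 5.0, 3.0],
})

mask = wraparound_month_mask(sample['time'], 11, 6)
total = sample.loc[mask, 'Value'].sum()
```

This gives the same rows as the .isin() list; it only saves spelling out every month.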

Alternatively, if you want to group by year and month within that range, add Year and Month columns and group on them:

# create sample dataframe
df = pd.DataFrame({'Value': {0: 1.27, 1: 1.0, 2: 1.06, 3: 1.016, 4: 5.08, 5: 0.16, 6: 3.81},
 'time': {0: '2006-09-15 00:00:00',
  1: '2006-11-16 00:00:00',
  2: '2006-11-17 00:00:00',
  3: '2006-12-18 00:00:00',
  4: '2006-01-19 00:00:00',
  5: '2006-02-20 00:00:00',
  6: '2007-02-21 00:00:00'}})

# make sure time is in fact datetime
df['time'] = pd.to_datetime(df['time'])

# create month and year columns to group on
df['Month'] = df['time'].dt.month
df['Year'] = df['time'].dt.year

# filter dataframe for your month range
df2 = df[df['Month'].isin([11,12,1,2,3,4,5,6])]

# groupby and sum value
df2.groupby(['Year','Month'])['Value'].sum()
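
If what you actually want is one total per November-to-June season (so that November 2006 through June 2007 is summed together), grouping on the calendar year will split each season in two. One possible approach, sketched under the assumption that a season is labelled by the year its November falls in; the season_start label and the sample values are my own illustration, not something from the question:

```python
import pandas as pd

# illustrative data spanning two seasons, not the asker's dataset
df = pd.DataFrame({
    'time': pd.to_datetime(['2006-09-15', '2006-11-16', '2006-12-18',
                            '2007-01-19', '2007-06-20', '2007-11-21',
                            '2008-02-01']),
    'Value': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
})

month = df['time'].dt.month
year = df['time'].dt.year

# rows that fall inside the November-to-June window
in_season = (month >= 11) | (month <= 6)

# November and December keep their own year; January to June
# belong to the season that started in the previous year
season_start = year.where(month >= 11, year - 1).rename('season_start')

# sum Value per season, ignoring rows outside the window
season_totals = (
    df.loc[in_season, 'Value']
      .groupby(season_start[in_season])
      .sum()
)
```

With this sample, season_totals has one row for the season starting in 2006 and one for the season starting in 2007. If you would rather label a season by the year it ends in, add 1 for months 11 and 12 instead of subtracting 1 for the others.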