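I have two companies with different fiscal year-ends (1/31 and 12/31), and I want the average of a metric over each company's own quarters. For this example, I built a DataFrame holding the eight quarter-end dates in 2016-2017 for each company: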
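import numpy as np
import pandas as pd

# Note: pandas >= 2.2 prefers the aliases '3ME' and 'QE' over '3M' and 'Q'.
comp1 = pd.date_range('1/31/2016', periods=8, freq='3M')
comp2 = pd.date_range('1/31/2016', periods=8, freq='Q')
quarters = pd.DataFrame([1] * 8 + [2] * 8, index=comp1.append(comp2), columns=['company'])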
Here is the data: two values (A and B), each measured on an arbitrary date in every month of 2016 and 2017:
values = np.transpose([np.arange(1, 25), np.arange(1, 25) * 11])
dates = ['2016-01-14', '2016-02-03', '2016-03-15', '2016-04-04',
'2016-05-30', '2016-06-11', '2016-07-18', '2016-08-08',
'2016-09-09', '2016-10-10', '2016-11-01', '2016-12-24',
'2017-01-30', '2017-02-19', '2017-03-13', '2017-04-24',
'2017-05-31', '2017-06-02', '2017-07-28', '2017-08-23',
'2017-09-04', '2017-10-30', '2017-11-11', '2017-12-06']
df = pd.DataFrame(values, index=pd.DatetimeIndex(dates), columns=['A', 'B'])
The data looks like this:
A B
2016-01-14 1 11
2016-02-03 2 22
2016-03-15 3 33
2016-04-04 4 44
2016-05-30 5 55
2016-06-11 6 66
2016-07-18 7 77
2016-08-08 8 88
2016-09-09 9 99
2016-10-10 10 110
2016-11-01 11 121
2016-12-24 12 132
2017-01-30 13 143
2017-02-19 14 154
2017-03-13 15 165
2017-04-24 16 176
2017-05-31 17 187
2017-06-02 18 198
2017-07-28 19 209
2017-08-23 20 220
2017-09-04 21 231
2017-10-30 22 242
2017-11-11 23 253
2017-12-06 24 264
This is the result I want: the rows grouped by each company's quarters, with the values inside each quarter averaged:
company A B
2016-01-31 1 1 11
2016-04-30 1 3 33
2016-07-31 1 6 66
2016-10-31 1 9 99
2017-01-31 1 12 132
2017-04-30 1 15 165
2017-07-31 1 18 198
2017-10-31 1 21 231
2016-03-31 2 2 22
2016-06-30 2 5 55
2016-09-30 2 8 88
2016-12-31 2 11 121
2017-03-31 2 14 154
2017-06-30 2 17 187
2017-09-30 2 20 220
2017-12-31 2 23 253
Answer 0 (score: 3)
You can resample the DatetimeIndex by quarter and compute the mean over each period.
df.resample('Q-JAN', convention='end').agg('mean')
You can also group by company first (this assumes df has a company column, which the df in the question does not):
df.groupby('company').resample('Q-JAN', convention='end').agg('mean')
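To get the table from the question, one option is to run the resample once per company and stack the results. This is only a sketch, not the answer's exact code: it passes pd.offsets.QuarterEnd objects instead of the 'Q-JAN' and 'Q' strings (they describe the same quarter ends), and the quarter_ends dictionary is a name made up for this example. The reindex step drops the partial quarter ending 2018-01-31 that the January year-end resample would otherwise add.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
values = np.transpose([np.arange(1, 25), np.arange(1, 25) * 11])
dates = ['2016-01-14', '2016-02-03', '2016-03-15', '2016-04-04',
         '2016-05-30', '2016-06-11', '2016-07-18', '2016-08-08',
         '2016-09-09', '2016-10-10', '2016-11-01', '2016-12-24',
         '2017-01-30', '2017-02-19', '2017-03-13', '2017-04-24',
         '2017-05-31', '2017-06-02', '2017-07-28', '2017-08-23',
         '2017-09-04', '2017-10-30', '2017-11-11', '2017-12-06']
df = pd.DataFrame(values, index=pd.DatetimeIndex(dates), columns=['A', 'B'])

# Hypothetical mapping: company id -> its eight quarter-end dates.
# startingMonth=1 ends quarters in Jan/Apr/Jul/Oct (like 'Q-JAN');
# startingMonth=3 ends them in Mar/Jun/Sep/Dec (like 'Q' / 'Q-DEC').
quarter_ends = {
    1: pd.date_range('2016-01-31', periods=8,
                     freq=pd.offsets.QuarterEnd(startingMonth=1)),
    2: pd.date_range('2016-01-31', periods=8,
                     freq=pd.offsets.QuarterEnd(startingMonth=3)),
}

parts = []
for company, ends in quarter_ends.items():
    # Average the measurements that fall inside each of this company's quarters.
    means = df.resample(ends.freq).mean()
    # Keep only the quarter ends listed for this company.
    means = means.reindex(ends)
    means.insert(0, 'company', company)
    parts.append(means)

result = pd.concat(parts)
print(result)
```

The two companies need separate resample calls because a single call can only use one quarter anchor.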
Answer 1 (score: 1)
@iDrwish had responded with:
df.resample('Q', convention='end').agg('mean')
This works for the December year-end company, and a simple change (Q to Q-JAN) gets results for the January year-end company:
df.resample('Q-JAN', convention='end').agg('mean')
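To see what the change of anchor does, here is a small check of which quarter end a single date lands on under each convention. It is an illustration only: it uses pd.offsets.QuarterEnd objects as stand-ins for the 'Q' and 'Q-JAN' strings, and the variable names are made up here.

```python
import pandas as pd

# Quarter ends in Mar/Jun/Sep/Dec, the calendar quarters that plain 'Q' uses.
calendar = pd.offsets.QuarterEnd(startingMonth=3)
# Quarter ends in Jan/Apr/Jul/Oct, matching 'Q-JAN'.
fiscal_jan = pd.offsets.QuarterEnd(startingMonth=1)

sample = pd.Timestamp('2016-02-03')

# rollforward moves a date to the next quarter end (or keeps it if it is one).
calendar_end = calendar.rollforward(sample)
fiscal_end = fiscal_jan.rollforward(sample)

print(calendar_end.date())  # 2016-03-31
print(fiscal_end.date())    # 2016-04-30
```

So the 2016-02-03 measurement is averaged into the quarter ending 2016-03-31 for the December year-end company and into the quarter ending 2016-04-30 for the January year-end company.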
Answer 2 (score: 0)
Suppose your DataFrame has a "date_of_order" column. The simplest approach is:
df['date_of_order'] = pd.to_datetime(df['date_of_order']) # if you haven't converted it already
df.groupby(df['date_of_order'].dt.to_period('Q'))['column to aggregate'].agg(...)
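The same idea extends to a non-calendar year-end by passing an anchored period alias such as 'Q-JAN' to to_period. The sketch below uses a made-up orders frame with a hypothetical amount column, and takes the mean as the aggregation; none of these names come from the answer.

```python
import pandas as pd

# Hypothetical data: one order per month, January to May 2016.
orders = pd.DataFrame({
    'date_of_order': ['2016-01-14', '2016-02-03', '2016-03-15',
                      '2016-04-04', '2016-05-30'],
    'amount': [1, 2, 3, 4, 5],
})
orders['date_of_order'] = pd.to_datetime(orders['date_of_order'])

# Calendar quarters (year ends in December).
calendar_q = orders.groupby(
    orders['date_of_order'].dt.to_period('Q'))['amount'].mean()

# Fiscal quarters for a January year-end.
fiscal_q = orders.groupby(
    orders['date_of_order'].dt.to_period('Q-JAN'))['amount'].mean()

# Relabel the fiscal periods with their quarter-end dates.
fiscal_q.index = fiscal_q.index.end_time.normalize()

print(calendar_q)
print(fiscal_q)
```

With 'Q-JAN' the January order sits alone in the quarter ending 2016-01-31, while February through April share the quarter ending 2016-04-30.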