I am looking for a way to aggregate (in pandas) a subset of values based on a particular partition, equivalent to the following SQL:
select table.*,
sum(income) over (order by id, num_yyyymm rows between 3 preceding and 1 preceding) as prev_income_3,
sum(income) over (order by id, num_yyyymm rows between 1 following and 3 following) as next_income_3
from table order by id_customer, num_yyyymm;
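I tried the following solution, but it has two problems: 1) it takes a long time to complete, and 2) I have to merge all the per-customer results back together afterwards.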
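for x, y in df.groupby(['id_customer']):
    y = y.copy()  # work on a copy so the assignment below does not warn
    print(y[['num_yyyymm', 'income']])
    # Reverse, take a 3-row rolling sum, then shift by one so the window
    # covers the 3 following rows and excludes the current one.
    y['next3'] = y['income'].iloc[::-1].rolling(3).sum().shift(1)
    print(y[['num_yyyymm', 'income', 'next3']])
    break  # inspect the first customer only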
Result:
num_yyyymm income next3
0 201501 0.00 0.00
1 201502 0.00 0.00
2 201503 0.00 0.00
3 201504 0.00 0.00
4 201505 0.00 0.00
5 201506 0.00 0.00
6 201507 0.00 0.00
7 201508 0.00 0.00
8 201509 0.00 0.00
9 201510 0.00 0.00
10 201511 0.00 0.00
11 201512 0.00 0.00
12 201601 0.00 0.00
13 201602 0.00 0.00
14 201603 0.00 0.00
15 201604 0.00 0.00
16 201605 0.00 0.00
17 201606 0.00 0.00
18 201607 0.00 0.00
19 201608 0.00 0.00
20 201609 0.00 1522.07
21 201610 0.00 1522.07
22 201611 0.00 1522.07
23 201612 1522.07 0.00
24 201701 0.00 -0.00
25 201702 0.00 1.52
26 201703 0.00 1522.07
27 201704 0.00 1522.07
28 201705 1.52 1520.55
29 201706 1520.55 0.00
30 201707 0.00 NaN
31 201708 0.00 NaN
32 201709 0.00 NaN
Does anyone have an alternative solution?
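
One direction that might be worth trying is to compute both windows with groupby(...).transform, which returns values aligned to the original rows, so no manual merge is needed. The following is only a minimal sketch under some assumptions: the helper name add_window_sums and the toy data are hypothetical, the partition is assumed to be id_customer ordered by num_yyyymm, and min_periods=1 is chosen so that incomplete windows at the edges are summed over the rows that exist (as a SQL window frame would), rather than returning NaN as in the output above. Whether it is actually faster on the real data has not been measured.

```python
import pandas as pd


def add_window_sums(df, key="id_customer", order="num_yyyymm",
                    value="income", n=3):
    """Return a sorted copy of df with per-customer sums over the previous
    and the following n rows, both excluding the current row."""
    # Sort once so that row order inside each group follows the month.
    out = df.sort_values([key, order]).reset_index(drop=True)
    grouped = out.groupby(key)[value]

    # rows between n preceding and 1 preceding:
    # shift down by one row, then sum a trailing window of n rows.
    prev_sum = grouped.transform(
        lambda s: s.shift(1).rolling(n, min_periods=1).sum()
    )

    # rows between 1 following and n following:
    # same idea on the reversed series, then restore the original order.
    next_sum = grouped.transform(
        lambda s: s.iloc[::-1].shift(1).rolling(n, min_periods=1).sum().iloc[::-1]
    )

    out[f"prev_{value}_{n}"] = prev_sum
    out[f"next_{value}_{n}"] = next_sum
    return out


# Toy data (made up for illustration, not taken from the real table).
demo = pd.DataFrame({
    "id_customer": [1, 1, 1, 1, 1, 2, 2, 2],
    "num_yyyymm": [201501, 201502, 201503, 201504, 201505,
                   201501, 201502, 201503],
    "income": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0],
})
result = add_window_sums(demo)
print(result)
```

Rows with nothing before them (or nothing after them) inside their customer group come out as NaN in the corresponding column. Note that the sketch resets the index after sorting, so the original row labels are not preserved.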