Pandas equivalent of a SQL sum over a partition

Time: 2019-07-06 16:54:37

Tags: python sql pandas pandas-groupby

I'm looking for a way to aggregate a subset of values in pandas based on a specific partition, equivalent to the following SQL:
select table.*,
       sum(income) over (order by id, num_yyyymm rows between 3 preceding and 1 preceding) as prev_income_3,
       sum(income) over (order by id, num_yyyymm rows between 1 following and 3 following) as next_income_3
from table order by id_customer, num_yyyymm;

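I tried the following solution, but it has two problems: 1) it takes a long time to finish, and 2) I have to merge all the per-customer results afterwards.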

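for x, y in df.groupby(['id_customer']):
    y = y.copy()  # work on a copy of the group rather than a slice of df
    print(y[['num_yyyymm', 'income']])
    # reverse, sum 3 rows, then shift by one so the current row is excluded
    y['next3'] = y['income'].iloc[::-1].rolling(3).sum().shift(1)
    print(y[['num_yyyymm', 'income', 'next3']])
    break  # inspect the first customer only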

Result:

        num_yyyymm       income     next3
0       201501         0.00     0.00
1       201502         0.00     0.00
2       201503         0.00     0.00
3       201504         0.00     0.00
4       201505         0.00     0.00
5       201506         0.00     0.00
6       201507         0.00     0.00
7       201508         0.00     0.00
8       201509         0.00     0.00
9       201510         0.00     0.00
10      201511         0.00     0.00
11      201512         0.00     0.00
12      201601         0.00     0.00
13      201602         0.00     0.00
14      201603         0.00     0.00
15      201604         0.00     0.00
16      201605         0.00     0.00
17      201606         0.00     0.00
18      201607         0.00     0.00
19      201608         0.00     0.00
20      201609         0.00  1522.07
21      201610         0.00  1522.07
22      201611         0.00  1522.07
23      201612      1522.07     0.00
24      201701         0.00    -0.00
25      201702         0.00     1.52
26      201703         0.00  1522.07
27      201704         0.00  1522.07
28      201705         1.52  1520.55
29      201706      1520.55     0.00
30      201707         0.00      NaN
31      201708         0.00      NaN
32      201709         0.00      NaN

Does anyone have an alternative solution?
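
For reference, one way this could be done without the explicit loop is sketched below. It assumes the partition is meant to be `id_customer` and the ordering column is `num_yyyymm`; the helper name `add_window_sums` and the `demo` frame are made up for illustration. It relies on `groupby(...).transform` so the result comes back aligned to the original rows and no merge is needed. `min_periods=1` is used to mimic SQL, which sums whatever rows exist in the frame and returns NULL only when the frame is empty.

```python
import pandas as pd


def add_window_sums(df, key="id_customer", order="num_yyyymm",
                    value="income", n=3):
    """Hypothetical helper: add sums over the n previous and n next rows
    within each `key` group, excluding the current row."""
    out = df.sort_values([key, order]).copy()
    grouped = out.groupby(key)[value]

    # rows between n preceding and 1 preceding:
    # shift(1) drops the current row, rolling(n) sums the n rows before it
    out[f"prev_{value}_{n}"] = grouped.transform(
        lambda s: s.shift(1).rolling(n, min_periods=1).sum()
    )

    # rows between 1 following and n following:
    # same idea on the reversed series, then reversed back
    out[f"next_{value}_{n}"] = grouped.transform(
        lambda s: s.iloc[::-1].shift(1).rolling(n, min_periods=1).sum().iloc[::-1]
    )
    return out


# Small made-up frame to show the call (not the data from the question)
demo = pd.DataFrame({
    "id_customer": [1, 1, 1, 1, 1, 2, 2, 2],
    "num_yyyymm": [201501, 201502, 201503, 201504, 201505,
                   201501, 201502, 201503],
    "income": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0],
})
result = add_window_sums(demo)
print(result)
```

To get `NaN` whenever fewer than three rows are available, as in the output above, `min_periods=n` could be passed instead of `min_periods=1`. Whether this is fast enough depends on the number of customers, since the lambdas still run once per group.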

0 answers:

No answers