Pandas groupby: group sum minus shifted cumulative sum

Time: 2020-09-25 20:06:50

Tags: python pandas

I have a table similar to this:

import pandas as pd

# One row per (Start_Month, Bucket) pair with its Complete count
data = [['2019-02-01', 0, 5],
        ['2019-02-01', 1, 12],
        ['2019-02-01', 2, 18],
        ['2019-02-01', 3, 23],
        ['2019-02-01', 4, 20],
        ['2019-03-01', 0, 12],
        ['2019-03-01', 1, 7],
        ['2019-03-01', 2, 6],
        ['2019-03-01', 3, 5],
        ['2019-03-01', 4, 8]]
df = pd.DataFrame(data, columns=['Start_Month', 'Bucket', 'Complete'])

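I want a new column that, for each Start_Month, holds the group total of Complete minus the cumulative sum of the shifted Complete values. For example, the first value for 2019-02-01 is the sum of Complete for that Start_Month, which is 78. The next value, for Bucket 1, is 78 - 5 = 73 (5 is the Complete value of Bucket 0). For Bucket 2 of the same Start_Month it is 78 - 5 - 12 = 61, and so on.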


df['new_Com']=df.groupby(['Start_Month']).Complete.sum() - df.groupby(['Start_Month']).Complete.shift(1).cumsum().fillna(0).astype(int) 

This doesn't work.

1 answer:

Answer 0 (score: 2)

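Try reversing the row order and then taking cumsum within each group. Summing from the last bucket backwards gives each row the total of itself and every later row in its group, which is the same as the group total minus everything before it: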

df['New'] = df.iloc[::-1].groupby('Start_Month').Complete.cumsum()
df
  Start_Month  Bucket  Complete  New
0  2019-02-01       0         5   78
1  2019-02-01       1        12   73
2  2019-02-01       2        18   61
3  2019-02-01       3        23   43
4  2019-02-01       4        20   20
5  2019-03-01       0        12   38
6  2019-03-01       1         7   26
7  2019-03-01       2         6   19
8  2019-03-01       3         5   13
9  2019-03-01       4         8    8
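
For comparison, here is a minimal sketch of the "group total minus what came before" formulation that the question was attempting. It is not part of the original answer. The attempt in the question most likely fails for two reasons: `groupby(...).Complete.sum()` returns one value per Start_Month, indexed by month rather than by row, so it does not line up with the rows of `df`; and the `.cumsum()` after `.shift(1)` runs over the whole column rather than within each group. The sketch below uses `transform('sum')` to broadcast the group total to every row and a per-group `cumsum()`. The names `total` and `prior` are introduced here for illustration only.

```python
import pandas as pd

data = [['2019-02-01', 0, 5], ['2019-02-01', 1, 12], ['2019-02-01', 2, 18],
        ['2019-02-01', 3, 23], ['2019-02-01', 4, 20],
        ['2019-03-01', 0, 12], ['2019-03-01', 1, 7], ['2019-03-01', 2, 6],
        ['2019-03-01', 3, 5], ['2019-03-01', 4, 8]]
df = pd.DataFrame(data, columns=['Start_Month', 'Bucket', 'Complete'])

grouped = df.groupby('Start_Month')['Complete']

# Group total repeated on every row of the group (same index as df)
total = grouped.transform('sum')

# Sum of the rows that come before the current row within the same group:
# the inclusive per-group running sum minus the row's own value
prior = grouped.cumsum() - df['Complete']

df['new_Com'] = total - prior

# The reversed cumsum from the answer; assignment realigns on the row index
df['New'] = df.iloc[::-1].groupby('Start_Month')['Complete'].cumsum()

print(df)
```

On this sample data both columns should hold the same values, so the reversed cumsum is simply the shorter way to write it. The explicit version may be easier to adapt if the subtraction needs to differ from a plain running total.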