Python pandas cumsum with groupby and a condition

Time: 2020-05-16 16:38:09

Tags: python pandas pandas-groupby cumsum

I have this DataFrame:

In [1]: import pandas as pd

In [2]: data = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B'], 'Tag': ['X', '', 'X', '', 'X', '', ''], 'Counts': [1, 3, 5, 2, 3, 2, 1]})

In [3]: data
Out[3]: 
  ID Tag  Counts
0  A   X       1
1  A           3
2  A   X       5
3  A           2
4  B   X       3
5  B           2
6  B           1

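I want to create a new column holding the cumulative sum of Counts, grouped by the ID column, but with the sum restarting whenever Tag is 'X'.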

In [6]: data['before'] = data.groupby(['ID']).Counts.cumsum()

In [7]: data['after'] = [1, 4, 5, 7, 3, 5, 6]

In [8]: data
Out[8]: 
  ID Tag  Counts  before  after
0  A   X       1       1      1
1  A           3       4      4
2  A   X       5       9      5
3  A           2      11      7
4  B   X       3       3      3
5  B           2       5      5
6  B           1       6      6

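The 'before' column is what a plain groupby cumsum gives me. The 'after' column (typed in by hand above) is the result I want to compute.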

1 answer:

Answer 0 (score: 3)

You can use .eq('X').cumsum() on the Tag column to identify the blocks that start with an X, and then pass that Series to groupby together with 'ID':

data['after'] = data.groupby(['ID',data.Tag.eq('X').cumsum()])['Counts'].cumsum()

Output:

  ID Tag  Counts  after
0  A   X       1      1
1  A           3      4
2  A   X       5      5
3  A           2      7
4  B   X       3      3
5  B           2      5
6  B           1      6
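
To make it easier to see why this works, here is a minimal sketch that splits the one-liner into steps, using the same data as the question. The names `starts` and `block` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

data = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'Tag': ['X', '', 'X', '', 'X', '', ''],
    'Counts': [1, 3, 5, 2, 3, 2, 1],
})

# Boolean Series: True on every row where a new block begins (Tag == 'X').
starts = data['Tag'].eq('X')

# Cumulative sum of the booleans gives a block number that increases by one
# at each 'X' and stays constant until the next one: 1, 1, 2, 2, 3, 3, 3.
block = starts.cumsum().rename('block')

# Group by ID and block number, then take the cumulative sum within each group.
data['after'] = data.groupby(['ID', block])['Counts'].cumsum()

print(data)
```

Note that the block number is counted over the whole frame, not per ID. Keeping 'ID' in the groupby is what should stop a sum from carrying over into the next ID when that ID's first row has no 'X'.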