Question

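I have a question about pandas / Python. Say I have two columns. I want the cumulative sum of the values in the first column, accumulating until the value in the second column reaches a specific value. I think a small example explains the problem best.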

I have columns A and B, and I want column C:

 A   B   C
 1   0   1
 2   0   3   (1+2)
 2   0   5   (1+2+2)
 1   1   6   (1+2+2+1) The cumulative sum should stop here, because B reaches 1.
 2   0   2   The cumulative sum should start again.
 3   0   5   (2+3)
 3   0   8   (2+3+3)
 5   1   13  (2+3+3+5) The cumulative sum should stop again, because B reaches 1 again.

Thanks in advance for your help.

Answer 1

Use DataFrameGroupBy.cumsum, with the groups built by a second cumsum:

df['C'] = df.groupby(df['B'].eq(1).iloc[::-1].cumsum())['A'].cumsum()
# If B contains only 0 and 1, the eq(1) comparison can be dropped:
# df['C'] = df.groupby(df['B'].iloc[::-1].cumsum())['A'].cumsum()
print(df)
   A  B   C
0  1  0   1
1  2  0   3
2  2  0   5
3  1  1   6
4  2  0   2
5  3  0   5
6  3  0   8
7  5  1  13
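
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction and the name `group_key` are my own additions for illustration; the values are copied from the table in the question.

```python
import pandas as pd

# Sample data from the question (columns A and B only).
df = pd.DataFrame({
    "A": [1, 2, 2, 1, 2, 3, 3, 5],
    "B": [0, 0, 0, 1, 0, 0, 0, 1],
})

# Reverse B, mark the rows where it equals 1, and take a running count.
# Each row with B == 1 then shares a label with the rows leading up to it.
group_key = df["B"].eq(1).iloc[::-1].cumsum()

# groupby aligns the key with df on the index, so the reversed order of
# group_key does not change the row order of the result.
df["C"] = df.groupby(group_key)["A"].cumsum()

print(df["C"].tolist())
# [1, 3, 5, 6, 2, 5, 8, 13]
```

The key is only reversed to build the labels; the cumulative sum itself still runs top to bottom within each group.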

Details:

Compare B with 1, then reverse the row order by indexing with iloc[::-1]:

print (df['B'].eq(1).iloc[::-1])
7     True
6    False
5    False
4    False
3     True
2    False
1    False
0    False
Name: B, dtype: bool

Create the group labels with Series.cumsum (B holds only 0 and 1 here, so the eq(1) step is omitted):

print (df['B'].iloc[::-1].cumsum())
7    1
6    1
5    1
4    1
3    2
2    2
1    2
0    2
Name: B, dtype: int64
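
To see why the reversal matters, the sketch below compares a plain forward cumsum of the marker with a shifted one. This is not part of the approach above, just an alternative I believe gives the same groups; the names `naive_key`, `shifted_key`, and `naive` are hypothetical, and it assumes a pandas version where Series.shift accepts `fill_value`.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A": [1, 2, 2, 1, 2, 3, 3, 5],
    "B": [0, 0, 0, 1, 0, 0, 0, 1],
})

marker = df["B"].eq(1).astype(int)

# Forward cumsum without reversing: the row where B == 1 already carries
# the next label, so it is split off from the rows it should close.
naive_key = marker.cumsum()
naive = df.groupby(naive_key)["A"].cumsum()

# Shifting the marker down one row delays the label change until the
# row after B == 1, which matches the grouping of the reversed cumsum.
shifted_key = marker.shift(fill_value=0).cumsum()
df["C"] = df.groupby(shifted_key)["A"].cumsum()

print(naive.tolist())
# [1, 3, 5, 1, 3, 6, 9, 5]
print(df["C"].tolist())
# [1, 3, 5, 6, 2, 5, 8, 13]
```

The group labels differ from the reversed version (they count up instead of down), but only the grouping matters for the result.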
