I have a DataFrame with a column computed by .groupby().cumsum(). The DataFrame looks like this:
    Col_A  Col_B  Col_C
1       A      0
2       A      1      1
3       A      1      2
4       A      1      3
5       B      0      0
6       B      1      1
7       B      0
8       B      1      2
9       C      1      1
10      C      1      2
11      C      1      3
12      C      0
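Col_C is the cumulative sum of Col_B within each Col_A group, computed with df.groupby(['Col_A'])['Col_B'].cumsum(). However, where Col_B == 0, Col_C is blank. How can I record the cumulative sum even on rows where Col_B is 0 (blank)?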
The resulting DataFrame should look like this:
    Col_A  Col_B  Col_C
1       A      0      0
2       A      1      1
3       A      1      2
4       A      1      3
5       B      0      0
6       B      1      1
7       B      0      1
8       B      1      2
9       C      1      1
10      C      1      2
11      C      1      3
12      C      0      3
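If Col_B really holds literal 0s, as in the sample above, an unfiltered grouped cumsum may already produce the expected table. Adding 0 simply repeats the running total, which is the carry-forward the table asks for. This is a minimal check under that assumption. The sample is rebuilt by hand, using the 1..12 labels as the index.

```python
import pandas as pd

# Sample data from the question; the 1..12 labels are used as the index.
df = pd.DataFrame(
    {
        "Col_A": list("AAAABBBBCCCC"),
        "Col_B": [0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0],
    },
    index=range(1, 13),
)

# Zero rows are included in the grouped cumsum: adding 0 keeps the previous
# running total, so each zero row repeats the value above it in its group.
df["Col_C"] = df.groupby("Col_A")["Col_B"].cumsum()
print(df)
```

If the "blank" cells are actually NaN rather than 0, this no longer applies as-is (see the NA discussion in the second answer).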
Answer 0 (score: 1)
I think you need to filter first, with boolean indexing or query:
df['Col_C'] = df[df['Col_B'] != 0].groupby(['Col_A'])['Col_B'].cumsum()
print (df)
    Col_A  Col_B  Col_C
1       A      0    NaN
2       A      1    1.0
3       A      1    2.0
4       A      1    3.0
5       B      0    NaN
6       B      1    1.0
7       B      0    NaN
8       B      1    2.0
9       C      1    1.0
10      C      1    2.0
11      C      1    3.0
12      C      0    NaN
Or:
df['Col_C'] = df.query('Col_B != 0').groupby(['Col_A'])['Col_B'].cumsum()
print (df)
    Col_A  Col_B  Col_C
1       A      0    NaN
2       A      1    1.0
3       A      1    2.0
4       A      1    3.0
5       B      0    NaN
6       B      1    1.0
7       B      0    NaN
8       B      1    2.0
9       C      1    1.0
10      C      1    2.0
11      C      1    3.0
12      C      0    NaN
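The two filters above should select the same rows. The NaNs come from index alignment: the filtered cumsum Series is shorter than df, so assigning it back leaves the dropped rows (Col_B == 0) as NaN and turns the column into float. A small sketch to confirm both points on the question's sample data (rebuilt by hand):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Col_A": list("AAAABBBBCCCC"),
        "Col_B": [0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0],
    },
    index=range(1, 13),
)

# Boolean indexing and query() keep the same rows (Col_B != 0).
by_mask = df[df["Col_B"] != 0].groupby("Col_A")["Col_B"].cumsum()
by_query = df.query("Col_B != 0").groupby("Col_A")["Col_B"].cumsum()
print(by_mask.equals(by_query))  # True

# Assignment aligns on the index: rows missing from by_mask get NaN,
# which also upcasts Col_C to float64.
df["Col_C"] = by_mask
print(df)
```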
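Finally, replace the NaNs by forward filling with ffill (the same as the older fillna(method='ffill')). The first value is still NaN after that, so replace it with fillna(0), and finally convert to int. Note that this ffill runs over the whole column, not per group. As a result, row 5 (the first B row) inherits 3 from group A instead of the expected 0: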
df['Col_C'] = df['Col_C'].ffill().fillna(0).astype(int)
print (df)
    Col_A  Col_B  Col_C
1       A      0      0
2       A      1      1
3       A      1      2
4       A      1      3
5       B      0      3
6       B      1      1
7       B      0      1
8       B      1      2
9       C      1      1
10      C      1      2
11      C      1      3
12      C      0      3
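This is a sketch of a group-aware variant of the last step, not the answerer's code. Forward-filling inside each Col_A group stops values from leaking across group boundaries. A group that starts with a zero row (row 5) then stays NaN until fillna(0), which matches the expected table.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Col_A": list("AAAABBBBCCCC"),
        "Col_B": [0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0],
    },
    index=range(1, 13),
)

# Cumulative sum over the non-zero rows only; zero rows become NaN.
df["Col_C"] = df[df["Col_B"] != 0].groupby("Col_A")["Col_B"].cumsum()

# Forward-fill within each Col_A group, so group A's last value cannot
# spill into group B; a leading NaN in a group is then replaced by 0.
df["Col_C"] = df.groupby("Col_A")["Col_C"].ffill().fillna(0).astype(int)
print(df)
```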
Answer 1 (score: 1)
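A column of 0s is different from a column that is entirely blank. If your column contains NA values, its .cumsum() should in fact be NA (or "blank", as you put it) at those positions. You can check whether the entire column is NA and set the values accordingly.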
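To illustrate the NA case, here is a sketch on a hypothetical version of the data in which the blank Col_B cells are real NaN values. The helper grouped_cumsum is made up for this example. Treating blanks as 0 is an assumption about what the asker wants, not something the answer prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the data: the "blank" cells are NaN, not 0.
df = pd.DataFrame(
    {
        "Col_A": list("AAAABBBBCCCC"),
        "Col_B": [np.nan, 1, 1, 1, np.nan, 1, np.nan, 1, 1, 1, 1, np.nan],
    },
    index=range(1, 13),
)

# With the default skipna behaviour, NaN positions stay NaN in the result,
# while the running total continues past them.
cs = df.groupby("Col_A")["Col_B"].cumsum()


def grouped_cumsum(frame):
    # Hypothetical helper: if Col_B is entirely NA there is nothing to sum,
    # so return all-NA; otherwise treat blanks as 0 (an assumption) and
    # accumulate within each Col_A group.
    if frame["Col_B"].isna().all():
        return pd.Series(np.nan, index=frame.index)
    return frame["Col_B"].fillna(0).groupby(frame["Col_A"]).cumsum()


result = grouped_cumsum(df)
print(result.tolist())
```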
DataFrame.cumsum(axis=None, skipna=True, *args, **kwargs)
Return cumulative sum over requested axis.
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
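The skipna behaviour quoted above can be checked on a tiny Series. This is a sketch with made-up values: with skipna=True a NaN stays NaN but the sum continues, while with skipna=False every value after the first NaN is NaN. An all-NA column yields all-NA.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0])
skip = s.cumsum().tolist()                # [1.0, nan, 3.0]
no_skip = s.cumsum(skipna=False).tolist() # [1.0, nan, nan]
print(skip, no_skip)

# An entirely NA column gives an entirely NA cumulative sum.
all_na = pd.DataFrame({"x": [np.nan, np.nan]})
print(all_na.cumsum()["x"].isna().all())  # True
```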