I found a row-by-row solution to this problem, but is there a fast column-wise way to do it?
Here is a quick example of the dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame([['GB',43.76],
['TEN',17.3],
['ARI',0.2],
['ATL',12.3],
['HOU',21.1],
['ARI',1.7],
['ATL',12.6],
['SF',15.0],
['GB',5.7],
[1.0,np.nan],
['GB',43.76],
['TEN',17.3],
['ARI',0.2],
['ATL',12.3],
['HOU',21.1],
['ARI',1.7],
['ATL',12.6],
['BUF',7.0],
['GB',5.7],
[2.0,np.nan]], columns = ['team','points'])
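I have been experimenting with df['sum'] = df['points'].cumsum(). It accumulates the sum, of course, but what I need is for the running total to restart whenever it reaches a NaN, not just skip over it.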
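Answer 0 (score: 4)
Use GroupBy.cumsum with a helper Series built by taking another cumsum over the missing-value check. Each NaN increments the helper, so every run of rows between NaNs gets its own group key: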
df['sum'] = df.groupby(df['points'].isna().cumsum())['points'].cumsum()
print (df)
team points sum
0 GB 43.76 43.76
1 TEN 17.30 61.06
2 ARI 0.20 61.26
3 ATL 12.30 73.56
4 HOU 21.10 94.66
5 ARI 1.70 96.36
6 ATL 12.60 108.96
7 SF 15.00 123.96
8 GB 5.70 129.66
9 1 NaN NaN
10 GB 43.76 43.76
11 TEN 17.30 61.06
12 ARI 0.20 61.26
13 ATL 12.30 73.56
14 HOU 21.10 94.66
15 ARI 1.70 96.36
16 ATL 12.60 108.96
17 BUF 7.00 115.96
18 GB 5.70 121.66
19 2 NaN NaN
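To make the helper Series visible, here is a minimal sketch on a small made-up Series (the values and the names `points`, `group_id`, and `restarted` are illustrative, not taken from the question). It prints the group key next to the restarted running total:

```python
import numpy as np
import pandas as pd

# Illustrative data: two runs of numbers, each terminated by a NaN.
points = pd.Series([1.0, 2.0, np.nan, 3.0, 4.0, np.nan])

# isna() marks the NaN rows; cumsum() over those booleans bumps the
# counter at every NaN, so each NaN row opens a new group.
group_id = points.isna().cumsum()

# The cumulative sum is computed within each group, so it restarts after a NaN.
restarted = points.groupby(group_id).cumsum()

print(pd.DataFrame({'points': points, 'group_id': group_id, 'sum': restarted}))
# group_id is 0, 0, 1, 1, 1, 2 and sum is 1, 3, NaN, 3, 7, NaN
```

Each NaN row belongs to the group it opens, and since cumsum skips missing values by default, that row stays NaN in the result.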
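Answer 1 (score: 1)
Another approach that avoids groupby, assuming all points are positive: take the cumsum of points, ffill the NaN positions with the previous value, then subtract the cummax of that running total taken at the rows where points isna. For example: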
df['s'] = df['points'].cumsum().ffill()
df['s'] -= (df['s']*df['points'].isna()).cummax()
print (df)
team points s
0 GB 43.76 43.76
1 TEN 17.30 61.06
2 ARI 0.20 61.26
3 ATL 12.30 73.56
4 HOU 21.10 94.66
5 ARI 1.70 96.36
6 ATL 12.60 108.96
7 SF 15.00 123.96
8 GB 5.70 129.66
9 1 NaN 0.00
10 GB 43.76 43.76
11 TEN 17.30 61.06
12 ARI 0.20 61.26
13 ATL 12.30 73.56
14 HOU 21.10 94.66
15 ARI 1.70 96.36
16 ATL 12.60 108.96
17 BUF 7.00 115.96
18 GB 5.70 121.66
19 2 NaN 0.00
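The positivity assumption matters because cummax only tracks the correct offset while the running total never decreases. The sketch below, on made-up data, wraps the method above in a hypothetical helper `restart_cumsum_positive` and compares it with a variation that is not part of the answer (`restart_cumsum_any_sign`, also a hypothetical name), which forward-fills the offset instead of taking cummax:

```python
import numpy as np
import pandas as pd

def restart_cumsum_positive(points):
    """The method above: relies on every non-missing value being positive."""
    s = points.cumsum().ffill()
    # Running total captured at each NaN row, carried forward via cummax.
    return s - (s * points.isna()).cummax()

def restart_cumsum_any_sign(points):
    """Variation (an assumption, not from the answer): carry the offset with ffill."""
    s = points.cumsum().ffill()
    # Keep the running total only at NaN rows, then carry it forward.
    offset = s.where(points.isna()).ffill().fillna(0)
    return s - offset

positive = pd.Series([1.0, 2.0, np.nan, 3.0, 4.0, np.nan])
mixed = pd.Series([5.0, np.nan, -3.0, 1.0, np.nan, 2.0])

print(restart_cumsum_positive(positive).tolist())  # [1.0, 3.0, 0.0, 3.0, 7.0, 0.0]
print(restart_cumsum_positive(mixed).tolist())     # [5.0, 0.0, -3.0, -2.0, -2.0, 0.0]
print(restart_cumsum_any_sign(mixed).tolist())     # [5.0, 0.0, -3.0, -2.0, 0.0, 2.0]
```

With the negative value in `mixed`, the cummax version keeps subtracting the stale offset 5 after the second NaN, so the last two entries are wrong. The ffill variation restarts correctly there.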
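Answer 2 (score: 0)
I don't know whether this is the same as jezrael's solution, but I would suggest creating a column that represents the summation groups, as shown in this question, except that here you check for np.nan instead of 0. Then take the cumulative sum within those groups.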
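A minimal sketch of that suggestion, assuming the group key is stored in a column named `block` (a hypothetical name) and using a shortened, made-up version of the sample data:

```python
import numpy as np
import pandas as pd

# Shortened illustrative frame: NaN in 'points' marks the end of a block.
df = pd.DataFrame({'team': ['GB', 'TEN', 1.0, 'ARI', 'ATL', 2.0],
                   'points': [43.0, 17.0, np.nan, 2.0, 12.0, np.nan]})

# Explicit group column: the counter increases at every NaN row.
df['block'] = df['points'].isna().cumsum()

# Cumulative sum within each block, grouping by the stored column name.
df['sum'] = df.groupby('block')['points'].cumsum()

print(df)
```

Keeping the key as a real column makes the grouping easy to inspect and lets other per-block aggregations reuse it. It can be dropped afterwards with `df.drop(columns='block')`.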