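For the following df, I want to compute the cumulative sum of the column Inst_Dist and store it in Cumu_Dist for as long as the value of WDir_Deg stays the same. Whenever the value in WDir_Deg changes, the cumulative sum has to restart.
So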
index | WDir_Deg | Inst_Dist | Cumu_Dist
0 | 289 | 20 | NaN
1 | 285 | 17 | NaN
2 | 285 | 19 | NaN
3 | 287 | 19 | NaN
4 | 289 | 10 | NaN
becomes
index | WDir_Deg | Inst_Dist | Cumu_Dist
0 | 289 | 20 | 20
1 | 285 | 17 | 17
2 | 285 | 19 | 36
3 | 287 | 19 | 19
4 | 289 | 10 | 10
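For anyone who wants to reproduce this, here is a minimal sketch that builds the sample frame from the first table. The constructor call is an assumption; the real df is loaded differently, and only the column names and values come from the tables above.

```python
import numpy as np
import pandas as pd

# Sample frame matching the tables above (assumed construction).
df = pd.DataFrame({
    'WDir_Deg': [289, 285, 285, 287, 289],
    'Inst_Dist': [20, 17, 19, 19, 10],
})

# Cumu_Dist starts out empty (NaN), as in the first table.
df['Cumu_Dist'] = np.nan
print(df)
```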
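My non-idiomatic (and extremely slow) Python code is below. I would appreciate it if someone could show me how to make it faster and more idiomatic.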
prev_angle = -1
curr_cumu_dist = 0
for curr_ind in df.index:
    curr_angle = df.loc[curr_ind, 'WDir_Deg']
    if prev_angle == curr_angle:
        # Same direction as the previous row: keep accumulating.
        curr_cumu_dist += df.loc[curr_ind, 'Inst_Dist']
        df.loc[curr_ind, 'Cumu_Dist'] = curr_cumu_dist
    else:
        # Direction changed: restart the running sum at this row.
        prev_angle = curr_angle
        curr_cumu_dist = df.loc[curr_ind, 'Inst_Dist']
        df.loc[curr_ind, 'Cumu_Dist'] = curr_cumu_dist
Answer 0 (score: 0)
A bit tricky. Referring to this question/answer, Pandas groupby cumulative sum, I came up with this solution:
df['Cumu_Dist'] = df.groupby('WDir_Deg').Inst_Dist.cumsum()
which returns
index WDir_Deg Inst_Dist Cumu_Dist
0 0 285 17 17
1 1 285 19 36
2 2 287 19 19
3 3 289 20 20
This was run with pandas version 0.23.4.
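One caveat is worth checking against the five-row frame from the question, which is rebuilt below (the constructor is an assumption about how df was created). Grouping directly by WDir_Deg pools every row that has the same value, adjacent or not, so the second run of 289 continues the first run instead of restarting.

```python
import pandas as pd

# Five-row sample from the question (assumed construction).
df = pd.DataFrame({
    'WDir_Deg': [289, 285, 285, 287, 289],
    'Inst_Dist': [20, 17, 19, 19, 10],
})

# Group by the raw value: rows 0 and 4 (both 289) land in the same group.
by_value = df.groupby('WDir_Deg')['Inst_Dist'].cumsum()
print(by_value.tolist())  # [20, 17, 36, 19, 30]
```

The question expects 10 in the last row, so this one-liner only matches when an angle never recurs in a separate run, as in the four-row output above.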
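Answer 1 (score: 0)
Build a helper Series that labels consecutive groups by comparing the WDir_Deg column with its shifted self using ne (not equal), shift and cumsum, then group by that Series: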
s = df['WDir_Deg'].ne(df['WDir_Deg'].shift()).cumsum()
df['Cumu_Dist'] = df.groupby(s)['Inst_Dist'].cumsum()
print (df)
WDir_Deg Inst_Dist Cumu_Dist
0 289 20 20
1 285 17 17
2 285 19 36
3 287 19 19
4 289 10 10
Details:
print (s)
0 1
1 2
2 2
3 3
4 4
Name: WDir_Deg, dtype: int32
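The helper can be broken into its three steps. The sketch below rebuilds the sample frame (an assumption about how df was created) and names each intermediate result. The names shifted, changed and run_id are illustrative only.

```python
import pandas as pd

# Five-row sample from the question (assumed construction).
df = pd.DataFrame({
    'WDir_Deg': [289, 285, 285, 287, 289],
    'Inst_Dist': [20, 17, 19, 19, 10],
})

# 1. Previous row's value; the first row has no predecessor, so it is NaN.
shifted = df['WDir_Deg'].shift()

# 2. True wherever the value differs from the previous row (start of a run).
changed = df['WDir_Deg'].ne(shifted)

# 3. Running count of run starts gives each consecutive run its own id.
run_id = changed.cumsum()
print(run_id.tolist())  # [1, 2, 2, 3, 4]

# Cumulative sum within each run, aligned back to the original rows.
df['Cumu_Dist'] = df.groupby(run_id)['Inst_Dist'].cumsum()
print(df['Cumu_Dist'].tolist())  # [20, 17, 36, 19, 10]
```

Because the first row is compared against NaN, it always counts as the start of a run, which is why the ids begin at 1.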