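I have a pandas DataFrame like the one below. I want to compute a cumulative sum of the NEW1 column within each ORDER group. The code below partly works, but it does not ignore NaN the way I want: I expect the cumsum value in the last row to be 8.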
import pandas as pd
import numpy as np
df = pd.DataFrame({'ORDER':["A", "A", "B", "B"], 'NEW1':[np.nan, 5, 8, np.nan]})
df['cumsum'] = df.groupby(['ORDER'])['NEW1'].cumsum()
df
  ORDER  NEW1  cumsum
0     A   NaN     NaN
1     A   5.0     5.0
2     B   8.0     8.0
3     B   NaN     NaN
My expected output:
  ORDER  NEW1  cumsum
0     A   NaN     NaN
1     A   5.0     5.0
2     B   8.0     8.0
3     B   NaN     8.0
Answer 0 (score: 1)
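You may need to use apply with a lambda. Passing group_keys=False keeps the original index, so the result can be assigned back on recent pandas versions. Note that this gives 0.0 rather than NaN in the first row:
df['cumsum'] = df.groupby(['ORDER'], group_keys=False)['NEW1'].apply(lambda x: x.fillna(0).cumsum())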
Answer 1 (score: 1)
Call fillna() before groupby, and use transform:
df['cumsum']=df.fillna(0).groupby('ORDER')['NEW1'].transform('cumsum')
  ORDER  NEW1  cumsum
0     A   NaN     0.0
1     A   5.0     5.0
2     B   8.0     8.0
3     B   NaN     8.0
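Filling with 0 first gives 0.0 rather than NaN in row 0, which differs from the expected output in the question. If the leading NaN needs to stay NaN, one possible sketch is to mask the filled running total until each group has seen its first real value. The names `filled` and `seen` are just illustrative, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ORDER': ["A", "A", "B", "B"], 'NEW1': [np.nan, 5, 8, np.nan]})

# Running total per ORDER group, with NaN counted as 0.
filled = df['NEW1'].fillna(0).groupby(df['ORDER']).cumsum()

# Per group, count the non-NaN values seen so far; > 0 means a real value has appeared.
seen = df['NEW1'].notna().astype(int).groupby(df['ORDER']).cumsum().gt(0)

# Keep the running total only from the first real value onward; NaN before that.
df['cumsum'] = filled.where(seen)
print(df)
```

With the sample frame, this should leave row 0 as NaN and carry 8.0 into the last row.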
Answer 2 (score: 1)
Let's use expanding().sum(), which treats NaN the way you want:
df['cumsum'] = df.groupby('ORDER')['NEW1'].expanding().sum().reset_index(0, drop=True)
  ORDER  NEW1  cumsum
0     A   NaN     NaN
1     A   5.0     5.0
2     B   8.0     8.0
3     B   NaN     8.0
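As a possible alternative that stays close to the original cumsum call, the grouped cumsum can be forward-filled within each group. This is only a sketch: it relies on grouped cumsum skipping NaN while accumulating and leaving NaN at those positions, so a per-group ffill repeats the last running total. The name `partial` is just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ORDER': ["A", "A", "B", "B"], 'NEW1': [np.nan, 5, 8, np.nan]})

# Grouped cumsum: NaN rows stay NaN, but do not break the running total.
partial = df.groupby('ORDER')['NEW1'].cumsum()

# Forward-fill within each ORDER group so NaN rows repeat the previous total.
# Grouping again keeps a value from group A from leaking into group B.
df['cumsum'] = partial.groupby(df['ORDER']).ffill()
print(df)
```

A leading NaN in a group has nothing to fill from, so it stays NaN, which matches the expected output in the question.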