Cumulative sum on a pandas DataFrame column, ignoring NaN

Time: 2020-07-23 20:41:48

Tags: python pandas

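I have a pandas DataFrame like the one shown below. I want to compute a cumulative sum of the "NEW1" column within each ORDER. The code below partly works, but it does not ignore NaN: I want the "cumsum" value in the last row to be 8.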

import pandas as pd
import numpy as np
df = pd.DataFrame({'ORDER':["A", "A", "B", "B"], 'NEW1':[np.nan, 5, 8, np.nan]})
df['cumsum'] = df.groupby(['ORDER'])['NEW1'].cumsum()
df

    ORDER   NEW1    cumsum
0   A       NaN     NaN
1   A       5.0     5.0
2   B       8.0     8.0
3   B       NaN     NaN

My expected output:

    ORDER   NEW1    cumsum
0   A       NaN     NaN
1   A       5.0     5.0
2   B       8.0     8.0
3   B       NaN     8.0

3 answers:

Answer 0: (score: 1)

You may need to use apply with a lambda:

df['cumsum'] = df.groupby(['ORDER'])['NEW1'].apply(lambda x: x.fillna(0).cumsum())
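As a rough sketch of what this approach produces on the question's frame: the snippet below rebuilds the data and applies the same lambda. The group_keys=False argument is not part of the original answer; it is added on the assumption that a recent pandas version is in use, where apply may otherwise prepend the group key to the result's index and break the column assignment.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ORDER': ["A", "A", "B", "B"],
                   'NEW1': [np.nan, 5, 8, np.nan]})

# group_keys=False (an addition to the original answer) keeps the
# original row index so the result aligns with df on assignment.
df['cumsum'] = (
    df.groupby('ORDER', group_keys=False)['NEW1']
      .apply(lambda x: x.fillna(0).cumsum())
)

print(df)
# The cumsum column is 0.0, 5.0, 8.0, 8.0
```

Note that row 0 comes out as 0.0 rather than NaN, because the leading NaN is replaced by 0 before summing. The last row is 8.0, as requested.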

Answer 1: (score: 1)

Call fillna() before groupby, and use transform:

df['cumsum']=df.fillna(0).groupby('ORDER')['NEW1'].transform('cumsum')



  ORDER  NEW1  cumsum
0     A   NaN     0.0
1     A   5.0     5.0
2     B   8.0     8.0
3     B   NaN     8.0
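The output above shows 0.0 in row 0, whereas the question's expected output keeps NaN there. One possible way to preserve a leading NaN, offered only as a sketch and not taken from any of the answers, is to run the plain grouped cumsum first and then forward-fill within each group. The variable name partial is purely illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ORDER': ["A", "A", "B", "B"],
                   'NEW1': [np.nan, 5, 8, np.nan]})

# Per-group cumulative sum; rows whose input is NaN stay NaN here.
partial = df.groupby('ORDER')['NEW1'].cumsum()

# Forward-fill within each group, so a NaN row that follows a valid
# value inherits the last running total. A NaN at the start of a
# group has nothing to inherit and stays NaN.
df['cumsum'] = partial.groupby(df['ORDER']).ffill()

print(df)
# The cumsum column is NaN, 5.0, 8.0, 8.0
```

Grouping the forward-fill by ORDER matters: without it, a total from one order could leak into the leading NaN rows of the next.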

Answer 2: (score: 1)

Let's use an expanding sum, which treats NaN the way you want:

df['cumsum'] = df.groupby('ORDER')['NEW1'].expanding().sum().reset_index(0, drop=True)

  ORDER  NEW1  cumsum
0     A   NaN     NaN
1     A   5.0     5.0
2     B   8.0     8.0
3     B   NaN     8.0
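To see why the expanding sum behaves this way, here is a small sketch on a standalone Series (the values and the name s are made up for illustration). An expanding window sums the non-NaN values seen so far, and with the default min_periods=1 it returns NaN only while the window contains no valid value yet.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen to include a leading and an interior NaN.
s = pd.Series([np.nan, 5, np.nan, 3])

# Each window covers everything from the start up to the current row.
running = s.expanding().sum()

print(running.tolist())
# [nan, 5.0, 5.0, 8.0]
```

In the grouped version above, reset_index(0, drop=True) drops the ORDER level that groupby adds to the index, so the result lines up with the original rows again.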