Pandas DataFrame: reset cumulative sum on NaN

Time: 2019-03-13 16:49:37

Tags: python pandas dataframe

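I know this question has been asked before, but I could not find an answer simple enough for me to understand and apply to my problem. I have a column in a DataFrame and I want to keep a running total (cumulative sum) of it that resets at NaN values: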

 Index  s_number  s_cumsum
  0       1         1
  1       4         5
  2       6         11
  3       NaN       0
  4       7         7
  5       2         9
  6       3         12

2 answers:

Answer 0 (score: 5)

Use groupby and cumsum:

df['s_cumsum'] = df.s_number.groupby(df.s_number.isna().cumsum()).cumsum()
df

   Index  s_number  s_cumsum
0      0       1.0       1.0
1      1       4.0       5.0
2      2       6.0      11.0
3      3       NaN       NaN
4      4       7.0       7.0
5      5       2.0       9.0
6      6       3.0      12.0
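
To see why this works, it can help to look at the grouping key on its own. The following is a minimal, self-contained sketch using the data from the question; the name `group_id` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"s_number": [1, 4, 6, np.nan, 7, 2, 3]})

# isna() marks the NaN rows; its cumulative sum goes up by one at every NaN,
# so each NaN row starts a new group label: 0, 0, 0, 1, 1, 1, 1
group_id = df["s_number"].isna().cumsum()

# The cumulative sum is computed separately inside each group, so it
# restarts after every NaN. The NaN row itself stays NaN.
df["s_cumsum"] = df["s_number"].groupby(group_id).cumsum()
print(df)
```

Because the NaN row is the first row of its group, the running total that follows it starts again from the next valid value.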

Note that if "s_number" is a column of strings, use

df['s_number'] = pd.to_numeric(df['s_number'], errors='coerce')

...first, to get a float column with NaNs.
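
As a small illustration of that conversion, assuming the column was read in as strings with a literal "Nan" marker as in the question (the sample Series `s` below is made up for this sketch):

```python
import pandas as pd

# Hypothetical string column; "Nan" stands in for the missing value
s = pd.Series(["1", "4", "6", "Nan", "7", "2", "3"])

# errors="coerce" turns anything that cannot be parsed as a number into NaN
numeric = pd.to_numeric(s, errors="coerce")
print(numeric.dtype)
print(numeric.isna().tolist())
```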


If you want to fill in the NaNs,

df['s_cumsum'] = (df.s_number.groupby(df.s_number.isna().cumsum())
                    .cumsum()
                    .fillna(0, downcast='infer'))
df

   Index  s_number  s_cumsum
0      0       1.0         1
1      1       4.0         5
2      2       6.0        11
3      3       NaN         0
4      4       7.0         7
5      5       2.0         9
6      6       3.0        12
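
As far as I know, the `downcast` argument of `fillna` is deprecated in recent pandas releases (2.2 and later). A sketch of an alternative that should give the same integer column is to fill with 0 and then cast explicitly. This assumes every non-NaN value is a whole number, so the cast loses nothing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"s_number": [1, 4, 6, np.nan, 7, 2, 3]})

df["s_cumsum"] = (
    df["s_number"]
    .groupby(df["s_number"].isna().cumsum())
    .cumsum()
    .fillna(0)    # the NaN rows become 0
    .astype(int)  # explicit cast instead of downcast='infer'
)
print(df)
```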

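Answer 1 (score: 1)

Convert each NaN into the negative of the sum accumulated over the preceding values; the cumulative sum then resets to 0 at the NaNs.

I doubled the df to show how it works.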
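
The code for this answer is not included here, so the following is only one possible sketch of the idea described above, not the answerer's original code. The names `total`, `at_nan`, `offsets`, and `result` are made up for illustration.

```python
import numpy as np
import pandas as pd

# The question's data, repeated twice ("doubled")
s = pd.Series([1, 4, 6, np.nan, 7, 2, 3] * 2, dtype="float64")

# Plain running total, with NaN counted as 0
total = s.fillna(0).cumsum()

# Running total at each NaN row
at_nan = total[s.isna()]

# Amount accumulated since the previous NaN (or since the start), negated
offsets = -at_nan.diff().fillna(at_nan)

# Put the negative amounts in place of the NaNs, then take the cumulative sum;
# each offset cancels the preceding run, so the total drops to 0 at every NaN
result = s.fillna(offsets).cumsum()
print(result.tolist())
```

Unlike the groupby version, this yields 0 rather than NaN at the reset rows without a separate fillna step.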
