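I know this question has been asked before, but I could not find an answer simple enough for me to understand and apply to my problem. I have a column in a DataFrame and I want to keep a running total (cumulative sum) of that column, resetting it whenever a NaN value appears. The desired output looks like this: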
Index s_number s_cumsum
0 1 1
1 4 5
2 6 11
3 NaN 0
4 7 7
5 2 9
6 3 12
Answer 0 (score: 5)
Use groupby and cumsum:
df['s_cumsum'] = df.s_number.groupby(df.s_number.isna().cumsum()).cumsum()
df
Index s_number s_cumsum
0 0 1.0 1.0
1 1 4.0 5.0
2 2 6.0 11.0
3 3 NaN NaN
4 4 7.0 7.0
5 5 2.0 9.0
6 6 3.0 12.0
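To see why this works, it helps to look at the grouping key on its own. The following is a minimal, self-contained sketch; the sample frame is rebuilt from the question's data and the name `group_key` is only an illustrative choice, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed to be a float column).
df = pd.DataFrame({"s_number": [1, 4, 6, np.nan, 7, 2, 3]})

# isna() marks the NaN rows; cumsum() over that boolean mask increases
# by one at every NaN, so each NaN opens a new group.
group_key = df["s_number"].isna().cumsum()

# The cumulative sum is then computed separately inside each group.
df["s_cumsum"] = df["s_number"].groupby(group_key).cumsum()
print(df.assign(group_key=group_key))
```

The NaN row belongs to the group it opens, and since cumsum skips NaN by default, that row stays NaN in the result.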
Note that if "s_number" is a column of strings, first use
df['s_number'] = pd.to_numeric(df['s_number'], errors='coerce')
to get a float column with NaNs.
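As a small illustration of that conversion, here is a sketch that assumes the column arrived as strings, with a "Nan" placeholder as written in the question. The sample values are made up for the example.

```python
import pandas as pd

# Hypothetical string column; "Nan" stands in for the missing value.
raw = pd.Series(["1", "4", "6", "Nan", "7"], name="s_number")

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
numeric = pd.to_numeric(raw, errors="coerce")
print(numeric)
```

After this step the column has a float dtype, so `isna()` and `cumsum()` behave as in the answer above.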
If you want to fill the NaNs with 0,
df['s_cumsum'] = (df.s_number.groupby(df.s_number.isna().cumsum())
.cumsum()
.fillna(0, downcast='infer'))
df
Index s_number s_cumsum
0 0 1.0 1
1 1 4.0 5
2 2 6.0 11
3 3 NaN 0
4 4 7.0 7
5 5 2.0 9
6 6 3.0 12
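As far as I know, recent pandas releases emit a deprecation warning for the `downcast` argument of `fillna`. A possible alternative, sketched here under the assumption that every filled value is a whole number, is to cast explicitly after filling:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"s_number": [1, 4, 6, np.nan, 7, 2, 3]})

group_key = df["s_number"].isna().cumsum()
df["s_cumsum"] = (
    df["s_number"]
    .groupby(group_key)
    .cumsum()
    .fillna(0)        # the NaN rows become 0
    .astype("int64")  # explicit cast instead of downcast='infer'
)
print(df)
```

The explicit cast would silently truncate if the sums were not integral, so it only fits data like the example.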
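Answer 1 (score: 1)
Replace each NaN with the negative of the sum accumulated since the previous reset; a plain cumsum over the result then drops back to 0 at every NaN.
I doubled df to show how it works.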
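The answer's original code is not reproduced here, so the following is only one possible sketch of the idea described above. The variable names `running`, `is_nan` and `reset` are illustrative, and the doubled frame is rebuilt from the question's data.

```python
import numpy as np
import pandas as pd

# The question's data repeated twice, so that two resets occur.
values = [1, 4, 6, np.nan, 7, 2, 3]
df = pd.DataFrame({"s_number": values * 2})

is_nan = df["s_number"].isna()

# Running total that ignores NaNs, carried forward onto the NaN rows.
running = df["s_number"].cumsum().ffill()

# At each NaN, the amount to cancel is what accumulated since the previous
# NaN: the difference between running totals at consecutive NaN rows, or
# the running total itself for the first NaN.
reset = -running[is_nan].diff().fillna(running)

# Put the negative amounts where the NaNs were, then take a plain cumsum.
df["s_cumsum"] = df["s_number"].where(~is_nan, reset).cumsum()
print(df)
```

Unlike the groupby version, this yields 0 directly at the NaN rows, without a separate fillna step.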