Using df.fillna to forward-fill NaN at the top of a column?

Time: 2017-08-14 08:21:37

Tags: python pandas data-science

Here is my DataFrame:

Id_Student  English History Mathmatic

1   66.0    NaN         80.0
2   NaN     66.0        NaN
3   NaN     NaN         NaN
4   55.0    94.0        94.0

I want to fix the missing values with this method:

mdf1 = mdf.fillna(method='ffill')

But it does not seem to help when the first value in a column is NaN. The first value under the "History" column is still NaN:

Id_Student  English History Mathmatic

1       66.0        NaN      80.0
2       66.0       66.0      80.0
3       66.0       66.0      80.0
4       55.0       94.0      94.0
5       55.0       85.0      85.0

Any ideas on how to solve this kind of problem? Cheers, mates.

1 answer:

Answer 0 (score: 3)

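I think this is normal behavior, because ffill replaces NaNs by forward filling: each NaN takes the last valid value above it. If the first rows of a column have no value, there is nothing to propagate, so those rows stay NaN until the first non-NaN value.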
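To reproduce this, here is a minimal sketch. It assumes the six-row example frame printed in this answer; building it by hand with np.nan is my own reconstruction, and the asker's real data may differ. It counts the NaNs that survive a forward fill in each column:

```python
import numpy as np
import pandas as pd

# Example frame rebuilt by hand from the values shown in this answer.
mdf = pd.DataFrame({
    "Id_Student": [1, 2, 3, 4, 5, 6],
    "English": [66.0, np.nan, np.nan, 55.0, np.nan, np.nan],
    "History": [np.nan, 66.0, np.nan, 94.0, 10.0, np.nan],
    "Mathmatic": [np.nan, np.nan, np.nan, 94.0, np.nan, 20.0],
})

filled = mdf.ffill()

# Forward fill only copies values downward, so NaNs above the first
# valid value of a column have nothing to copy from and stay NaN.
leading_nans = filled.isna().sum()
print(leading_nans)
```

Any column with a nonzero count here starts with missing values and needs a second fill step.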

You can use another fillna to replace the NaNs that ffill cannot reach:

mdf1 = mdf.ffill().fillna(0)
# same as
# mdf1 = mdf.fillna(method='ffill').fillna(0)

The same problem occurs with bfill (back filling) when the NaN values are in the last rows:

print (mdf)
   Id_Student  English  History  Mathmatic
0           1     66.0      NaN        NaN
1           2      NaN     66.0        NaN
2           3      NaN      NaN        NaN
3           4     55.0     94.0       94.0
4           5      NaN     10.0        NaN
5           6      NaN      NaN       20.0

print (mdf.ffill())
   Id_Student  English  History  Mathmatic
0           1     66.0      NaN        NaN
1           2     66.0     66.0        NaN
2           3     66.0     66.0        NaN
3           4     55.0     94.0       94.0
4           5     55.0     10.0       94.0
5           6     55.0     10.0       20.0

print (mdf.bfill())
   Id_Student  English  History  Mathmatic
0           1     66.0     66.0       94.0
1           2     55.0     66.0       94.0
2           3     55.0     94.0       94.0
3           4     55.0     94.0       94.0
4           5      NaN     10.0       20.0
5           6      NaN      NaN       20.0

You can then add fillna to replace all remaining NaNs with a scalar:

mdf1 = mdf.ffill().fillna(0)
print (mdf1)
   Id_Student  English  History  Mathmatic
0           1     66.0      0.0        0.0
1           2     66.0     66.0        0.0
2           3     66.0     66.0        0.0
3           4     55.0     94.0       94.0
4           5     55.0     10.0       94.0
5           6     55.0     10.0       20.0


mdf1 = mdf.bfill().fillna(0)
print (mdf1)
   Id_Student  English  History  Mathmatic
0           1     66.0     66.0       94.0
1           2     55.0     66.0       94.0
2           3     55.0     94.0       94.0
3           4     55.0     94.0       94.0
4           5      0.0     10.0       20.0
5           6      0.0      0.0       20.0
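If a single scalar is too coarse, fillna also accepts a dict that maps column names to fill values. The sketch below is only an illustration: the example frame is rebuilt by hand from this answer, and the fallback values (0 and -1) are hypothetical placeholders, not something the asker specified.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt by hand from the values shown in this answer.
mdf = pd.DataFrame({
    "Id_Student": [1, 2, 3, 4, 5, 6],
    "English": [66.0, np.nan, np.nan, 55.0, np.nan, np.nan],
    "History": [np.nan, 66.0, np.nan, 94.0, 10.0, np.nan],
    "Mathmatic": [np.nan, np.nan, np.nan, 94.0, np.nan, 20.0],
})

# Hypothetical per-column fallbacks for the NaNs that ffill leaves behind.
fallback = {"English": 0, "History": 0, "Mathmatic": -1}

# ffill handles everything below the first valid value; the dict
# fills only what is still missing, column by column.
mdf1 = mdf.ffill().fillna(value=fallback)
print(mdf1)
```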

Or fill the remaining NaNs with the other method: ffill followed by bfill, or bfill followed by ffill:

mdf1 = mdf.ffill().bfill()
print (mdf1)
   Id_Student  English  History  Mathmatic
0           1     66.0     66.0       94.0
1           2     66.0     66.0       94.0
2           3     66.0     66.0       94.0
3           4     55.0     94.0       94.0
4           5     55.0     10.0       94.0
5           6     55.0     10.0       20.0

mdf1 = mdf.bfill().ffill()
print (mdf1)
   Id_Student  English  History  Mathmatic
0           1     66.0     66.0       94.0
1           2     55.0     66.0       94.0
2           3     55.0     94.0       94.0
3           4     55.0     94.0       94.0
4           5     55.0     10.0       20.0
5           6     55.0     10.0       20.0
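The two-direction fill can be wrapped in a small helper. This is a sketch only: fill_both_ways is a hypothetical name of my own, and the example frame is rebuilt by hand from this answer. It calls ffill() and bfill() directly because, as far as I know, pandas 2.1 and later deprecate the method= argument of fillna in favor of these methods.

```python
import numpy as np
import pandas as pd


def fill_both_ways(df: pd.DataFrame) -> pd.DataFrame:
    """Forward-fill first, then back-fill whatever ffill could not reach."""
    return df.ffill().bfill()


# Example frame rebuilt by hand from the values shown in this answer.
mdf = pd.DataFrame({
    "Id_Student": [1, 2, 3, 4, 5, 6],
    "English": [66.0, np.nan, np.nan, 55.0, np.nan, np.nan],
    "History": [np.nan, 66.0, np.nan, 94.0, 10.0, np.nan],
    "Mathmatic": [np.nan, np.nan, np.nan, 94.0, np.nan, 20.0],
})

mdf1 = fill_both_ways(mdf)
print(mdf1)

# Count of NaNs left after both passes.
remaining = int(mdf1.isna().sum().sum())
print(remaining)
```

A column that is entirely NaN has no value to propagate in either direction, so it would still be NaN after both passes and would need a scalar fallback.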
