This is my DataFrame:
Id_Student English History Mathmatic
1 66.0 NaN 80.0
2 NaN 66.0 NaN
3 NaN NaN NaN
4 55.0 94.0 94.0
I want to fill the missing values with this method:
mdf1 = mdf.fillna(method='ffill')
But it does not seem to help when the first value in a column is NaN. The first value under the "History" column is still NaN:
Id_Student English History Mathmatic
1 66.0 NaN 80.0
2 66.0 66.0 80.0
3 66.0 66.0 80.0
4 55.0 94.0 94.0
5 55.0 85.0 85.0
Any ideas on how to solve this kind of problem? Cheers, mate.
Answer 0 (score: 3)
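I think this is normal behaviour: ffill replaces each NaN with the last valid value above it (forward filling), so if a column starts with NaN, those leading NaNs stay as they are until the first non-NaN value.
bfill (back filling) has the same problem in the other direction: NaNs in the last rows have no later value to copy from, so they are not replaced.
You can chain another fillna to replace the NaNs that ffill cannot reach:
mdf1 = mdf.ffill().fillna(0)
#same as (older spelling; the method= argument is deprecated in recent pandas releases)
#mdf1 = mdf.fillna(method='ffill').fillna(0)
Sample data, with leading and trailing NaNs:
print (mdf)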
Id_Student English History Mathmatic
0 1 66.0 NaN NaN
1 2 NaN 66.0 NaN
2 3 NaN NaN NaN
3 4 55.0 94.0 94.0
4 5 NaN 10.0 NaN
5 6 NaN NaN 20.0
print (mdf.ffill())
Id_Student English History Mathmatic
0 1 66.0 NaN NaN
1 2 66.0 66.0 NaN
2 3 66.0 66.0 NaN
3 4 55.0 94.0 94.0
4 5 55.0 10.0 94.0
5 6 55.0 10.0 20.0
print (mdf.bfill())
Id_Student English History Mathmatic
0 1 66.0 66.0 94.0
1 2 55.0 66.0 94.0
2 3 55.0 94.0 94.0
3 4 55.0 94.0 94.0
4 5 NaN 10.0 20.0
5 6 NaN NaN 20.0
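To reproduce the outputs above, here is a minimal sketch that rebuilds the sample frame and counts the NaNs each fill direction leaves behind. The values are copied from the printed mdf; the variable names left_after_ffill and left_after_bfill are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the printed output above
mdf = pd.DataFrame({
    "Id_Student": [1, 2, 3, 4, 5, 6],
    "English": [66.0, np.nan, np.nan, 55.0, np.nan, np.nan],
    "History": [np.nan, 66.0, np.nan, 94.0, 10.0, np.nan],
    "Mathmatic": [np.nan, np.nan, np.nan, 94.0, np.nan, 20.0],
})

# NaNs left per column after each fill direction
left_after_ffill = mdf.ffill().isna().sum()  # leading NaNs survive
left_after_bfill = mdf.bfill().isna().sum()  # trailing NaNs survive

print(left_after_ffill)
print(left_after_bfill)
```

Counting the leftovers this way is a quick check of whether a second fill step is needed at all.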
Replace all remaining NaNs with a scalar:
mdf1 = mdf.ffill().fillna(0)
print (mdf1)
Id_Student English History Mathmatic
0 1 66.0 0.0 0.0
1 2 66.0 66.0 0.0
2 3 66.0 66.0 0.0
3 4 55.0 94.0 94.0
4 5 55.0 10.0 94.0
5 6 55.0 10.0 20.0
mdf1 = mdf.bfill().fillna(0)
print (mdf1)
Id_Student English History Mathmatic
0 1 66.0 66.0 94.0
1 2 55.0 66.0 94.0
2 3 55.0 94.0 94.0
3 4 55.0 94.0 94.0
4 5 0.0 10.0 20.0
5 6 0.0 0.0 20.0
Or replace the leftovers with the other fill method: if ffill goes first, follow it with bfill:
mdf1 = mdf.ffill().bfill()
print (mdf1)
Id_Student English History Mathmatic
0 1 66.0 66.0 94.0
1 2 66.0 66.0 94.0
2 3 66.0 66.0 94.0
3 4 55.0 94.0 94.0
4 5 55.0 10.0 94.0
5 6 55.0 10.0 20.0
mdf1 = mdf.bfill().ffill()
print (mdf1)
Id_Student English History Mathmatic
0 1 66.0 66.0 94.0
1 2 55.0 66.0 94.0
2 3 55.0 94.0 94.0
3 4 55.0 94.0 94.0
4 5 55.0 10.0 20.0
5 6 55.0 10.0 20.0
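Note that the two chained orders do not give the same result, because the first fill in the chain decides which neighbour wins for interior NaNs. A small sketch that compares them on the same sample data (the names a, b, differs and empty are only for illustration) could look like this:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the printed output above
mdf = pd.DataFrame({
    "Id_Student": [1, 2, 3, 4, 5, 6],
    "English": [66.0, np.nan, np.nan, 55.0, np.nan, np.nan],
    "History": [np.nan, 66.0, np.nan, 94.0, 10.0, np.nan],
    "Mathmatic": [np.nan, np.nan, np.nan, 94.0, np.nan, 20.0],
})

a = mdf.ffill().bfill()  # previous value wins, bfill only fixes leading NaNs
b = mdf.bfill().ffill()  # next value wins, ffill only fixes trailing NaNs

# Rows where the two orders disagree in at least one column
differs = (a != b).any(axis=1)
print(differs)

# A column with no valid value at all stays NaN under both fills,
# so a final fillna with a scalar is still needed for that case
empty = pd.DataFrame({"x": [np.nan, np.nan]})
still_nan = empty.ffill().bfill().isna().all().all()
print(still_nan)
```

Which order to use depends on whether an earlier or a later score is the better stand-in for a missing one.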