I have a pandas DataFrame that looks like this:
>>> df.head()
timestamp count_200 count_201 count_503 count_504 mean_200 mean_201 mean_503 mean_504 count_500
0 2020-09-18 09:00:00 4932.0 51.0 NaN NaN 59.501014 73.941176 0.0 0.0 0
1 2020-09-18 10:00:00 1697.0 9.0 NaN NaN 57.807896 69.111111 0.0 0.0 0
2 2020-09-18 11:00:00 6895.0 6.0 2.0 1.0 54.037273 98.333333 33.0 1511.0 0
3 2020-09-18 12:00:00 2943.0 97.0 NaN NaN 74.334353 74.268041 0.0 0.0 0
4 2020-09-18 13:00:00 2299.0 43.0 NaN NaN 70.539800 102.302326 0.0 0.0 0
fillna does not replace the NaN values:
>>> df.fillna(0)
timestamp count_200 count_201 count_503 count_504 mean_200 mean_201 mean_503 mean_504 count_500
0 2020-09-18 09:00:00 4932.0 51.0 NaN NaN 59.501014 73.941176 0.000000 0.000 0
1 2020-09-18 10:00:00 1697.0 9.0 NaN NaN 57.807896 69.111111 0.000000 0.000 0
2 2020-09-18 11:00:00 6895.0 6.0 2.0 1.0 54.037273 98.333333 33.000000 1511.000 0
3 2020-09-18 12:00:00 2943.0 97.0 NaN NaN 74.334353 74.268041 0.000000 0.000 0
4 2020-09-18 13:00:00 2299.0 43.0 NaN NaN 70.539800 102.302326 0.000000 0.000 0
However, if I access just a single row, fillna works as expected on the resulting Series:
>>> df.iloc[0]
timestamp 2020-09-18 09:00:00
count_200 4932
count_201 51
count_503 NaN
count_504 NaN
mean_200 59.501
mean_201 73.9412
mean_503 0
mean_504 0
count_500 0
Name: 0, dtype: object
>>> df.iloc[0].fillna(0)
timestamp 2020-09-18 09:00:00
count_200 4932
count_201 51
count_503 0
count_504 0
mean_200 59.501
mean_201 73.9412
mean_503 0
mean_504 0
count_500 0
Name: 0, dtype: object
What is going on here?
>>> df.iloc[0,3]
nan
>>> type(df.iloc[0,3])
<class 'numpy.float64'>
pandas does recognize the values as NA:
>>> df.isna()
timestamp count_200 count_201 count_503 count_504 mean_200 mean_201 mean_503 mean_504 count_500
0 False False False True True False False False False False
1 False False False True True False False False False False
2 False False False False False False False False False False
3 False False False True True False False False False False
4 False False False True True False False False False False
But applying NumPy's built-in np.nan_to_num function through pandas does fix it:
>>> df.head().apply(np.nan_to_num)
timestamp count_200 count_201 count_503 count_504 mean_200 mean_201 mean_503 mean_504 count_500
0 2020-09-18 09:00:00 4932.0 51.0 0.0 0.0 59.501014 73.941176 0.0 0.0 0
1 2020-09-18 10:00:00 1697.0 9.0 0.0 0.0 57.807896 69.111111 0.0 0.0 0
2 2020-09-18 11:00:00 6895.0 6.0 2.0 1.0 54.037273 98.333333 33.0 1511.0 0
3 2020-09-18 12:00:00 2943.0 97.0 0.0 0.0 74.334353 74.268041 0.0 0.0 0
4 2020-09-18 13:00:00 2299.0 43.0 0.0 0.0 70.539800 102.302326 0.0 0.0 0
Is this expected? I can't find any documentation for it. What am I missing? Is this a bug?
Answer 0: (score: 2)
df.head()
timestamp count_200 count_201 count_503 count_504 mean_200 mean_201 mean_503 mean_504 count_500
0 2020-09-18 09:00:00 4932.0 51.0 NaN NaN 59.501014 73.941176 0.0 0.0 0
1 2020-09-18 10:00:00 1697.0 9.0 NaN NaN 57.807896 69.111111 0.0 0.0 0
2 2020-09-18 11:00:00 6895.0 6.0 2.0 1.0 54.037273 98.333333 33.0 1511.0 0
3 2020-09-18 12:00:00 2943.0 97.0 NaN NaN 74.334353 74.268041 0.0 0.0 0
4 2020-09-18 13:00:00 2299.0 43.0 NaN NaN 70.539800 102.302326 0.0 0.0 0
Replace NaN with 0:
df.fillna(0)
timestamp count_200 count_201 count_503 count_504 mean_200 mean_201 mean_503 mean_504 count_500
0 2020-09-18 09:00:00 4932.0 51.0 0.0 0.0 59.501014 73.941176 0.0 0.0 0
1 2020-09-18 10:00:00 1697.0 9.0 0.0 0.0 57.807896 69.111111 0.0 0.0 0
2 2020-09-18 11:00:00 6895.0 6.0 2.0 1.0 54.037273 98.333333 33.0 1511.0 0
3 2020-09-18 12:00:00 2943.0 97.0 0.0 0.0 74.334353 74.268041 0.0 0.0 0
4 2020-09-18 13:00:00 2299.0 43.0 0.0 0.0 70.539800 102.302326 0.0 0.0 0
This works fine for me.
Use inplace=True to apply the change to the DataFrame itself:
df.fillna(0, inplace=True)
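The difference between the returned copy and the original can be checked directly. The following is a minimal sketch on a hypothetical three-row frame (the data is made up to resemble the question's columns, not taken from the asker's actual file):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame from the question.
df = pd.DataFrame({
    "count_200": [4932.0, 1697.0, 6895.0],
    "count_503": [np.nan, np.nan, 2.0],
})

# Without inplace=True, fillna returns a new DataFrame and leaves df untouched.
filled = df.fillna(0)
assert df["count_503"].isna().sum() == 2
assert filled["count_503"].isna().sum() == 0

# Rebinding the name is an alternative to inplace=True for keeping the result.
df = df.fillna(0)
print(df["count_503"].tolist())
```

Assigning the result back (df = df.fillna(0)) has the same visible effect as inplace=True and makes it harder to print the unfilled frame by accident.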
The pandas version I am using is:
print(pd.__version__)
0.23.0
Try restarting your IDE / Python kernel.
Check your pandas version and update it if needed.
Answer 1: (score: 1)
df[df.isna().any()] = 0
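You can use this. The pandas library can be confusing because a single function can do many different things. I usually just try everything so that I don't get stuck. Let me know whether this works, or at least what it does.
Answer 2: (score: 0)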
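To see what the expression inside the brackets evaluates to: df.isna().any() reduces over the rows and yields one boolean per column, so it is a column-level flag rather than a cell-level mask. A small sketch on a hypothetical frame (made-up data, and the names has_nan and nan_cols are only for illustration) that uses the flag to fill just the affected columns:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame from the question.
df = pd.DataFrame({
    "count_200": [4932.0, 1697.0, 6895.0],
    "count_503": [np.nan, np.nan, 2.0],
    "count_504": [np.nan, np.nan, 1.0],
})

# One boolean per column: True where the column holds at least one NaN.
has_nan = df.isna().any()
print(has_nan)

# Pick the flagged column labels and fill only those columns.
nan_cols = df.columns[has_nan.to_numpy()]
df[nan_cols] = df[nan_cols].fillna(0)
print(df)
```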
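I can't seem to reproduce the problem. If I copy the df you provided and turn it into a DataFrame with pd.read_clipboard(), then df.fillna(0) gives me the expected result.
Is the output you show for df.fillna(0) the actual return value, or are you printing df afterwards? If the latter, remember to use the inplace=True parameter.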
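For anyone who wants to repeat this check without a clipboard, here is a rough equivalent that swaps pd.read_clipboard() for pd.read_csv on an in-memory string. The two rows are a hand-typed subset of the question's table rewritten as CSV, with NaN represented by empty fields, so treat it as an approximation of the original data:

```python
import io

import pandas as pd

# Hand-typed subset of the question's table (an assumption, not the real file).
# Empty fields are parsed as NaN by read_csv.
raw = """timestamp,count_200,count_503,mean_503
2020-09-18 09:00:00,4932.0,,0.0
2020-09-18 11:00:00,6895.0,2.0,33.0
"""

df = pd.read_csv(io.StringIO(raw))
print(df.dtypes)          # the count/mean columns should be float64

# Inspect the return value of fillna, not df itself.
result = df.fillna(0)
print(result)
```

If the dtypes printed for the real data are not float64 (for example object), that would be worth including in the question, since it changes how missing values are stored.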