Pandas hasnans returns the wrong value for a column containing NaN values

Time: 2019-05-14 16:34:52

Tags: python pandas

I have a DataFrame with about 200 columns and 7000 rows. Column B consists entirely of NaN values, except for about 400 rows in the middle.

In short, column B looks like this (abbreviated):

      B
 1  NaN
 2  NaN
 3   75
 4   83
 5  NaN
 6  NaN

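However, when I run the code below, the hasnans attribute seems to have the wrong value. Am I using the attribute incorrectly, or is something else going on?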

df['B'].hasnans

returns False.
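For comparison, here is a minimal sketch that rebuilds the abbreviated column above with genuine missing values (np.nan). It suggests that hasnans itself behaves as expected, so a False result points at the data rather than the attribute. The text-valued Series is hypothetical: it assumes the cells were read as strings, in which case the column has a non-numeric dtype (such as object) and hasnans has nothing to count.

```python
import numpy as np
import pandas as pd

# Column B from the abbreviated table above, with genuine missing values
b_real = pd.Series([np.nan, np.nan, 75, 83, np.nan, np.nan], index=range(1, 7), name="B")
print(b_real.dtype, b_real.hasnans)  # float64 True

# Hypothetical: the same column if every cell had been read as text
b_text = pd.Series(["NaN", "NaN", "75", "83", "NaN", "NaN"], index=range(1, 7), name="B")
print(b_text.hasnans)  # False: "NaN" here is a string, not a missing value
```

Checking `df['B'].dtype` is a quick first diagnostic: a float dtype can hold real NaN, while a text dtype usually means the cells were never parsed as numbers.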

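Edit: Below is a small sample of the CSV file I import into pandas. The NaN values in the column are still not detected. Astute observers will notice the spaces around B in the column header; that is expected and not the problem.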

"  DATE       TIME  ","  A  ","  C  ","  B  "
12/11/2018 15:44:36,     5448,     0.00,      NaN
12/11/2018 15:44:36,     5448,     0.00,      NaN
12/11/2018 15:44:36,     5448,     0.00,      NaN
12/11/2018 15:44:36,     5448,     0.00,      NaN
12/11/2018 15:45:07,     5448,     0.00,      NaN
12/11/2018 15:45:08,     5448,     0.00,      NaN
12/11/2018 15:45:08,     5448,     0.00,      NaN
12/11/2018 15:45:09,     5448,     0.00,      NaN
12/11/2018 15:45:09,     5448,     0.00,      NaN

3 answers:

Answer 0 (score: 1)

Given the .csv file you are importing as a pandas DataFrame, you have to pay attention to the actual values you are looking for.

In fact, the file contains:

"  DATE       TIME  ","  A  ","  C  ","  B  "
12/11/2018 15:44:36,     5448,     0.00,      NaN
12/11/2018 15:44:36,     5448,     0.00,      NaN
12/11/2018 15:44:36,     5448,     0.00,      NaN
12/11/2018 15:44:36,     5448,     0.00,      NaN
12/11/2018 15:45:07,     5448,     0.00,      NaN
12/11/2018 15:45:08,     5448,     0.00,      NaN
12/11/2018 15:45:08,     5448,     0.00,      NaN
12/11/2018 15:45:09,     5448,     0.00,      NaN
12/11/2018 15:45:09,     5448,     0.00,      NaN

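The values in column B are the padded string "      NaN", not real missing values. The following replaces that string and then checks again: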

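import pandas as pd
import numpy as np

df = pd.read_csv('filename.csv', header=0)

# The cells hold the padded string '      NaN', not a real missing value;
# assign the result back instead of calling replace(..., inplace=True) on a column selection
df['  B  '] = df['  B  '].replace('      NaN', np.nan)
df['  B  '].hasnans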
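A minimal sketch of the same idea that does not depend on the exact number of padding spaces. It assumes column B was read as text, as in the question. It prints the raw values with repr() to expose the padding, then converts the column with pd.to_numeric(..., errors="coerce"). That call is a swapped-in alternative to the exact-string replace above, not part of the original answer. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical text column holding padded strings, as in the question's file
s = pd.Series(["      NaN", "      NaN", "       75", "       83"], name="  B  ")

# repr() exposes the leading spaces that make the values differ from "NaN"
print([repr(v) for v in s.unique()])

# Strip the padding, then parse; anything that is not a number becomes NaN
b = pd.to_numeric(s.str.strip(), errors="coerce")
print(b.dtype, b.hasnans)  # float64 True
```

Using errors="coerce" means any stray non-numeric text also becomes NaN, which is convenient here but can hide genuinely malformed cells.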

Answer 1 (score: 1)

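When you read in the csv, use the skipinitialspace option to drop the leading whitespace in the data. Note that the whitespace around the column names is preserved, because the names are enclosed in quotes.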

# make a fake csv in memory
from io import StringIO

import pandas as pd

mock_csv = StringIO()
mock_csv.write("""\
"  DATE       TIME  ","  A  ","  C  ","  B  "
12/11/2018 15:44:36,     5448,     0.00,      NaN
12/11/2018 15:44:36,     5448,     0.00,      NaN
12/11/2018 15:44:36,     5448,     0.00,      NaN
12/11/2018 15:44:36,     5448,     0.00,      NaN
12/11/2018 15:45:07,     5448,     0.00,      NaN
12/11/2018 15:45:08,     5448,     0.00,      NaN
12/11/2018 15:45:08,     5448,     0.00,      NaN
12/11/2018 15:45:09,     5448,     0.00,      NaN
12/11/2018 15:45:09,     5448,     0.00,      NaN
""")
mock_csv.seek(0)

# skip the whitespace that follows each delimiter, so "      NaN" is read as NaN
df = pd.read_csv(mock_csv, skipinitialspace=True)
assert df['  B  '].hasnans

See the documentation here.
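Because skipinitialspace only affects whitespace after a delimiter, the quoted header names keep their padding. A minimal sketch, assuming a hypothetical two-row sample in the question's layout, that reads the file this way and then optionally strips the column names so that plain `df["B"]` works:

```python
from io import StringIO

import pandas as pd

# Hypothetical two-row sample in the same layout as the question's file
csv_text = '''"  DATE       TIME  ","  A  ","  C  ","  B  "
12/11/2018 15:44:36,     5448,     0.00,      NaN
12/11/2018 15:45:07,     5448,     0.00,       75
'''

df = pd.read_csv(StringIO(csv_text), skipinitialspace=True)
print(list(df.columns))  # the quoted header keeps its padding, e.g. '  B  '

# Optional: strip the padding from the column names
df.columns = df.columns.str.strip()
print(df["B"].dtype, df["B"].hasnans)  # float64 True
```

Stripping the names is a design choice rather than a requirement; the asker notes the padded names are expected, so `df['  B  ']` works just as well.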

Answer 2 (score: 0)

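I think it shows False because the "NaN" values in your column are the string "NaN" rather than np.nan, so I suspect the column's dtype is "object". You therefore have to convert those "NaN" strings to np.nan so that the column can be converted to a numeric (float) dtype, and hasnans will then return the correct boolean.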

So first:

import numpy as np
df.loc[df["B"] == "NaN", "B"] = np.nan  # set only column B's "NaN" strings to np.nan (not whole rows)

Now you can check for NaN values with hasnans or isnull().any().
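A minimal end-to-end sketch of this answer's approach, assuming a hypothetical column B that was read as text with unpadded "NaN" strings. It replaces the strings in column B only, converts the column to float as the answer suggests, and then runs both checks:

```python
import numpy as np
import pandas as pd

# Hypothetical column read as strings, like the one described in the question
df = pd.DataFrame({"B": ["NaN", "NaN", "75", "83", "NaN"]})

# Replace the string "NaN" in column B only, then convert to a float dtype
df.loc[df["B"] == "NaN", "B"] = np.nan
df["B"] = df["B"].astype(float)

print(df["B"].dtype)           # float64
print(df["B"].hasnans)         # True
print(df["B"].isnull().any())  # True
```

If the cells carry leading spaces, as in the question's CSV sample, the `== "NaN"` comparison will not match; comparing against `df["B"].str.strip()` instead would be one way to handle that.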

Cheers!