Question

I am trying to check whether a DataFrame contains Null or NaN values like this:

import numpy as np
import pandas as pd

# load the time series of known points
y = [np.array([])]

y = [np.array([10, 11, 11, 11, 2, 4, 3, 7, 8, 9]),  
     np.array([1, 1, 1, 2, 2, 2, 2, 3, 2, 1]), 
     np.array([1, 1, 2, 2, 2, 2, 3, 2, 1])]
y = pd.DataFrame(y)



for i in range(len(y)):
    if (pd.isnull(y.any) == True):
        print("Error: t2 array index " + str(i) + " has NaN or null!")
print("All good bro")
print(pd.isnull(y))
print(pd.isnull(y.any))

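When I print y, you can clearly see that the last element is NaN. This is because the third numpy array is shorter than the others, and pandas automatically fills the missing value with NaN to keep the DataFrame rectangular.

However, when I print pd.isnull(y.any), I get False. I switched to y.all and also got False. I also tried y[1].any, with the same result.

What am I missing here?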

Answer 1

You can use:

y.isnull().values.any()
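To make the chained call easier to read, here is a minimal sketch that splits it into its steps. The small frame df and the names mask, has_nan, rows_with_nan and bad_rows are made up for illustration and are not part of the question's data. The row-wise variant with any(axis=1) is an extra, in case you also want to know which rows are affected.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: two rows, one missing value in the second row.
df = pd.DataFrame([[1.0, 2.0], [3.0, np.nan]])

# Step 1: element-wise test, gives a boolean DataFrame of the same shape.
mask = df.isnull()

# Step 2: .values exposes the underlying NumPy array, and .any() reduces it
# to a single boolean for the whole frame.
has_nan = bool(mask.values.any())

# Optional: reduce along each row to see which rows contain a missing value.
rows_with_nan = mask.any(axis=1)
bad_rows = rows_with_nan[rows_with_nan].index.tolist()

print(has_nan)
print(bad_rows)
```

Note that any is called with parentheses at every step, so each step works on actual values rather than on a method object.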

Code:

import numpy as np
import pandas as pd

# load the time series of known points
y = [np.array([])]

y = [np.array([10, 11, 11, 11, 2, 4, 3, 7, 8, 9]),  
     np.array([1, 1, 1, 2, 2, 2, 2, 3, 2, 1]), 
     np.array([1, 1, 2, 2, 2, 2, 3, 2, 1])]
y = pd.DataFrame(y)

print(y.isnull().values.any())

Output:

True
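As for why the original attempt printed False: y.any without parentheses is not a result, it is the method object itself. As far as I can tell, pd.isnull treats such an object as a single non-missing value and returns False, no matter what the DataFrame contains. The sketch below uses a small made-up frame (not the question's data) to show the difference. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with one NaN.
y = pd.DataFrame([[1.0, 2.0], [3.0, np.nan]])

# Without parentheses, y.any is just a bound method, not a computed value.
method_obj = y.any
is_method = callable(method_obj)

# isnull sees one non-missing Python object, so the answer is False.
not_called = bool(pd.isnull(method_obj))

# Testing the values first and then reducing gives the intended answer.
called = bool(y.isnull().any().any())

print(is_method, not_called, called)
```

The same reasoning applies to y.all and y[1].any: in each case the method is passed along without being called.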
