Question

I'm new to pandas. I can't understand why this code behaves the way it does. Why does it return True when the elements actually are None?

In [14]:
import pandas as pd
tweets = pd.DataFrame([None, None], columns=['country'])
print(tweets['country'] != None)

Out[14]:
0    True
1    True
Name: country, dtype: bool

Thanks.

Answer 1

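In short, this happens because pandas treats None as roughly equivalent to NaN, and np.nan == np.nan is False (so np.nan != np.nan is True). As @economy and others have said, use the isnull() or notnull() methods to do what you want.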
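A minimal sketch of the NaN-like behavior described above, assuming a reasonably recent pandas and NumPy. The column contents ("US" plus one missing value) are made up for illustration, and `dtype=object` is set explicitly so that None is kept as-is, as in the question.

```python
import numpy as np
import pandas as pd

# NaN is never equal to anything, including itself.
print(np.nan == np.nan)  # False
print(np.nan != np.nan)  # True

# Hypothetical column: one real value and one missing value (None).
tweets = pd.DataFrame(["US", None], columns=["country"], dtype=object)

# Comparing against None behaves like comparing against NaN:
# `!= None` is True everywhere and `== None` is False everywhere,
# so neither one tells you which rows are actually missing.
print((tweets["country"] != None).tolist())  # [True, True]
print((tweets["country"] == None).tolist())  # [False, False]

# isnull()/notnull() perform the intended element-wise null test.
print(tweets["country"].isnull().tolist())   # [False, True]
print(tweets["country"].notnull().tolist())  # [True, False]
```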

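Now, some reasons why this is not a bug. The comparison operators are implemented in Cython code in pandas.lib (in the older pandas this answer was written against; that module has since been moved into pandas' private internals). Specifically, when you write tweets['country'] == None, pandas.lib.scalar_compare is called. Note how scalar_compare works: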

>>> import operator, numpy as np, pandas as pd
>>> pd.lib.scalar_compare(np.array([None]), None, operator.ne)
array([ True], dtype=bool)

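That is exactly the behavior you are seeing. This is unlikely to be a bug: the code for scalar_compare calls the _checknull function, which explicitly handles None. Reading that code, you can see that it essentially (and quite deliberately) treats None == None as False.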
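To make the rule concrete, here is a simplified pure-Python sketch of it. The helpers `is_null` and `null_aware_compare` are hypothetical names for illustration only; this is not pandas' Cython implementation, just an approximation of its observable behavior: if either operand is null, `!=` yields True and every other comparison yields False.

```python
import math
import operator


def is_null(x):
    """Hypothetical stand-in for pandas' internal null check:
    treats None and float NaN as missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def null_aware_compare(values, val, op):
    """Hypothetical, simplified version of the NaN-style comparison rule:
    if either side is null, `!=` gives True and any other op gives False;
    otherwise the ordinary Python comparison is used."""
    result = []
    for x in values:
        if is_null(x) or is_null(val):
            result.append(op is operator.ne)
        else:
            result.append(bool(op(x, val)))
    return result


print(null_aware_compare([None], None, operator.ne))        # [True]
print(null_aware_compare([None], None, operator.eq))        # [False]
print(null_aware_compare(["US", None], "US", operator.eq))  # [True, False]
```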

Answer 2

I'm not sure why the expression returns True, but you can use pandas' built-in null checkers to determine whether a value is null:

print(tweets.notnull())

country
0   False
1   False

Its counterpart is

print(tweets.isnull())

country
0   True
1   True
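Building on the notnull()/isnull() suggestion, a minimal sketch of using the resulting mask for boolean indexing, assuming a reasonably recent pandas. The three-row data set is hypothetical.

```python
import pandas as pd

# Hypothetical data: two tweets with a country, one without.
tweets = pd.DataFrame({"country": ["US", None, "FR"]}, dtype=object)

# The boolean mask from notnull() keeps only rows where country is present.
with_country = tweets[tweets["country"].notnull()]
print(with_country["country"].tolist())  # ['US', 'FR']

# isnull() selects the complementary rows.
missing = tweets[tweets["country"].isnull()]
print(missing.index.tolist())  # [1]

# For a single scalar, pd.isna() applies the same null test.
print(pd.isna(None))  # True
```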

Pandas None logical indexing confusion