TLDR; The logical operators in Pandas are &, | and ~, and parentheses (...) are important!

Question

I'm working with Boolean indexing in Pandas. The question is why the statement:

a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]

works fine, whereas

a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]

raises an error?

Example:

import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

In: a[(a['x']==1)&(a['y']==10)]
Out:    x   y
     0  1  10

In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous.     Use a.any() or a.all()

Answer 1

When you say

(a['x']==1) and (a['y']==10)

you are implicitly asking Python to convert (a['x']==1) and (a['y']==10) to Boolean values.

NumPy arrays (of length greater than 1) and Pandas objects such as Series do not have a Boolean value; in other words, they raise

ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

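when used as a Boolean value. That's because it is unclear when such an object should be True or False. Some users might assume it is True if it has non-zero length, like a Python list. Others might want it to be True only if all of its elements are True. Still others might want it to be True if any of its elements is True.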

Because there are so many conflicting expectations, the designers of NumPy and Pandas refuse to guess and instead raise a ValueError.

Instead, you must be explicit, by checking the empty attribute or calling the all() or any() method to indicate which behavior you want.
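A minimal sketch of the behavior described above, using a small Series made up for illustration: calling bool() on it raises, while .any(), .all() and the .empty attribute each answer one specific, unambiguous question.

```python
import pandas as pd

s = pd.Series([True, False, True])  # illustrative data, not from the question

# Implicit truth testing (what `and`, `or`, `not` and `if` do) is refused.
try:
    bool(s)
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# Explicit reductions each answer one well-defined question.
any_true = bool(s.any())   # is at least one element True?
all_true = bool(s.all())   # are all elements True?
is_empty = s.empty         # does the Series have zero elements?

print(raised, any_true, all_true, is_empty)
```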

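In this case, however, it looks like you do not want Boolean evaluation; you want element-wise logical AND. That is what the & binary operator performs: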

(a['x']==1) & (a['y']==10)

returns a Boolean array.

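By the way, as alexpmil notes, the parentheses are mandatory because & has a higher operator precedence than ==. Without the parentheses, a['x']==1 & a['y']==10 would be evaluated as a['x'] == (1 & a['y']) == 10, which in turn is equivalent to the chained comparison (a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10). That is an expression of the form Series and Series. Using and with two Series would again trigger the same ValueError as above. That's why the parentheses are mandatory.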
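The precedence trap can be reproduced with the question's own DataFrame. This is a sketch for illustration; the names with_parens and without_parens_error are ours.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

# With parentheses: each comparison is evaluated first, then combined element-wise.
with_parens = a[(a['x'] == 1) & (a['y'] == 10)]

# Without parentheses: parsed as a['x'] == (1 & a['y']) == 10, a chained
# comparison that Python expands into `... and ...` on two Series.
try:
    a[a['x'] == 1 & a['y'] == 10]
    without_parens_error = None
except ValueError as exc:
    without_parens_error = str(exc)

print(with_parens)
print(without_parens_error)
```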

Answer 2

TLDR; The logical operators in Pandas are &, | and ~, and parentheses (...) are important!

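Python's and, or and not logical operators are designed to work with scalars. So Pandas had to do one better and override the bitwise operators to achieve a vectorized (element-wise) version of this functionality.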

So the following in Python (exp1 and exp2 are expressions that evaluate to a Boolean result)...

exp1 and exp2              # Logical AND
exp1 or exp2               # Logical OR
not exp1                   # Logical NOT

...will translate to...

exp1 & exp2                # Element-wise logical AND
exp1 | exp2                # Element-wise logical OR
~exp1                      # Element-wise logical NOT

for pandas.

If, in the process of performing a logical operation, you get a ValueError, then you need to use parentheses for grouping:

(exp1) op (exp2)

For example,

(df['col1'] == x) & (df['col2'] == y)

And so on.
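A minimal sketch of the three translations above, on two made-up Boolean Series (exp1 and exp2 here are just illustrative names). Each operator is applied position by position.

```python
import pandas as pd

exp1 = pd.Series([True, True, False, False])
exp2 = pd.Series([True, False, True, False])

and_result = exp1 & exp2   # element-wise logical AND
or_result = exp1 | exp2    # element-wise logical OR
not_result = ~exp1         # element-wise logical NOT

print(pd.DataFrame({'exp1': exp1, 'exp2': exp2,
                    '&': and_result, '|': or_result, '~exp1': not_result}))
```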

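Boolean indexing: a common operation is to compute Boolean masks through logical conditions in order to filter the data. Pandas provides three operators: & for logical AND, | for logical OR, and ~ for logical NOT.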

Consider the following setup:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))
df

   A  B  C
0  5  0  3
1  3  7  9
2  3  5  2
3  4  7  6
4  8  8  1

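Logical AND

For the df above, say you'd like to return all rows where A < 5 and B > 5. This is done by computing masks for each condition separately and ANDing them.

Overloaded Bitwise & Operator
Before continuing, please take note of this particular excerpt of the docs, which states

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, since by default Python will evaluate an expression such as df.A > 2 & df.B < 3 as df.A > (2 & df.B) < 3, while the desired evaluation order is (df.A > 2) & (df.B < 3).

So, with this in mind, element-wise logical AND can be implemented with the bitwise operator &:

df['A'] < 5

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'] > 5

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

(df['A'] < 5) & (df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

And the subsequent filtering step is simply

df[(df['A'] < 5) & (df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

The parentheses are used to override the default precedence order of the bitwise operators, which have higher precedence than the comparison operators < and >. See the Operator Precedence section in the Python docs.

If you do not use parentheses, the expression is evaluated incorrectly. For example, if you accidentally attempt something such as

df['A'] < 5 & df['B'] > 5

it is parsed as

df['A'] < (5 & df['B']) > 5

which becomes

df['A'] < something_you_dont_want > 5

which becomes (see the Python docs on chained comparisons)

(df['A'] < something_you_dont_want) and (something_you_dont_want > 5)

which becomes

# Both operands are Series...
something_else_you_dont_want1 and something_else_you_dont_want2

which throws

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So, don't make that mistake!¹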
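To make something_you_dont_want concrete: with the example df written out explicitly from the table above (rather than rebuilt via np.random.seed), 5 & df['B'] is an integer bitwise AND of 5 with each value in B, not a Boolean condition. This is a sketch for illustration only.

```python
import pandas as pd

# Same values as the seeded example df shown above, written out explicitly.
df = pd.DataFrame({'A': [5, 3, 3, 4, 8],
                   'B': [0, 7, 5, 7, 8],
                   'C': [3, 9, 2, 6, 1]})

# What `df['A'] < 5 & df['B'] > 5` actually computes in its middle operand:
something_you_dont_want = 5 & df['B']   # 5 is 0b101, applied bit by bit

# The chained comparison then needs the truth value of a whole Series.
try:
    df['A'] < 5 & df['B'] > 5
    error = None
except ValueError as exc:
    error = exc

print(something_you_dont_want.tolist())
print(type(error).__name__)
```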

Avoiding Parentheses Grouping
The fix is actually quite simple. Most operators have a corresponding bound method for Series and DataFrames. If the individual masks are built up using these methods instead of the comparison operators, you no longer need to group with parentheses to specify evaluation order:

df['A'].lt(5)

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'].gt(5)

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

df['A'].lt(5) & df['B'].gt(5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

See the section on Flexible Comparisons. To summarise, we have

╒════╤════════════╤════════════╕
│    │ Operator   │ Function   │
╞════╪════════════╪════════════╡
│  0 │ >          │ gt         │
├────┼────────────┼────────────┤
│  1 │ >=         │ ge         │
├────┼────────────┼────────────┤
│  2 │ <          │ lt         │
├────┼────────────┼────────────┤
│  3 │ <=         │ le         │
├────┼────────────┼────────────┤
│  4 │ ==         │ eq         │
├────┼────────────┼────────────┤
│  5 │ !=         │ ne         │
╘════╧════════════╧════════════╛

Another option for avoiding parentheses is to use DataFrame.query (or eval):

df.query('A < 5 and B > 5')

   A  B  C
1  3  7  9
3  4  7  6

I have extensively documented query and eval in Dynamic Expression Evaluation in pandas using pd.eval().

operator.and_
Allows you to perform this operation in a functional manner. Internally calls Series.__and__, which corresponds to the bitwise operator.

import operator

operator.and_(df['A'] < 5, df['B'] > 5)
# Same as,
# (df['A'] < 5).__and__(df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

df[operator.and_(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

You won't usually need this, but it is useful to know.
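As a quick sanity check, the spellings covered so far (parenthesised operators, the lt/gt methods, DataFrame.query and operator.and_) should all select the same rows. A sketch, again with the example df written out explicitly:

```python
import operator

import pandas as pd

df = pd.DataFrame({'A': [5, 3, 3, 4, 8],
                   'B': [0, 7, 5, 7, 8],
                   'C': [3, 9, 2, 6, 1]})

# Four ways of writing "A < 5 and B > 5" as an element-wise filter.
by_operators = df[(df['A'] < 5) & (df['B'] > 5)]
by_methods = df[df['A'].lt(5) & df['B'].gt(5)]
by_query = df.query('A < 5 and B > 5')
by_operator_module = df[operator.and_(df['A'] < 5, df['B'] > 5)]

for result in (by_methods, by_query, by_operator_module):
    # equals() compares values, dtypes and index labels.
    assert result.equals(by_operators)

print(by_operators)
```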

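Generalizing: np.logical_and (and logical_and.reduce)
Another alternative is to use np.logical_and, which also does not need parentheses grouping:

np.logical_and(df['A'] < 5, df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
Name: A, dtype: bool

df[np.logical_and(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

np.logical_and is a ufunc (Universal Functions), and most ufuncs have a reduce method. This means it is easier to generalise with logical_and if you have multiple masks to AND. For example, to AND masks m1, m2 and m3 with &, you would have to do

m1 & m2 & m3

However, an easier option is

np.logical_and.reduce([m1, m2, m3])

This is powerful, because it lets you build on top of this with more complex logic (for example, dynamically generating masks in a list comprehension and combining all of them):

import numpy as np

cols = ['A', 'B']
ops = [np.less, np.greater]
values = [5, 5]

m = np.logical_and.reduce([op(df[c], v) for op, c, v in zip(ops, cols, values)])
m
# array([False,  True, False,  True, False])

df[m]

   A  B  C
1  3  7  9
3  4  7  6

¹ I know I'm harping on this point, but please bear with me. This is a very, very common beginner's mistake, and it must be explained very thoroughly.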
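For comparison, here is a sketch of the same dynamic pattern using functools.reduce with operator.and_ instead of np.logical_and.reduce. This is a swapped-in alternative, not part of the answer above. One practical difference: np.logical_and.reduce on a list of Series returns a plain NumPy array (as the output above shows), whereas folding with & keeps a Boolean Series carrying the DataFrame's index. The conditions list is hypothetical.

```python
import functools
import operator

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [5, 3, 3, 4, 8],
                   'B': [0, 7, 5, 7, 8],
                   'C': [3, 9, 2, 6, 1]})

# Hypothetical list of (comparison, column, value) triples.
conditions = [(operator.lt, 'A', 5), (operator.gt, 'B', 5)]
masks = [op(df[col], value) for op, col, value in conditions]

via_numpy = np.logical_and.reduce(masks)           # numpy.ndarray of bools
via_fold = functools.reduce(operator.and_, masks)  # pandas Series, index kept

print(type(via_numpy).__name__, via_numpy.tolist())
print(type(via_fold).__name__, via_fold.tolist())
print(df[via_fold])
```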

Logical OR

For the df above, say you'd like to return all rows where A == 3 or B == 7.

Overloaded Bitwise |

df['A'] == 3

0    False
1     True
2     True
3    False
4    False
Name: A, dtype: bool

df['B'] == 7

0    False
1     True
2    False
3     True
4    False
Name: B, dtype: bool

(df['A'] == 3) | (df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[(df['A'] == 3) | (df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

If you haven't yet, please also read the section on Logical AND above; all the caveats apply here.

Alternatively, this operation can be specified with

df[df['A'].eq(3) | df['B'].eq(7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

operator.or_
Calls Series.__or__ under the hood.

operator.or_(df['A'] == 3, df['B'] == 7)
# Same as,
# (df['A'] == 3).__or__(df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[operator.or_(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

np.logical_or
For two conditions, use logical_or:

np.logical_or(df['A'] == 3, df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df[np.logical_or(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

For multiple masks, use logical_or.reduce:

np.logical_or.reduce([df['A'] == 3, df['B'] == 7])
# array([False,  True,  True,  True, False])

df[np.logical_or.reduce([df['A'] == 3, df['B'] == 7])]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6
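The logical_or.reduce pattern is handy when the number of conditions is not known in advance. A sketch under that assumption: rows_matching_any is a hypothetical helper (not a pandas API) that takes column=value keywords and keeps the rows where any of them holds.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [5, 3, 3, 4, 8],
                   'B': [0, 7, 5, 7, 8],
                   'C': [3, 9, 2, 6, 1]})


def rows_matching_any(frame, **equalities):
    """Hypothetical helper: keep rows where ANY column equals its given value.

    Each keyword is a column name; the masks are OR-ed together with
    np.logical_or.reduce, so no parentheses bookkeeping is needed.
    """
    masks = [frame[col] == value for col, value in equalities.items()]
    if not masks:
        raise ValueError("at least one condition is required")
    return frame[np.logical_or.reduce(masks)]


result = rows_matching_any(df, A=3, B=7)
print(result)
```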

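Logical NOT

Given a mask, such as

mask = pd.Series([True, True, False])

if you need to invert every Boolean value (so that the end result is [False, False, True]), you can use any of the methods below.

Bitwise ~

~mask

0    False
1    False
2     True
dtype: bool

Again, expressions need to be parenthesised.

~(df['A'] == 3)

0     True
1    False
2    False
3     True
4     True
Name: A, dtype: bool

This internally calls

mask.__invert__()

0    False
1    False
2     True
dtype: bool

but don't use it directly.

operator.inv
Internally calls __invert__ on the Series.

operator.inv(mask)

0    False
1    False
2     True
dtype: bool

np.logical_not
This is the NumPy variant.

np.logical_not(mask)

0    False
1    False
2     True
dtype: bool

Note that, for Boolean data, np.bitwise_and can be used in place of np.logical_and, np.bitwise_or in place of np.logical_or, and np.invert in place of np.logical_not.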
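A small sketch confirming the claims in this section on Boolean Series: all the NOT spellings agree, and the bitwise ufuncs give the same results as the logical ones. The second mask, other, is made up for the comparison.

```python
import operator

import numpy as np
import pandas as pd

mask = pd.Series([True, True, False])
other = pd.Series([True, False, False])  # second illustrative mask

# Every spelling of NOT from the section above gives the same result.
inverted = [~mask, mask.__invert__(), operator.inv(mask), np.logical_not(mask)]
assert all(s.tolist() == [False, False, True] for s in inverted)

# On Boolean data the bitwise ufuncs agree with the logical ones.
pairs = [
    (np.logical_and(mask, other), np.bitwise_and(mask, other)),
    (np.logical_or(mask, other), np.bitwise_or(mask, other)),
    (np.logical_not(mask), np.invert(mask)),
]
assert all(a.tolist() == b.tolist() for a, b in pairs)
print("all variants agree")
```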

Answer 3

Logical operators for boolean indexing in Pandas

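It's important to realize that you cannot use any of the Python logical operators (and, or or not) on pandas.Series or pandas.DataFrames (similarly, you cannot use them on numpy.arrays with more than one element). The reason you cannot use them is that they implicitly call bool on their operands, which throws an exception because these data structures decided that the Boolean value of an array is ambiguous: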

>>> import numpy as np
>>> import pandas as pd
>>> arr = np.array([1,2,3])
>>> s = pd.Series([1,2,3])
>>> df = pd.DataFrame([1,2,3])
>>> bool(arr)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> bool(s)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> bool(df)
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I covered this more extensively in my answer to the "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" Q+A.

NumPy's logical functions

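However, NumPy provides element-wise equivalents of these operators as functions that can be used on numpy.array, pandas.Series, pandas.DataFrame, or any other (conforming) numpy.array subclass:

and has np.logical_and
or has np.logical_or
not has np.logical_not
numpy.logical_xor has no Python equivalent but is a logical "exclusive or" operation

So, essentially, one should use (assuming df1 and df2 are pandas DataFrames):

np.logical_and(df1, df2)
np.logical_or(df1, df2)
np.logical_not(df1)
np.logical_xor(df1, df2)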
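A sketch of those four calls on two small Boolean DataFrames (df1 and df2 are made-up stand-ins with matching index and columns). The results are DataFrames of the same shape, computed cell by cell.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'p': [True, True], 'q': [False, False]})
df2 = pd.DataFrame({'p': [True, False], 'q': [True, False]})

results = {
    'and': np.logical_and(df1, df2),
    'or': np.logical_or(df1, df2),
    'not df1': np.logical_not(df1),
    'xor': np.logical_xor(df1, df2),
}
for name, frame in results.items():
    # Each result keeps df1's row and column labels.
    print(name, frame.values.tolist(), sep='\n')
```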

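Bitwise functions and bitwise operators for booleans

However, if you have a Boolean NumPy array, pandas Series, or pandas DataFrame, you can also use the element-wise bitwise functions (for Booleans they are, or at least should be, indistinguishable from the logical functions):

bitwise and: np.bitwise_and or the & operator
bitwise or: np.bitwise_or or the | operator
bitwise not: np.invert (or the alias np.bitwise_not) or the ~ operator
bitwise xor: np.bitwise_xor or the ^ operator

Typically the operators are used. However, when they are combined with comparison operators, one has to remember to wrap the comparisons in parentheses, because the bitwise operators have a higher precedence than the comparison operators:

(df1 < 10) | (df2 > 10)  # instead of the wrong df1 < 10 | df2 > 10

This may be irritating because the Python logical operators have a lower precedence than the comparison operators, so you normally write a < 10 and b > 10 (where a and b are, for example, simple integers) and don't need the parentheses.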
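A sketch of the precedence rule with two made-up numeric DataFrames. The parenthesised form works; the unparenthesised form is parsed as the chained comparison df1 < (10 | df2) > 10 and fails because Python then asks for the truth value of a whole DataFrame. The last expression also shows ^ as element-wise exclusive or.

```python
import pandas as pd

df1 = pd.DataFrame({'v': [5, 15, 25]})
df2 = pd.DataFrame({'v': [20, 0, 30]})

correct = (df1 < 10) | (df2 > 10)

try:
    df1 < 10 | df2 > 10          # parsed as df1 < (10 | df2) > 10
    wrong = None
except ValueError as exc:
    wrong = exc

exclusive = (df1 < 10) ^ (df2 > 10)  # True where exactly one condition holds

print(correct['v'].tolist(), exclusive['v'].tolist(), type(wrong).__name__)
```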

Differences between logical and bitwise operations (on non-booleans)

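It is really important to stress that bitwise and logical operations are only equivalent for Boolean NumPy arrays (and Boolean Series and DataFrames). If these don't contain Booleans, the operations will give different results. I'll include examples using NumPy arrays, but the results will be similar for the pandas data structures: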

>>> import numpy as np
>>> a1 = np.array([0, 0, 1, 1])
>>> a2 = np.array([0, 1, 0, 1])

>>> np.logical_and(a1, a2)
array([False, False, False,  True])
>>> np.bitwise_and(a1, a2)
array([0, 0, 0, 1], dtype=int32)

And since NumPy (and similarly pandas) does different things for Boolean (Boolean or "mask" index arrays) and integer (Index arrays) indices, the results of indexing will also be different:

>>> a3 = np.array([1, 2, 3, 4])

>>> a3[np.logical_and(a1, a2)]
array([4])
>>> a3[np.bitwise_and(a1, a2)]
array([1, 1, 1, 2])
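Here is a sketch of the same pitfall with pandas, plus one way around it: converting the integer data to Boolean with astype(bool) (nonzero becomes True) before combining, so the bitwise operator behaves like the logical one. The Series s1, s2 and s3 mirror a1, a2 and a3 above; the conversion step is our suggestion, not part of the answer.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([0, 0, 1, 1])
s2 = pd.Series([0, 1, 0, 1])
s3 = pd.Series([1, 2, 3, 4])

logical = np.logical_and(s1, s2)          # Boolean Series
bitwise = s1 & s2                         # integer Series of 0/1, NOT a mask
safe = s1.astype(bool) & s2.astype(bool)  # Boolean again, same as `logical`

print(s3[logical].tolist())  # Boolean mask: keeps rows where the mask is True
print(s3[bitwise].tolist())  # integer Series: treated as index labels 0, 0, 0, 1
print(s3[safe].tolist())
```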

Summary table

Logical operator | NumPy logical function | NumPy bitwise function | Bitwise operator
-------------------------------------------------------------------------------------
       and       |  np.logical_and        | np.bitwise_and         |        &
-------------------------------------------------------------------------------------
       or        |  np.logical_or         | np.bitwise_or          |        |
-------------------------------------------------------------------------------------
                 |  np.logical_xor        | np.bitwise_xor         |        ^
-------------------------------------------------------------------------------------
       not       |  np.logical_not        | np.invert              |        ~

The logical operators do not work for NumPy arrays, pandas Series and pandas DataFrames. The others work on these data structures (and on plain Python objects) and operate element-wise. However, be careful with the bitwise invert on plain Python bools, because in this context the bool is interpreted as an integer (for example, ~False returns -1 and ~True returns -2).
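A short sketch of that last warning. On a plain Python bool, ~ is integer bitwise inversion, so use not instead; on a NumPy Boolean (and on Boolean Series), ~ is a logical NOT. Recent Python versions (3.12 and later) also emit a DeprecationWarning for ~ on bool, which the sketch silences only so the integer result can be shown.

```python
import warnings

import numpy as np

f, t = False, True  # plain Python bools

with warnings.catch_warnings():
    # ~ on a plain bool is deprecated in newer Python versions; silence the
    # warning here only to show the integer result.
    warnings.simplefilter("ignore", DeprecationWarning)
    plain_false = ~f   # integer arithmetic: -(0) - 1
    plain_true = ~t    # integer arithmetic: -(1) - 1

numpy_false = ~np.bool_(False)  # logical NOT on a NumPy Boolean

print(plain_false, plain_true, not t, numpy_false)
```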
