Question

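I am having trouble evaluating values in a dictionary with an if statement.

Given the following dictionary, which I imported from a DataFrame (in case that matters):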

>>> pnl[company]
29:   Active Credit       Date   Debit Strike Type
0      1      0 2013-01-08  2.3265  21.15  Put
1      0      0 2012-11-26      40     80  Put
2      0      0 2012-11-26     400     80  Put

I am trying to evaluate the following statement to determine the last value of Active:

if pnl[company].tail(1)['Active']==1:
    print 'yay'

However, I get the following error message:

Traceback (most recent call last):
  File "<pyshell#69>", line 1, in <module>
    if pnl[company].tail(1)['Active']==1:
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 676, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This surprises me, because the same expression without the if statement displays the value I want:

>>> pnl[company].tail(1)['Active']
30: 2    0
Name: Active, dtype: object

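Given that the value is clearly zero and the index is 2, I tried the following as a quick sanity check and found that things were not behaving as I expected: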

>>> if pnl[company]['Active'][2]==0:
...     print 'woo-hoo'
... else:
...     print 'doh'


doh

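My questions are:

1) What could be going on here? I suspect I am misunderstanding dictionaries at some basic level.

2) I noticed that each time I bring up any given value of this dictionary, the number on the left increases by 1. What does this represent? For example: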

>>> pnl[company].tail(1)['Active']
31: 2    0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
32: 2    0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
33: 2    0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
34: 2    0
Name: Active, dtype: object

Thanks in advance for your help.

Answer 1

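What you get back is a pandas Series object, and it cannot be evaluated the way you are attempting, even though it holds only a single value. You need to change the line to: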

if pnl[company].tail(1)['Active'].any()==1:
  print 'yay'

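Regarding your second question, see my comment.

Edit

From the comments and the link to your output: calling any() fixes the error message, but your data are actually strings, so the comparison still fails. You could do this: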

if pnl[company].tail(1)['Active'].any()=='1': print 'yay'

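to do a string comparison, or fix the data however it was read in or generated.

Or do:

pnl[company]['Active'] = pnl[company]['Active'].astype(int)

to convert the dtype of the column so that your comparison behaves as intended.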
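A minimal sketch of why the original comparison fails and how the conversion helps. The DataFrame here is a hypothetical stand-in for pnl[company], built with string values to mimic the object dtype shown in the question, and the code is written for Python 3 rather than the Python 2.7 session above.

```python
import pandas as pd

# Hypothetical stand-in for pnl[company]; 'Active' holds strings, hence dtype object.
df = pd.DataFrame({
    "Active": ["1", "0", "0"],
    "Strike": ["21.15", "80", "80"],
})

last_raw = df["Active"].iloc[-1]       # the string '0', not the integer 0
print(repr(last_raw), last_raw == 0)   # a str never equals an int

# Convert the column once, then compare against integers.
df["Active"] = df["Active"].astype(int)
last_int = df["Active"].iloc[-1]
print(last_int == 0)
```

This also accounts for the 'doh' in the question's sanity check: the stored value was the string '0', so comparing it with the integer 0 was False.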

Answer 2

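A Series is a subclass of NDFrame. The NDFrame.__bool__ method always raises a ValueError. Thus, trying to evaluate a Series in a boolean context raises a ValueError, even if the Series has only one value.

The reason NDFrames have no boolean value (that is, they always raise a ValueError) is that there is more than one criterion by which one might reasonably expect an NDFrame to be True. It could mean that

every item in the NDFrame is True (if so, use .all()), or
any item in the NDFrame is True (if so, use .any()), or
the NDFrame is not empty (if so, test .empty, which is a property rather than a method)

Since all of these are plausible, and since different users have different expectations, the developers refuse to guess. Instead of just picking one, they require the user of the NDFrame to state explicitly which criterion they want.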
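As a rough illustration of why no single default would do, the sketch below applies the three criteria to one Series and gets different answers. The values are made up for the example and do not come from the question.

```python
import pandas as pd

s = pd.Series([0, 2])  # illustrative values, not taken from the question

try:
    bool(s)            # what `if s:` would do implicitly
except ValueError as exc:
    error_text = str(exc)
    print(error_text)

every_item_true = bool(s.all())   # False: 0 is falsy
any_item_true = bool(s.any())     # True: 2 is truthy
not_empty = not s.empty           # True: the Series has two elements
print(every_item_true, any_item_true, not_empty)
```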

The error message lists the most likely choices:

Use a.empty, a.bool(), a.item(), a.any() or a.all()

Since in your case you know the Series contains only one value, you can use item:

if pnl[company].tail(1)['Active'].item() == 1:
    print 'yay'
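A small Python 3 sketch of how item() behaves, using a hypothetical integer column in place of the asker's data. It also shows positional indexing with iloc, an alternative not proposed in this answer, which returns the scalar directly.

```python
import pandas as pd

active = pd.Series([1, 0, 0], name="Active")  # hypothetical integer column

last = active.tail(1)                # still a Series, of length one
last_is_zero = last.item() == 0      # item() unwraps the single element
print(last_is_zero)

# item() refuses to guess when the Series holds more than one element.
try:
    active.item()
    item_failed = False
except ValueError:
    item_failed = True
print(item_failed)

# Positional indexing returns the scalar directly
# (an alternative that the answer above does not mention).
print(active.iloc[-1] == 0)
```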

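Regarding your second question: the numbers on the left appear to be line numbering produced by your Python interpreter (PyShell?), but that is just my guess.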

Warning: presumably,

if pnl[company].tail(1)['Active']==1:

means that you want the condition to be True when the single value in the Series equals 1. The code

if pnl[company].tail(1)['Active'].any()==1:
    print 'yay'

will be True for any nonzero number in the Series. For example, if we take pnl[company].tail(1)['Active'] to be equal to

In [128]: s = pd.Series([2], index=[2])

then

In [129]: s.any()
Out[129]: True

and therefore

In [130]: s.any()==1
Out[130]: True

I think s.item() == 1 preserves your intent more faithfully:

In [132]: s.item()==1
Out[132]: False

(s == 1).any() would also work, but using any does not express your intent very clearly, since you know the Series contains only one value.
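To see the three spellings side by side, here is a short Python 3 sketch that runs each on a single-element Series holding 0, 1 and 2. The `results` dictionary is only a helper for this illustration.

```python
import pandas as pd

results = {}
for value in (0, 1, 2):
    s = pd.Series([value], index=[2])
    results[value] = (
        bool(s.any() == 1),     # truthiness test: True for any nonzero value
        bool(s.item() == 1),    # exact comparison of the single element
        bool((s == 1).any()),   # elementwise comparison, then reduction
    )
    print(value, results[value])
```

The expressions agree for 0 and 1 and diverge for 2, where only the any()-first form reports True.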

Answer 3

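Your question has nothing whatsoever to do with Python dictionaries or native Python. It is about pandas Series, and the other answers give you the correct syntax.

Interpreting your question in a broader sense, it is about how pandas Series is built on top of NumPy, and NumPy historically until recently had notoriously poor support for logical values and operators. pandas does the best it can with what NumPy provides. Having to manually invoke numpy logical functions instead of just writing code with arbitrary (Python) operators is annoying and clunky, and it sometimes bloats pandas code. You also often have to do this to get performance (numpy is faster than thunking to and from native Python). But that is the price we pay.

There are many limitations, quirks and gotchas (examples below). The best advice is to be distrustful of boolean as a first-class citizen in pandas, because of numpy's limitations:

pandas Caveats and Gotchas - Using If/Truth Statements with Pandas
Performance example: Python ~ can be used instead of np.invert() - more legible but 3x slower or worse
Some gotchas and limitations: in the code below, note that recent numpy now allows boolean values (internally represented as int) and allows NAs, but that value_counts() ignores NAs (compare to R's table, which has option 'useNA').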

import numpy as np
import pandas as pd
s = pd.Series([True, True, False, True, np.nan])
s2  = pd.Series([True, True, False, True, np.nan])
dir(s) # look at .all, .any, .bool, .eq, .equals, .invert, .isnull, .value_counts() ...

s.astype(bool) # PITFALL: NaN is truthy, so astype(bool) silently turns the missing value into True
# 0     True
# 1     True
# 2    False
# 3     True
# 4     True  # <--- should be NA!!
#dtype: bool

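# Note: s.bool is a method; without parentheses this only shows the bound method's repr,
# which happens to print the underlying object-dtype Series with the NaN intact.
s.bool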
# <bound method Series.bool of
# 0     True
# 1     True
# 2    False
# 3     True
# 4      NaN
# dtype: object>

# Limitation: value_counts() currently excludes NAs
s.value_counts()
# True     3
# False    1
# dtype: int64
help(s.value_counts) # "... Excludes NA values(!)"

# Equality comparison (vector) fails on NAs; again there is no NA-handling option:
s == s2 # or equivalently, s.eq(s2)
# 0     True
# 1     True
# 2     True
# 3     True
# 4    False  # BUG/LIMITATION: we should be able to choose NA==NA
# dtype: bool

# ...but the scalar equality comparison says they are equal!!
s.equals(s2)
# True
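The limitations above can be worked around explicitly. The sketch below, written for Python 3, shows a few options available in current pandas: the `dropna` argument of `value_counts`, `isna()` for counting missing entries, and a mask that treats two missing values as equal. The choice to map NaN to False in `is_true` is an assumption made for the example, not a recommendation from the answer.

```python
import numpy as np
import pandas as pd

s = pd.Series([True, True, False, True, np.nan])    # object dtype because of the NaN
s2 = pd.Series([True, True, False, True, np.nan])

counts_with_na = s.value_counts(dropna=False)   # keeps the missing value in the tally
n_missing = int(s.isna().sum())                 # counts missing entries directly
is_true = s.eq(True)                            # NaN compares unequal, so it maps to False

# Elementwise equality that treats two missing values as a match.
equal_or_both_na = (s == s2) | (s.isna() & s2.isna())

print(counts_with_na)
print(n_missing, is_true.tolist(), equal_or_both_na.tolist())
```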

Evaluating pandas Series values with logical expressions and if statements

3 answers: