>>> df.head()
                         № Summer  Gold  Silver  Bronze  Total  № Winter  \
Afghanistan (AFG)              13     0       0       2      2         0
Algeria (ALG)                  12     5       2       8     15         3
Argentina (ARG)                23    18      24      28     70        18
Armenia (ARM)                   5     1       2       9     12         6
Australasia (ANZ) [ANZ]         2     3       4       5     12         0

                         Gold.1  Silver.1  Bronze.1  Total.1  № Games  Gold.2  \
Afghanistan (AFG)             0         0         0        0       13       0
Algeria (ALG)                 0         0         0        0       15       5
Argentina (ARG)               0         0         0        0       41      18
Armenia (ARM)                 0         0         0        0       11       1
Australasia (ANZ) [ANZ]       0         0         0        0        2       3

                         Silver.2  Bronze.2  Combined total
Afghanistan (AFG)               0         2               2
Algeria (ALG)                   2         8              15
Argentina (ARG)                24        28              70
Armenia (ARM)                   2         9              12
Australasia (ANZ) [ANZ]         4         5              12
I'm not sure why I'm seeing this error:
>>> df['Gold'] > 0 | df['Gold.1'] > 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/ankuragarwal/data_insight/env/lib/python2.7/site-packages/pandas/core/generic.py", line 917, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What is ambiguous here?
But this works:
>>> (df['Gold'] > 0) | (df['Gold.1'] > 0)
Answer 0: (score: 3)
Assume we have the following DataFrame:
In [35]: df
Out[35]:
a b c
0 9 0 1
1 7 7 4
2 1 8 9
3 6 7 5
4 1 4 6
Take the following command:
df.a > 5 | df.b > 5
Because | has higher precedence than > (as specified in the Operator precedence table), it is parsed as:
df.a > (5 | df.b) > 5
which, being a chained comparison, in turn expands to:
df.a > (5 | df.b) and (5 | df.b) > 5
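As an aside, this parsing can be checked without pandas. The following is a minimal sketch using the standard-library ast module; the names a and b are placeholders chosen for illustration, not the DataFrame columns themselves.

```python
import ast

# Parse the unparenthesized expression without evaluating it.
tree = ast.parse("a > 5 | b > 5", mode="eval").body

# The whole expression is ONE chained comparison: a > (5 | b) > 5
is_chained = isinstance(tree, ast.Compare) and len(tree.ops) == 2
middle = tree.comparators[0]
middle_is_bitor = isinstance(middle, ast.BinOp) and isinstance(middle.op, ast.BitOr)
print(is_chained, middle_is_bitor)

# The same effect with plain integers (illustrative values):
a, b = 3, 8
unparenthesized = a > 5 | b > 5      # 3 > (5 | 8) > 5, i.e. 3 > 13 > 5
parenthesized = (a > 5) | (b > 5)    # False | True
print(unparenthesized, parenthesized)
```

With plain integers the chained form simply gives a different answer; with a Series, the implicit `and` has to turn each side into a single bool, which is where the ValueError comes from.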
Step by step:
In [36]: x = (5 | df.b)
In [37]: x
Out[37]:
0 5
1 7
2 13
3 7
4 5
Name: b, dtype: int32
In [38]: df.a > x
Out[38]:
0 True
1 False
2 False
3 False
4 False
dtype: bool
In [39]: x > 5
Out[39]:
0 False
1 True
2 True
3 True
4 False
Name: b, dtype: bool
But the last operation won't work:
In [40]: (df.a > x) and (x > 5)
---------------------------------------------------------------------------
...
skipped
...
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The error message above might lead inexperienced users to try something like this:
In [12]: (df.a > 5).all() | (df.b > 5).all()
Out[12]: False
In [13]: df[(df.a > 5).all() | (df.b > 5).all()]
...
skipped
...
KeyError: False
But in this case you only need to set the precedence explicitly with parentheses to get the expected result:
In [10]: (df.a > 5) | (df.b > 5)
Out[10]:
0 True
1 True
2 True
3 True
4 False
dtype: bool
In [11]: df[(df.a > 5) | (df.b > 5)]
Out[11]:
a b c
0 9 0 1
1 7 7 4
2 1 8 9
3 6 7 5
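If the parentheses feel easy to forget, there are spellings that sidestep the precedence issue. The sketch below rebuilds the sample DataFrame from this answer and compares the parenthesized mask with two alternatives, Series.gt() and DataFrame.query(); which one to prefer is a matter of taste, and this is only one way to write it.

```python
import pandas as pd

# The sample DataFrame used in this answer.
df = pd.DataFrame({"a": [9, 7, 1, 6, 1],
                   "b": [0, 7, 8, 7, 4],
                   "c": [1, 4, 9, 5, 6]})

# Explicit parentheses around each comparison.
mask_parens = (df.a > 5) | (df.b > 5)

# Method calls bind tighter than |, so no extra parentheses are needed.
mask_methods = df.a.gt(5) | df.b.gt(5)

# query() evaluates the expression string itself and accepts "or".
filtered_query = df.query("a > 5 or b > 5")

print(mask_parens.equals(mask_methods))
print(df[mask_parens].equals(filtered_query))
```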
Answer 1: (score: 0)
This is the real reason for the error:
http://pandas.pydata.org/pandas-docs/stable/gotchas.html
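pandas follows the NumPy convention of raising an error when you try to convert something to a bool. This happens in an if statement or when using the Boolean operations and, or, and not. It is not clear what the result of
>>> if pd.Series([False, True, False]):
...
should be. Should it be True because it is not zero-length? False because there are False values? It is unclear, so instead pandas raises a ValueError: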
>>> if pd.Series([False, True, False]):
...     print("I was true")
Traceback
...
ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
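If you see that, you need to explicitly choose what you want to do with it (e.g., use any(), all() or empty). Alternatively, you might want to compare whether the pandas object is None.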
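The explicit choices mentioned above can be sketched as follows. This is a small illustration on the same three-element Series, assuming a reasonably recent pandas; the variable names are made up for the example.

```python
import pandas as pd

s = pd.Series([False, True, False])

# Using the Series directly as a condition raises ValueError.
try:
    if s:
        pass
    raised = False
except ValueError:
    raised = True

# Pick the meaning you actually want instead:
has_any_true = bool(s.any())   # is at least one element True?
all_true = bool(s.all())       # is every element True?
is_empty = s.empty             # does the Series have zero elements?
is_missing = s is None         # was an object supplied at all?

print(raised, has_any_true, all_true, is_empty, is_missing)
```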