I have a pandas DataFrame like this:
In [28]: df = pd.DataFrame({'A': ['CA', 'FO', 'CAP', 'CP'],
   ....:                    'B': ['Name1', 'Name2', 'Name3', 'Name4'],
   ....:                    'C': ['One', 'Two', 'Other', 'Some']})
In [29]: df
Out[29]:
     A      B      C
0   CA  Name1    One
1   FO  Name2    Two
2  CAP  Name3  Other
3   CP  Name4   Some
I am trying to count all the records whose value in column A is 'CA' or 'CP'. To do that, I run the following:
In [30]: len(df.groupby('A').filter(lambda x: x['A'] == 'CA'))
Out[30]: 1
Is there a way to get both counts in a single statement? If I try something like this:
In [32]: len(df.groupby('A').filter(lambda x: x['A'] == 'CA' or
....: x['A'] == 'CP'))
I get this error:
ValueError Traceback (most recent call last)
<ipython-input-32-111c3fde30f2> in <module>()
----> 1 len(df.groupby('A').filter(lambda x: x['A'] == 'CA') or
2 x['A'] == 'CP')
c:\python27\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
885 raise ValueError("The truth value of a {0} is ambiguous. "
886 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 887 .format(self.__class__.__name__))
888
889 __bool__ = __nonzero__
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Answer 0 (score: 1)
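I don't think you need groupby here. Just build a boolean mask and combine the conditions with the element-wise "or" operator (| in pandas), wrapping each comparison in parentheses: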
In [3]: df
Out[3]:
     A      B      C
0   CA  Name1    One
1   FO  Name2    Two
2  CAP  Name3  Other
3   CP  Name4   Some
In [4]: c = df[(df['A']=='CA') | (df['A']=='CP')]
In [5]: c
Out[5]:
    A      B     C
0  CA  Name1   One
3  CP  Name4  Some
In [6]: len(c)
Out[6]: 2
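To illustrate why the attempt in the question fails while the mask works: Python's `or` needs a single True/False from its operand, which pandas refuses to give for an object with several elements, whereas `|` is applied element by element. The following is only a minimal sketch, assuming the same df as in the question; the names `or_failed`, `mask` and `count` are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['CA', 'FO', 'CAP', 'CP'],
                   'B': ['Name1', 'Name2', 'Name3', 'Name4'],
                   'C': ['One', 'Two', 'Other', 'Some']})

# Python's `or` asks the left operand for one truth value;
# a multi-element Series cannot provide one, so pandas raises ValueError.
try:
    (df['A'] == 'CA') or (df['A'] == 'CP')
    or_failed = False
except ValueError:
    or_failed = True

# `|` combines the two boolean Series element by element.
# The parentheses matter because `|` binds more tightly than `==`.
mask = (df['A'] == 'CA') | (df['A'] == 'CP')

# Summing a boolean Series counts the True entries,
# so no intermediate DataFrame is needed just to get the count.
count = int(mask.sum())
```

Here `count` and `len(df[mask])` give the same number; summing the mask simply skips building the filtered frame.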
Answer 1 (score: 1)
Use isin and pass it a list of values to filter df before taking the length:
In [4]: len(df[df['A'].isin(['CA', 'CP'])])
Out[4]: 2
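If "both" in the question means a separate count for each value rather than one total, one possible approach is to combine isin with value_counts. This is a sketch under that assumption, using the same df as in the question; `wanted`, `selected`, `total` and `per_value` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'A': ['CA', 'FO', 'CAP', 'CP'],
                   'B': ['Name1', 'Name2', 'Name3', 'Name4'],
                   'C': ['One', 'Two', 'Other', 'Some']})

wanted = ['CA', 'CP']  # values to look for in column A

# Keep only the column-A entries that match one of the wanted values
selected = df.loc[df['A'].isin(wanted), 'A']

total = len(selected)                # combined number of matching rows
per_value = selected.value_counts()  # one count per distinct matching value
```

`per_value` is a Series indexed by the matched values, so `per_value['CA']` and `per_value['CP']` give the individual counts.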