Applying two filter conditions on a Python pandas DataFrame groupby

Time: 2016-05-03 09:18:51

Tags: python pandas filter dataframe

I have a pandas DataFrame like this:

In [28]: df = pd.DataFrame({'A':['CA', 'FO', 'CAP', 'CP'],
                            'B':['Name1', 'Name2', 'Name3', 'Name4'],
                            'C':['One', 'Two', 'Other', 'Some']})

In [29]: df
Out[29]:
    A      B      C
0   CA  Name1    One
1   FO  Name2    Two
2  CAP  Name3  Other
3   CP  Name4   Some

I am trying to count all the records whose value in column A is 'CA' or 'CP'. To do that, I run the following:

In [30]: len(df.groupby('A').filter(lambda x: x['A'] == 'CA'))
Out[30]: 1

Is there a way to get the count for both values in a single statement? If I try something like this:

In [32]: len(df.groupby('A').filter(lambda x: x['A'] == 'CA' or
   ....:                                      x['A'] == 'CP'))

I get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-32-111c3fde30f2> in <module>()
----> 1 len(df.groupby('A').filter(lambda x: x['A'] == 'CA') or
      2                                      x['A'] == 'CP')

c:\python27\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
    885         raise ValueError("The truth value of a {0} is ambiguous. "
    886                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 887                          .format(self.__class__.__name__))
    888
    889     __bool__ = __nonzero__

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

2 answers:

Answer 0 (score: 1)

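I don't think you need groupby here. Just build a boolean mask and combine the conditions with the element-wise "or" operator (| in pandas), wrapping each comparison in parentheses: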

In [3]: df
Out[3]: 
     A      B      C
0   CA  Name1    One
1   FO  Name2    Two
2  CAP  Name3  Other
3   CP  Name4   Some

In [4]: c = df[(df['A']=='CA') | (df['A']=='CP')]

In [5]: c
Out[5]: 
    A      B     C
0  CA  Name1   One
3  CP  Name4  Some

In [6]: len(c)
Out[6]: 2
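
As background on why the attempt in the question fails: Python's `or` needs a single True/False from each operand, so it asks the pandas object for its truth value, and pandas refuses with the "truth value ... is ambiguous" ValueError. The `|` operator instead compares element by element. The sketch below rebuilds the DataFrame from the question; the names `is_ca`, `is_cp`, `or_failed`, `mask` and `count` are illustrative only and do not come from the original post.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'A': ['CA', 'FO', 'CAP', 'CP'],
                   'B': ['Name1', 'Name2', 'Name3', 'Name4'],
                   'C': ['One', 'Two', 'Other', 'Some']})

# Each comparison yields a boolean Series, one value per row.
is_ca = df['A'] == 'CA'
is_cp = df['A'] == 'CP'

# Plain `or` tries to reduce a whole Series to one bool and raises.
try:
    is_ca or is_cp
    or_failed = False
except ValueError:
    or_failed = True

# `|` combines the two Series element-wise instead.
mask = is_ca | is_cp

# True counts as 1, so summing the mask counts the matching rows.
count = int(mask.sum())
```

The parentheses in `(df['A']=='CA') | (df['A']=='CP')` matter because `|` binds more tightly than `==` in Python; storing each comparison in its own variable, as above, sidesteps the issue.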

Answer 1 (score: 1)

Use isin and pass it a list of the wanted values to filter the df before taking its length:

In [4]:
len(df[df['A'].isin(['CA','CP'])])

Out[4]:
2
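
If a per-value breakdown is wanted as well as the total, the same `isin` mask can feed `value_counts`. This is only a sketch under that assumption; the question does not say whether separate counts are needed, and the names `wanted`, `total` and `per_value` are hypothetical.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'A': ['CA', 'FO', 'CAP', 'CP'],
                   'B': ['Name1', 'Name2', 'Name3', 'Name4'],
                   'C': ['One', 'Two', 'Other', 'Some']})

wanted = ['CA', 'CP']  # values to look for in column A

# Boolean Series: True where A is one of the wanted values.
mask = df['A'].isin(wanted)

# Total number of matching rows (equivalent to len(df[mask])).
total = int(mask.sum())

# Count of matching rows for each wanted value separately.
per_value = df.loc[mask, 'A'].value_counts()
```

With the sample data, `total` is 2 and `per_value` holds one row each for 'CA' and 'CP'. Note that `isin` matches whole values, so 'CAP' is not counted.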