Pandas: filtering on dates with an OR condition

Posted: 2016-07-25 19:17:41

Tags: python pandas

I'm using pandas to count the members who bought a particular type of contract between two dates. The DataFrame I'm working with looks like this:

Member Nbr  Contract-Type       Date-Joined
20          1 Year Membership   2011-08-01
3128        3 Month Membership  2011-07-22
3535        4 Month Membership  2015-02-18
3760        4 Month Membership  2010-02-28
3762        3 Month Membership  2010-01-31
3882        1 Month Membership  2010-04-24
3892        3 Month Membership  2010-03-24
4116        3 Month Membership  2014-12-02
4700        1 Month Membership  2014-11-11
4802        4 Month Membership  2014-07-26
5004        1 Year Membership   2012-03-12
5020        1 Year Membership   2010-07-28
5022        3 Month Membership  2010-06-25
5130        1 Year Membership   2011-01-04
...

If I'm only interested in a single contract type, I can get the count with:
print(len(df[(df['Date-Joined'] > '2010-01-01')
             & (df['Date-Joined'] < '2012-02-01')
             & (df['Contract-Type'] == '1 Year Membership')]))

When I try the same thing for both 1 Year Membership and 4 Month Membership with the following code:

print(len(df[(df['Date-Joined'] > '2013-01-01')
             & (df['Date-Joined'] < '2013-02-01')
             & (df['Contract-Type'] == '1 Year Membership')
             or (df['Contract-Type'] == '4 Month Membership')]))

I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Replacing the or with & returns 0 instead.
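Both symptoms can be reproduced with a minimal sketch. It uses a hypothetical two-row Series named contract in place of the real df['Contract-Type'] column; the names are illustrative only.

```python
import pandas as pd

# Hypothetical two-row column standing in for df['Contract-Type'].
contract = pd.Series(["1 Year Membership", "4 Month Membership"])

is_1y = contract == "1 Year Membership"   # [True, False]
is_4m = contract == "4 Month Membership"  # [False, True]

# Python's `or` asks for the truth value of the whole Series, which pandas refuses.
try:
    is_1y or is_4m
    raised = False
except ValueError:
    raised = True

# `&` is an element-wise AND: no single row holds both contract types at once.
both = is_1y & is_4m
print(raised, both.sum())  # True 0
```

So the ValueError comes from or itself, and the 0 is the correct answer to the question "which rows are both types", just not the question being asked.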

1 Answer:

Answer 0 (score: 5)

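Use | instead of or. Python's or needs a single True/False value for the whole Series, which is why pandas raises the "truth value of a Series is ambiguous" error; | is applied element by element. Also, & takes precedence over |, so your logic needs an extra pair of parentheses around the OR.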
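The precedence issue can be seen with three tiny boolean masks. This is a minimal sketch; in_range, is_1y and is_4m are hypothetical stand-ins for the date test and the two contract-type tests.

```python
import pandas as pd

# Hypothetical masks for two rows: row 1 is outside the date range but is a 4 Month contract.
in_range = pd.Series([True, False])
is_1y = pd.Series([False, False])
is_4m = pd.Series([False, True])

# Without extra parentheses Python parses this as (in_range & is_1y) | is_4m,
# so row 1 passes even though it is outside the date range.
unparenthesized = in_range & is_1y | is_4m

# Grouping the OR first keeps the date restriction on every row.
parenthesized = in_range & (is_1y | is_4m)

print(unparenthesized.tolist())  # [False, True]
print(parenthesized.tolist())    # [False, False]
```

The same precedence rule is why every comparison is wrapped in its own parentheses: & and | bind more tightly than > and ==, so df['a'] > 1 & df['b'] < 2 would not mean what it looks like.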

import io

import pandas as pd

# Sample data from the question
data = io.StringIO('''\
Member Nbr,Contract-Type,Date-Joined
20,1 Year Membership,2011-08-01
3128,3 Month Membership,2011-07-22
3535,4 Month Membership,2015-02-18
3760,4 Month Membership,2010-02-28
3762,3 Month Membership,2010-01-31
3882,1 Month Membership,2010-04-24
3892,3 Month Membership,2010-03-24
4116,3 Month Membership,2014-12-02
4700,1 Month Membership,2014-11-11
4802,4 Month Membership,2014-07-26
5004,1 Year Membership,2012-03-12
5020,1 Year Membership,2010-07-28
5022,3 Month Membership,2010-06-25
5130,1 Year Membership,2011-01-04
''')

# parse_dates makes Date-Joined a datetime column instead of plain strings
df = pd.read_csv(data, parse_dates=['Date-Joined'])

# Date range AND a single contract type
print(df[
    (df['Date-Joined'] > '2010-01-01') &
    (df['Date-Joined'] < '2012-02-01') &
    (df['Contract-Type'] == '1 Year Membership')
])

#     Member Nbr      Contract-Type    Date-Joined
# 0           20  1 Year Membership     2011-08-01
# 11        5020  1 Year Membership     2010-07-28
# 13        5130  1 Year Membership     2011-01-04

# WRONG: & binds tighter than |, so this means (dates & 1 Year) | 4 Month
print(df[
    (df['Date-Joined'] > '2010-01-01') &
    (df['Date-Joined'] < '2012-02-01') &
    (df['Contract-Type'] == '1 Year Membership') |
    (df['Contract-Type'] == '4 Month Membership')
])

#     Member Nbr       Contract-Type    Date-Joined
# 0           20   1 Year Membership     2011-08-01
# 2         3535  4 Month Membership     2015-02-18  <====== BEWARE! outside the date range
# 3         3760  4 Month Membership     2010-02-28
# 9         4802  4 Month Membership     2014-07-26  <====== BEWARE! outside the date range
# 11        5020   1 Year Membership     2010-07-28
# 13        5130   1 Year Membership     2011-01-04

# RIGHT: parenthesize the OR so the date range applies to both contract types
print(df[
    (df['Date-Joined'] > '2010-01-01') &
    (df['Date-Joined'] < '2012-02-01') &
    ((df['Contract-Type'] == '1 Year Membership') |
     (df['Contract-Type'] == '4 Month Membership'))
])

#     Member Nbr       Contract-Type    Date-Joined
# 0           20   1 Year Membership     2011-08-01
# 3         3760  4 Month Membership     2010-02-28
# 11        5020   1 Year Membership     2010-07-28
# 13        5130   1 Year Membership     2011-01-04
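As an alternative sketch (not part of the original answer), Series.isin can replace the chain of == comparisons joined by |, which removes the precedence trap entirely, and summing the boolean mask gives the count the question asks for without building the filtered frame. The variable names in_range, wanted and mask are illustrative.

```python
import io

import pandas as pd

# Same sample data as in the answer above.
data = io.StringIO("""\
Member Nbr,Contract-Type,Date-Joined
20,1 Year Membership,2011-08-01
3128,3 Month Membership,2011-07-22
3535,4 Month Membership,2015-02-18
3760,4 Month Membership,2010-02-28
3762,3 Month Membership,2010-01-31
3882,1 Month Membership,2010-04-24
3892,3 Month Membership,2010-03-24
4116,3 Month Membership,2014-12-02
4700,1 Month Membership,2014-11-11
4802,4 Month Membership,2014-07-26
5004,1 Year Membership,2012-03-12
5020,1 Year Membership,2010-07-28
5022,3 Month Membership,2010-06-25
5130,1 Year Membership,2011-01-04
""")
df = pd.read_csv(data, parse_dates=["Date-Joined"])

# Strict bounds, matching the > and < used above.
in_range = (df["Date-Joined"] > "2010-01-01") & (df["Date-Joined"] < "2012-02-01")

# isin tests membership in a list, replacing (== ...) | (== ...).
wanted = df["Contract-Type"].isin(["1 Year Membership", "4 Month Membership"])

mask = in_range & wanted
print(df[mask])
print(mask.sum())  # 4: True counts as 1, so this is the number of matching rows
```

Series.between is another option for the date range, but it includes both endpoints by default, unlike the strict > and < used here.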