Pandas: filtering multiple columns with a single criterion

Date: 2016-10-25 13:03:22

Tags: python excel pandas dataframe

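I have an Excel sheet with over a hundred columns. I need to filter five of them to see which rows have a "No" in one of those cells. Is there a way to filter multiple columns with a single search criterion, for example: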

 no_invoice_filter = df[(df['M1: PL - INVOICED']) & (df['M2: EX - INVOICED']) & (df['M3: TEST DEP - INVOICED']) == 'No']

as opposed to writing out a separate "equals No" test for each column?

The code above raises this error:

TypeError: unsupported operand type(s) for &: 'str' and 'bool'
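A note on the error: in Python, & binds more tightly than ==, so the attempt above is evaluated as (df[...] & df[...] & df[...]) == 'No', applying & to the raw string columns rather than to boolean comparison results. For reference, here is a sketch of the explicit per-column version the question wants to avoid, with each comparison in its own parentheses. It uses | ("or") to match the "at least one No" intent of the answers below; & would instead require all three columns to be "No". The sample values are illustrative only:

```python
import pandas as pd

# Illustrative data; column names taken from the question
df = pd.DataFrame({'M1: PL - INVOICED': ['a', 'Yes', 'No'],
                   'M2: EX - INVOICED': ['Yes', 'No', 'b'],
                   'M3: TEST DEP - INVOICED': ['s', 'a', 'No']})

# Each == comparison is parenthesized so it is evaluated before |;
# | keeps the rows where at least one of the columns equals 'No'
no_invoice_filter = df[(df['M1: PL - INVOICED'] == 'No')
                       | (df['M2: EX - INVOICED'] == 'No')
                       | (df['M3: TEST DEP - INVOICED'] == 'No')]
print(no_invoice_filter)
```

This works, but it grows by one clause per column, which is what the single-criterion approaches below avoid.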

2 answers:

Answer 0 (score: 1)

Compare the subset of columns with 'No', then use any to keep the rows that contain at least one No:

df[(df[['M1: PL - INVOICED','M2: EX - INVOICED','M3: TEST DEP - INVOICED']] == 'No')
      .any(axis=1)]
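With over a hundred columns, even typing the list of column names can be tedious. Assuming (this is not stated in the question) that every column of interest contains the substring "INVOICED" and no other column does, DataFrame.filter(like=...) can select them by label, and .eq('No') is the method form of == 'No'. A minimal sketch with a hypothetical extra "Customer" column to show it is excluded:

```python
import pandas as pd

# Hypothetical frame: three INVOICED columns plus an unrelated one
df = pd.DataFrame({'M1: PL - INVOICED': ['a', 'Yes', 'No'],
                   'M2: EX - INVOICED': ['Yes', 'No', 'b'],
                   'M3: TEST DEP - INVOICED': ['s', 'a', 'No'],
                   'Customer': ['No', 'x', 'y']})

# Select the columns whose label contains 'INVOICED' (assumed naming convention)
invoiced = df.filter(like='INVOICED')

# One boolean per row: True if any selected column equals 'No'
mask = invoiced.eq('No').any(axis=1)
result = df[mask]
print(result)
```

Row 0 is not selected even though its "Customer" cell is "No", because that column was never part of the comparison.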

Sample:

import pandas as pd

df = pd.DataFrame({'M1: PL - INVOICED':['a','Yes','No'],
                   'M2: EX - INVOICED':['Yes','No','b'],
                   'M3: TEST DEP - INVOICED':['s','a','No']})

print (df)
  M1: PL - INVOICED M2: EX - INVOICED M3: TEST DEP - INVOICED
0                 a               Yes                       s
1               Yes                No                       a
2                No                 b                      No

print ((df[['M1: PL - INVOICED','M2: EX - INVOICED','M3: TEST DEP - INVOICED']] == 'No'))
  M1: PL - INVOICED M2: EX - INVOICED M3: TEST DEP - INVOICED
0             False             False                   False
1             False              True                   False
2              True             False                    True

print ((df[['M1: PL - INVOICED','M2: EX - INVOICED','M3: TEST DEP - INVOICED']] == 'No')
          .any(axis=1))
0    False
1     True
2     True
dtype: bool


print (df[(df[['M1: PL - INVOICED','M2: EX - INVOICED','M3: TEST DEP - INVOICED']] == 'No')
           .any(axis=1)])

  M1: PL - INVOICED M2: EX - INVOICED M3: TEST DEP - INVOICED
1               Yes                No                       a
2                No                 b                      No
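The question's wording also asks which column has a "No". The same boolean frame answers that if it is reduced along axis=0 (down each column) instead of axis=1. A sketch on the sample data above; the counting step with sum is an extra illustration, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'M1: PL - INVOICED': ['a', 'Yes', 'No'],
                   'M2: EX - INVOICED': ['Yes', 'No', 'b'],
                   'M3: TEST DEP - INVOICED': ['s', 'a', 'No']})
cols = ['M1: PL - INVOICED', 'M2: EX - INVOICED', 'M3: TEST DEP - INVOICED']

is_no = df[cols] == 'No'

# axis=0 reduces down each column: True if that column has any 'No'
any_no = is_no.any(axis=0)
cols_with_no = any_no[any_no].index.tolist()

# Summing booleans counts the 'No' cells in each column
no_counts = is_no.sum(axis=0)
print(cols_with_no)
print(no_counts)
```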

Answer 1 (score: 1)

You can do this:

df[(df[['M1: PL - INVOICED','M2: EX - INVOICED','M3: TEST DEP - INVOICED']] == 'No')]

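So you basically pass a list of the columns of interest and compare them against your scalar value. Indexing df with the resulting boolean DataFrame masks it: every cell that is not "No", and every column not in the list, becomes NaN. If you want the rows where "No" appears in any of those columns, use any(axis=1).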
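To make the difference between masking and row selection concrete: indexing a DataFrame with a boolean DataFrame behaves like DataFrame.where, keeping the shape and replacing non-matching cells with NaN, while reducing the mask with any(axis=1) gives one boolean per row and selects whole rows. A sketch using the sample frame from this answer:

```python
import pandas as pd

df = pd.DataFrame({'a': 'no', 'b': 'yes', 'c': ['yes', 'no', 'yes', 'no', 'no']})
mask = df[['a', 'c']] == 'no'

# Boolean-DataFrame indexing keeps the shape and masks cells; cells where
# the mask is False, and column 'b' (absent from the mask), become NaN
masked = df[mask]
print(masked.equals(df.where(mask)))

# Reducing the mask to one boolean per row selects whole, unmodified rows
rows = df[mask.any(axis=1)]
print(masked)
print(rows)
```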

In [115]:
df = pd.DataFrame({'a':'no', 'b':'yes', 'c':['yes','no','yes','no','no']})
df

Out[115]:
    a    b    c
0  no  yes  yes
1  no  yes   no
2  no  yes  yes
3  no  yes   no
4  no  yes   no

Using any(axis=1) then returns the rows where "no" appears in at least one of the columns of interest:

In [133]:    
df[(df[['a','c']] == 'no').any(axis=1)]

Out[133]:
    a    b    c
0  no  yes  yes
1  no  yes   no
2  no  yes  yes
3  no  yes   no
4  no  yes   no
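In this sample column a is "no" in every row, so any(axis=1) keeps all five rows. For contrast, a sketch of swapping any for all, which keeps only the rows where every listed column is "no" (this variant is not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({'a': 'no', 'b': 'yes', 'c': ['yes', 'no', 'yes', 'no', 'no']})
mask = df[['a', 'c']] == 'no'

any_rows = df[mask.any(axis=1)]  # at least one of a, c is 'no'
all_rows = df[mask.all(axis=1)]  # both a and c are 'no'
print(any_rows.index.tolist())
print(all_rows.index.tolist())
```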

You can also apply the mask and then use dropna(subset=...) to drop the rows that are NaN in specific columns:

In [132]:    
df[df[['a','c']] == 'no'].dropna(subset=['c'])

Out[132]:
    a    b   c
1  no  NaN  no
3  no  NaN  no
4  no  NaN  no
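The two approaches in this answer are related: after masking, dropping the rows where all of the compared columns are NaN (dropna with how='all') keeps the same row labels that any(axis=1) selects. Because column a is "no" everywhere in the sample above, that would be trivially true there, so this sketch uses a small made-up frame where a varies:

```python
import pandas as pd

# Made-up data: row 2 has no 'no' in either compared column
df = pd.DataFrame({'a': ['no', 'yes', 'yes'], 'b': 'yes', 'c': ['yes', 'no', 'yes']})
cols = ['a', 'c']
mask = df[cols] == 'no'

# Masked frame: non-matching cells, and column 'b', become NaN
masked = df[mask]

# how='all' drops a row only when every column in subset is NaN,
# i.e. when none of the compared cells matched
via_dropna = masked.dropna(subset=cols, how='all')
via_any = df[mask.any(axis=1)]
print(via_dropna.index.equals(via_any.index))
```

Only the selected row labels agree: via_dropna still carries NaN in the non-matching cells, while via_any returns the original values.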