Question

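I need to build multiple filters on 2 columns. The table has 7 columns, but only the first one, 'query', and the last one, 'template', are used for filtering.

I have done this before and it worked, but now (a year later) I can't work out what is going wrong.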

for item in glob.glob('D:\\path\\*.change'):
    table = pd.read_csv(item, sep='\t', index_col=None)
#FILTERING
    filtered_table = table[
        (table['query'].str.contains("egg*", regex=True)==False) &
        (table['query'].str.contains(".*phospho*", regex=True)==False) &
        (table['query'].str.contains("vipe", regex=True)==False) &
        (table['template'].str.contains("ABC1")) |
        (table['template'].str.contains("bender")) ]

The expected result is a table with no rows whose 'query' column contains the strings egg*, .*phospho* or vipe, and only rows whose 'template' column contains 'ABC1' or 'bender'.
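A small reproduction may help show why the filter above does not return the expected rows. It is only a sketch: the sample data and the variable names are made up for illustration. In Python, `&` binds more tightly than `|`, so a mask written as `A & B | C` is evaluated as `(A & B) | C`, and any row whose template contains "bender" passes regardless of its query value.

```python
import pandas as pd

# Hypothetical sample data; only the two filtered columns are shown.
table = pd.DataFrame({
    "query": ["egg_white", "kinase", "viper", "phosphatase"],
    "template": ["bender", "ABC1", "ABC1", "other"],
})

not_egg = ~table["query"].str.contains("egg")
has_abc1 = table["template"].str.contains("ABC1")
has_bender = table["template"].str.contains("bender")

# Without extra parentheses, & is applied before |.
unparenthesized = not_egg & has_abc1 | has_bender
# Grouping the OR part gives the intended meaning.
parenthesized = not_egg & (has_abc1 | has_bender)

leaked_rows = table[unparenthesized]["query"].tolist()
intended_rows = table[parenthesized]["query"].tolist()
print(leaked_rows)    # the "egg_white" row slips through via "bender"
print(intended_rows)
```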

Answer 1

I think your condition is missing some parentheses.

Try this:

table[(
       # AND condition: each comparison gets its own parentheses,
       # because & binds more tightly than ==
       (table['query'].str.contains("egg*", regex=True)==False) &
       (table['query'].str.contains(".*phospho*", regex=True)==False) &
       (table['query'].str.contains("vipe", regex=True)==False) &
       # OR condition: grouped so it is evaluated before the final &
       (table['template'].str.contains("ABC1") |
        table['template'].str.contains("bender"))
      )]
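One detail worth checking in masks like this: comparison operators such as `==` have lower precedence than `&`, so every `... == False` comparison needs its own parentheses (or can be written with `~`). The sketch below uses two made-up boolean Series, `a` and `b`, to illustrate the difference.

```python
import pandas as pd

a = pd.Series([True, False, False])
b = pd.Series([False, False, True])

# Parenthesized comparisons combine element-wise as intended.
both_false = (a == False) & (b == False)

# The ~ operator is an equivalent way to negate a boolean Series.
both_false_negated = ~a & ~b

# Without parentheses Python parses a chained comparison,
# a == (False & b) == False, which needs the truth value of a
# whole Series and therefore fails.
try:
    a == False & b == False
    unparenthesized_error = None
except ValueError as exc:
    unparenthesized_error = type(exc).__name__
```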

Answer 2

My answer to my own question:

import glob
import pandas as pd

for item in glob.glob('D:\\path\\*.change'):
    table = pd.read_csv(item, sep='\t', index_col=None)
    # Step 1: drop rows whose 'query' matches any of the unwanted patterns
    query_table = table[
        (table['query'].str.contains("egg*", regex=True)==False) &
        (table['query'].str.contains(".*phospho*", regex=True)==False) &
        (table['query'].str.contains("vipe", regex=True)==False)]

    # Step 2: of the remaining rows, keep those whose 'template'
    # contains "ABC1" or "bender"
    filtered_table = query_table[
        (query_table['template'].str.contains("ABC1")) |
        (query_table['template'].str.contains("bender"))]

Pandas multiple filters: string contains or does not contain

2 answers: