Question

I have the following DataFrame, read from a CSV file:

   gene name  annotation  ng DNA
 0  HRAS       G12S        3.00
 1  PIK3CA     R88L        3.00
 2  BRAF       E474A       3.00
 3  EGFR       E734Q       3.00
 4  EGFR       V769        3.00
 5  BRAF       LQ599PE     4.00
 6  BRAF       KT587NA     4.00
 7  HRAS       G12S        17.70

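I want to filter on multiple conditions across two columns: for example, keep only the rows matching 'BRAF' + 'E474A' or 'HRAS' + 'G12S', which would produce the following df: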

   gene name  annotation  ng DNA
 0  HRAS       G12S        3.00
 2  BRAF       E474A       3.00
 7  HRAS       G12S        17.70

Any ideas for an elegant solution?

Answer 1

Use boolean indexing and combine all the masks into one with np.logical_or.reduce:

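import numpy as np

# One boolean mask per (gene name, annotation) pair
m1 = (df['gene name'] == 'BRAF') & (df['annotation'] == 'E474A')
m2 = (df['gene name'] == 'HRAS') & (df['annotation'] == 'G12S')

# OR the masks together and keep only the matching rows
df = df[np.logical_or.reduce([m1, m2])]
print(df)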
  gene name annotation  ng DNA
0      HRAS       G12S     3.0
2      BRAF      E474A     3.0
7      HRAS       G12S    17.7
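To run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame by hand, which is an assumption, since the original data is read from a CSV file. It also checks that np.logical_or.reduce yields the same mask as chaining the conditions with the | operator. The names combined, same, and filtered are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data typed in by hand (assumption: mirrors the CSV in the question)
df = pd.DataFrame({
    'gene name': ['HRAS', 'PIK3CA', 'BRAF', 'EGFR', 'EGFR', 'BRAF', 'BRAF', 'HRAS'],
    'annotation': ['G12S', 'R88L', 'E474A', 'E734Q', 'V769', 'LQ599PE', 'KT587NA', 'G12S'],
    'ng DNA': [3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 17.7],
})

# One boolean mask per (gene name, annotation) pair
m1 = (df['gene name'] == 'BRAF') & (df['annotation'] == 'E474A')
m2 = (df['gene name'] == 'HRAS') & (df['annotation'] == 'G12S')

# Element-wise OR across the list of masks; the result is a NumPy boolean array
combined = np.logical_or.reduce([m1, m2])

# For two masks this matches the plain | operator
same = bool((combined == (m1 | m2).to_numpy()).all())

filtered = df[combined]
print(filtered)
```

With only two conditions, m1 | m2 is arguably simpler; np.logical_or.reduce becomes more useful once the number of masks is not fixed in advance.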

A more dynamic solution builds the masks in a list comprehension from a list of tuples of filter values:

# (gene name, annotation) pairs to keep
tup = [('BRAF', 'E474A'), ('HRAS', 'G12S')]
masks = [(df['gene name'] == a) & (df['annotation'] == b) for a, b in tup]
df = df[np.logical_or.reduce(masks)]
print(df)
  gene name annotation  ng DNA
0      HRAS       G12S     3.0
2      BRAF      E474A     3.0
7      HRAS       G12S    17.7
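As a possible alternative (not part of the answer above), the same list of tuples can be matched in one step by treating the two columns as a MultiIndex and calling isin on it. This is only a sketch: the sample frame is rebuilt by hand as an assumption, and the names keys, mask, and result are illustrative.

```python
import pandas as pd

# Sample data typed in by hand (assumption: mirrors the CSV in the question)
df = pd.DataFrame({
    'gene name': ['HRAS', 'PIK3CA', 'BRAF', 'EGFR', 'EGFR', 'BRAF', 'BRAF', 'HRAS'],
    'annotation': ['G12S', 'R88L', 'E474A', 'E734Q', 'V769', 'LQ599PE', 'KT587NA', 'G12S'],
    'ng DNA': [3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 17.7],
})

# (gene name, annotation) pairs to keep
tup = [('BRAF', 'E474A'), ('HRAS', 'G12S')]

# Build a MultiIndex from the two columns so each row becomes a tuple key
keys = pd.MultiIndex.from_frame(df[['gene name', 'annotation']])

# isin compares every row's tuple against the list of wanted pairs
mask = keys.isin(tup)

result = df[mask]
print(result)
```

This avoids building one mask per pair, which may read more cleanly when the list of pairs is long.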
