I have the following DataFrame, read from a CSV file:
   gene name annotation  ng DNA
0       HRAS       G12S    3.00
1     PIK3CA       R88L    3.00
2       BRAF      E474A    3.00
3       EGFR      E734Q    3.00
4       EGFR       V769    3.00
5       BRAF    LQ599PE    4.00
6       BRAF    KT587NA    4.00
7       HRAS       G12S   17.70
I want to filter on multiple conditions across two columns. For example, filtering on 'BRAF' + 'E474A' and 'HRAS' + 'G12S' should produce the following df:
   gene name annotation  ng DNA
0       HRAS       G12S    3.00
2       BRAF      E474A    3.00
7       HRAS       G12S   17.70
Any ideas for an elegant solution?
Answer 0 (score: 1)
Use boolean indexing and combine all the masks into one with np.logical_or.reduce:
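import numpy as np
# One boolean mask per (gene name, annotation) pair
m1 = (df['gene name'] == 'BRAF') & (df['annotation'] == 'E474A')
m2 = (df['gene name'] == 'HRAS') & (df['annotation'] == 'G12S')
# Keep the rows that match at least one of the masks
df = df[np.logical_or.reduce([m1, m2])]
print(df)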
   gene name annotation  ng DNA
0       HRAS       G12S     3.0
2       BRAF      E474A     3.0
7       HRAS       G12S    17.7
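For anyone who wants to run this without the CSV, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the values shown in the question (an assumption standing in for the actual file read), and the names combined, filtered and same_as_pipe are only illustrative. It also checks that, for just two masks, np.logical_or.reduce gives the same rows as m1 | m2.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the question's data (assumption: stands in for the CSV read).
df = pd.DataFrame({
    'gene name': ['HRAS', 'PIK3CA', 'BRAF', 'EGFR', 'EGFR', 'BRAF', 'BRAF', 'HRAS'],
    'annotation': ['G12S', 'R88L', 'E474A', 'E734Q', 'V769', 'LQ599PE', 'KT587NA', 'G12S'],
    'ng DNA': [3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 17.7],
})

# One boolean mask per (gene name, annotation) pair.
m1 = (df['gene name'] == 'BRAF') & (df['annotation'] == 'E474A')
m2 = (df['gene name'] == 'HRAS') & (df['annotation'] == 'G12S')

# Element-wise OR across all masks; the result is a boolean NumPy array.
combined = np.logical_or.reduce([m1, m2])
filtered = df[combined]

# With only two masks, the | operator selects the same rows.
same_as_pipe = df[m1 | m2]

print(filtered)
```

The reduce form mainly pays off when the number of masks is not known in advance.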
A more dynamic solution, using a list comprehension over a list of tuples that hold the filter values:
tup = [('BRAF','E474A'), ('HRAS', 'G12S')]
df = df[np.logical_or.reduce([(df['gene name'] == a) & (df['annotation'] == b) for a, b in tup])]
print(df)
   gene name annotation  ng DNA
0       HRAS       G12S     3.0
2       BRAF      E474A     3.0
7       HRAS       G12S    17.7
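If the same filter is needed in several places, the tuple-based version can be wrapped in a small helper. The sketch below is only one way to do it: filter_by_pairs is a hypothetical name, not a pandas function, and the DataFrame is again rebuilt by hand from the question's values rather than read from the CSV.

```python
import numpy as np
import pandas as pd


def filter_by_pairs(frame, pairs, col_a='gene name', col_b='annotation'):
    """Hypothetical helper: keep rows whose (col_a, col_b) values match any pair."""
    # Build one boolean mask per (a, b) pair, then OR them together.
    masks = [(frame[col_a] == a) & (frame[col_b] == b) for a, b in pairs]
    return frame[np.logical_or.reduce(masks)]


# Hand-built copy of the question's data (assumption: stands in for the CSV read).
df = pd.DataFrame({
    'gene name': ['HRAS', 'PIK3CA', 'BRAF', 'EGFR', 'EGFR', 'BRAF', 'BRAF', 'HRAS'],
    'annotation': ['G12S', 'R88L', 'E474A', 'E734Q', 'V769', 'LQ599PE', 'KT587NA', 'G12S'],
    'ng DNA': [3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 17.7],
})

tup = [('BRAF', 'E474A'), ('HRAS', 'G12S')]
result = filter_by_pairs(df, tup)
single = filter_by_pairs(df, [('EGFR', 'V769')])

print(result)
```

Note that this sketch assumes pairs is non-empty; an empty list would need separate handling.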