Question

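I just need to know how to search the lines of my file for two strings.

Example: I need the line to contain "protein_coding" and "exon". If it does contain both, I will print certain columns of each line. I know how to print them, but I can't figure out how to search for two strings using a regex. Thanks in advance.

Is this correct?: if re.match("protein_coding" & "exon" in line: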

Answer 1

This regex matches any line that contains both "protein_coding" and "exon" as whole words, with "protein_coding" appearing before "exon".

^.*?\bprotein_coding\b.*?\bexon\b.*$

DEMO

>>> import re
>>> data = """protein_coding exon foo bar
... foo
... protein_coding
... """
>>> m = re.findall(r'^.*?\bprotein_coding\b.*?\bexon\b.*$', data, re.M)
>>> for i in m:
...     print(i)
... 
protein_coding exon foo bar
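
Because the pattern above expects "protein_coding" first, a line with the two words reversed is skipped. The following is a minimal sketch of that behaviour, plus one possible way to accept either order with an alternation. The sample lines and the names `ordered` and `either` are made up for illustration.

```python
import re

# Hypothetical sample lines, not taken from a real annotation file.
data = "protein_coding exon foo\nexon protein_coding bar\nprotein_coding only\n"

# Pattern from the answer: protein_coding must come before exon.
ordered = re.compile(r'^.*?\bprotein_coding\b.*?\bexon\b.*$', re.M)

# Alternation that spells out both orders explicitly.
either = re.compile(
    r'^(?:.*?\bprotein_coding\b.*?\bexon\b|.*?\bexon\b.*?\bprotein_coding\b).*$',
    re.M,
)

ordered_hits = ordered.findall(data)
either_hits = either.findall(data)
print(ordered_hits)
print(either_hits)
```

The alternation grows quickly as more words are added, so it is probably only practical for two or three terms.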

Answer 2

If the test does not actually need a regex, remember that you can also use Python's string operations and in:

>>> line='protein_coding other stuff exon more stuff'
>>> "protein_coding" in line and "exon" in line
True

Or, if you want to test for an arbitrary number of words, use all with a tuple of the target words:

>>> line='protein_coding other stuff exon more stuff'
>>> all(s in line for s in ("protein_coding", "exon", "words"))
False
>>> all(s in line for s in ("protein_coding", "exon", "stuff"))
True
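
Applied to the original question, the same test can filter lines before printing columns. This is only a sketch: the tab-separated layout, the sample rows, and the chosen column indices are assumptions, and `io.StringIO` stands in for a real open file.

```python
import io

# Stand-in for an open file; the tab-separated layout and contents are assumed.
handle = io.StringIO(
    "chr1\tprotein_coding\texon\t100\t200\n"
    "chr1\tprotein_coding\tCDS\t100\t180\n"
    "chr2\tlincRNA\texon\t300\t400\n"
)

wanted = ("protein_coding", "exon")
selected = []
for line in handle:
    # Keep the line only if every target word occurs in it.
    if all(word in line for word in wanted):
        cols = line.rstrip("\n").split("\t")
        # Columns 0, 3 and 4 are arbitrary picks for illustration.
        selected.append((cols[0], cols[3], cols[4]))

for row in selected:
    print("\t".join(row))
```

Note that in is a plain substring test, so "exon" would also match a line containing "exons"; the regex answers use \b to avoid that.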

If the match does require regexes and you want to test a line against several unrelated patterns, use all with a comprehension:

>>> import re
>>> p1=re.compile(r'\b[a-z]+_coding\b')
>>> p2=re.compile(r'\bexon\b')
>>> li=[p.search(line) for p in [p1, p2]]
>>> li
[<_sre.SRE_Match object at 0x10856d988>, <_sre.SRE_Match object at 0x10856d9f0>]
>>> all(e for e in li)
True
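
The same idea can be wrapped in a small helper. This is a sketch rather than a fixed recipe, and the function name `matches_all` is hypothetical.

```python
import re

def matches_all(line, patterns):
    """Return True if every compiled pattern is found somewhere in line."""
    # search() returns None on failure, which all() treats as False.
    return all(p.search(line) for p in patterns)

patterns = [re.compile(r'\b[a-z]+_coding\b'), re.compile(r'\bexon\b')]

both = matches_all('protein_coding other stuff exon more stuff', patterns)
one = matches_all('protein_coding other stuff', patterns)
print(both, one)
```

Passing the generator straight to all() lets it stop at the first pattern that fails, instead of building the intermediate list first.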

Answer 3

Use anchors and lookahead assertions:

>>> re.findall(r'(?m)^(?=.*protein_coding)(?=.*exon).+$', data)

The inline (?m) modifier enables multiline mode. The lookaheads here match both substrings regardless of the order in which they appear.
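
A self-contained sketch of the lookahead pattern, using made-up sample lines that put the two words in both orders:

```python
import re

# Hypothetical sample data covering both orders and two non-matching lines.
data = (
    "protein_coding exon foo\n"
    "exon then protein_coding\n"
    "protein_coding only\n"
    "exon only\n"
)

# Each lookahead checks from the start of the line without consuming text,
# so the order of the two words does not matter.
pattern = re.compile(r'(?m)^(?=.*protein_coding)(?=.*exon).+$')
hits = pattern.findall(data)
print(hits)
```

Adding another required word is a matter of appending one more (?=.*word) group.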


Using a regex in Python to find two strings in a line
