Question

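I need to find the value of "taxid" in a large number of strings similar to the one below. For this particular string, the 'taxid' value is '9606', and I need to discard everything else. "taxid" can appear anywhere in the text, but it is always followed by ":" and then a number.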

score:0.86|taxid:9606(Human)|intact:EBI-999900

How do I write a regular expression for this in Python?

Answer 1

>>> import re
>>> s = 'score:0.86|taxid:9606(Human)|intact:EBI-999900'
>>> re.search(r'taxid:(\d+)', s).group(1)
'9606'

If a string contains more than one taxid, use re.findall, which returns a list of all matches:

>>> re.findall(r'taxid:(\d+)', s)
['9606']
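Because the question involves a large number of strings, one option is to compile the pattern once and map each string to its taxid, or to None when a string has no taxid field. This is a minimal sketch that assumes the strings arrive as a Python list. The function name `extract_taxid` is hypothetical.

```python
import re

# Compile once: "taxid:" followed by one or more digits, with the digits captured.
TAXID_RE = re.compile(r"taxid:(\d+)")


def extract_taxid(s):
    """Return the first taxid in s as a string, or None if there is none."""
    m = TAXID_RE.search(s)
    return m.group(1) if m else None


strings = [
    "score:0.86|taxid:9606(Human)|intact:EBI-999900",
    "taxid:10090(Mouse)|score:0.5",
    "score:0.1|intact:EBI-1",
]
print([extract_taxid(s) for s in strings])
# ['9606', '10090', None]
```

Returning None instead of calling .group(1) on the result of re.search directly avoids an AttributeError on lines that have no taxid.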

Answer 2

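import re
for line in lines:  # lines: the input strings
    # re.match anchors at the start, so the leading .* skips ahead to "taxid:"
    match = re.match(r".*taxid:(\d+)", line)
    if match:  # re.match returns None when a line has no taxid
        print(match.group(1))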
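If every string uses the same pipe-separated `key:value` layout as the sample, you can also find the value without running a regex over the whole line. Split on `|`, pick the field whose key is `taxid`, and keep only its leading digits. This is a minimal sketch that assumes that layout always holds. `taxid_from_fields` is a hypothetical name.

```python
import re


def taxid_from_fields(s):
    """Return the taxid digits from a 'key:value|key:value' string, or None."""
    for field in s.split("|"):
        key, sep, value = field.partition(":")
        if sep and key == "taxid":
            # Keep only the leading digits, e.g. "9606(Human)" -> "9606".
            m = re.match(r"\d+", value)
            return m.group(0) if m else None
    return None


print(taxid_from_fields("score:0.86|taxid:9606(Human)|intact:EBI-999900"))
# 9606
```

Because the key must equal `taxid` exactly, a field such as `nottaxid:5` is not mistaken for a taxid. A bare substring search for `taxid:` would match it.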