Question

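I have a basic knowledge of Python (I've completed one class), and I'm not sure how to approach my next script. I have two files. One is a Newick tree that looks like this, but much larger: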

(((1:0.01671793,2:0.01627631):0.00455274,(3:0.02781576,4:0.05606947):0.02619237):0.08529440,5:0.16755623);

The second file is a tab-delimited text file that looks like this, but much larger:

1 \t Human
2 \t Chimp
3 \t Mouse
4 \t Rat
5 \t Fish
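A minimal sketch for loading the tab-delimited file into a dictionary, assuming each line is exactly ID<TAB>name (the spaces around \t above are taken to be display only, and are stripped anyway). The function name load_id_map and the filename in the comment are hypothetical.

```python
import csv
import io


def load_id_map(lines):
    """Build {sequence_id: species_name} from tab-delimited lines.

    `lines` can be an open file object or any iterable of strings.
    Blank lines are skipped; whitespace around each field is stripped.
    Assumes every non-blank line has at least two tab-separated fields.
    """
    id_map = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # csv.reader yields [] for a blank line
        seq_id, name = (field.strip() for field in row[:2])
        id_map[seq_id] = name
    return id_map


# In-memory example; for a real file use
# open("species.txt", newline="") (hypothetical filename).
sample = io.StringIO("1\tHuman\n2\tChimp\n3\tMouse\n4\tRat\n5\tFish\n")
print(load_id_map(sample))
# {'1': 'Human', '2': 'Chimp', '3': 'Mouse', '4': 'Rat', '5': 'Fish'}
```

A dict gives constant-time lookup per ID, which matters once the table is large.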

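I want to replace the sequence ID numbers in the Newick file (only the numbers directly before a colon, not the branch lengths after it) with the matching species names from the text file, to create: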

(((Human:0.01671793,Chimp:0.01627631):0.00455274,(Mouse:0.02781576,Rat:0.05606947):0.02619237):0.08529440,Fish:0.16755623);
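One way to pin down which numbers count as IDs: in this example every sequence ID sits right after an opening parenthesis or a comma and right before a colon, while branch lengths always come after a colon. A regex with a lookbehind and a lookahead picks out only those. This is a sketch assuming IDs are purely numeric; LEAF_ID is a hypothetical name.

```python
import re

tree = ("(((1:0.01671793,2:0.01627631):0.00455274,"
        "(3:0.02781576,4:0.05606947):0.02619237):0.08529440,5:0.16755623);")

# A leaf ID is a run of digits preceded by "(" or "," and followed by ":".
# Branch lengths are preceded by ":" and so never match.
LEAF_ID = re.compile(r"(?<=[(,])\d+(?=:)")

print(LEAF_ID.findall(tree))  # ['1', '2', '3', '4', '5']
```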

My pseudocode (after opening both files) looks like this:

for line in txtfile:
    if line[0] matches \(\d*\ in newick:
        replace that \d* with line[2]
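The pseudocode above translates fairly directly into Python: loop over the table and, for each line, substitute that ID in the tree string. A sketch, assuming the tree has already been read into a string and the table is available as lines; the helper name relabel_line_by_line is hypothetical. Anchoring the ID between ( or , and a colon keeps ID 1 from matching inside 11 or inside a branch length.

```python
import re


def relabel_line_by_line(newick, table_lines):
    """Apply one substitution per line of the ID table, as in the pseudocode."""
    for line in table_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumes each line is "ID<TAB>name".
        seq_id, name = (part.strip() for part in line.split("\t", 1))
        # Match this exact ID only where a leaf label can appear:
        # after "(" or "," and immediately before ":".
        pattern = r"(?<=[(,])" + re.escape(seq_id) + r"(?=:)"
        # A function replacement inserts the name literally (no backslash handling).
        newick = re.sub(pattern, lambda m, name=name: name, newick)
    return newick


tree = ("(((1:0.01671793,2:0.01627631):0.00455274,"
        "(3:0.02781576,4:0.05606947):0.02619237):0.08529440,5:0.16755623);")
table = ["1\tHuman", "2\tChimp", "3\tMouse", "4\tRat", "5\tFish"]
print(relabel_line_by_line(tree, table))
```

This prints the relabeled tree shown in the question. Note that each table line rescans the whole tree, so for a large table a single re.sub pass with a dict lookup is cheaper.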

Any suggestions would be greatly appreciated!

Answer 1

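This can be done with re.sub() by defining a callback function that runs on each match of a regexp such as (?<=[(,])\d+(?=:), which matches a run of digits only when it follows ( or , and precedes a colon.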

Here is an (unrelated) example from https://docs.python.org/2/library/re.html#text-munging showing how a callback function is used with re.sub(), which performs regular-expression substitution:

>>> import random, re
>>> def repl(m):
...     inner_word = list(m.group(2))
...     random.shuffle(inner_word)
...     return m.group(1) + "".join(inner_word) + m.group(3)
...
>>> text = "Professor Abdolmalek, please report your absences promptly."
>>> re.sub(r"(\w)(\w+)(\w)", repl, text)
'Poefsrosr Aealmlobdk, pslaee reorpt your abnseces plmrptoy.'
>>> re.sub(r"(\w)(\w+)(\w)", repl, text)
'Pofsroser Aodlambelk, plasee reoprt yuor asnebces potlmrpy.'
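Applied to the tree in the question, the same callback idea looks roughly like this. The id_to_name dict stands in for the contents of the tab-delimited file, and the pattern assumes IDs appear right after ( or , and right before a colon.

```python
import re

# Stand-in for the contents of the tab-delimited file.
id_to_name = {"1": "Human", "2": "Chimp", "3": "Mouse", "4": "Rat", "5": "Fish"}

tree = ("(((1:0.01671793,2:0.01627631):0.00455274,"
        "(3:0.02781576,4:0.05606947):0.02619237):0.08529440,5:0.16755623);")


def repl(m):
    """Callback for re.sub: map the matched ID to its species name.

    IDs missing from the table are left unchanged.
    """
    seq_id = m.group(0)
    return id_to_name.get(seq_id, seq_id)


# One pass over the tree; each leaf ID is looked up exactly once.
relabeled = re.sub(r"(?<=[(,])\d+(?=:)", repl, tree)
print(relabeled)
```

Because re.sub scans the tree once and never rescans replaced text, IDs such as 1 and 11 cannot interfere with each other.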

Answer 2

You can also do this with findall:

import re

s = "(((1:0.01671793,2:0.01627631):0.00455274,(3:0.02781576,4:0.05606947):0.02619237):0.08529440,5:0.16755623)"

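# map each sequence ID to its species name
rep = {'1': 'Human',
       '2': 'Chimp',
       '3': 'Mouse',
       '4': 'Rat',
       '5': 'Fish'}

# find every "ID:" token; handle longer IDs first so that replacing
# '1:' cannot clobber the tail of '11:' or '21:' in a larger tree
for i in sorted(set(re.findall(r'\d+:', s)), key=len, reverse=True):
    s = s.replace(i, rep[i[:-1]] + ':')

>>> print(s)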
(((Human:0.01671793,Chimp:0.01627631):0.00455274,(Mouse:0.02781576,Rat:0.05606947):0.02619237):0.08529440,Fish:0.16755623)
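To go from the two files on disk to a relabeled tree, the pieces can be combined roughly as follows. A sketch assuming the tree file holds a single Newick string and the table has one ID<TAB>name pair per line; the function name relabel_files and the file paths in the usage comment are hypothetical. It swaps findall plus str.replace for a single re.sub pass with a dict lookup.

```python
import re

# Leaf ID: digits preceded by "(" or "," and followed by ":".
LEAF_ID = re.compile(r"(?<=[(,])\d+(?=:)")


def relabel_files(tree_path, table_path, out_path):
    """Read a Newick tree and an ID->name table, write the relabeled tree."""
    # Build {id: name} from the tab-delimited table, skipping blank lines.
    id_to_name = {}
    with open(table_path) as table:
        for line in table:
            if line.strip():
                seq_id, name = (part.strip() for part in line.split("\t", 1))
                id_to_name[seq_id] = name

    with open(tree_path) as f:
        newick = f.read().strip()

    # Replace every leaf ID in one pass; IDs missing from the table are kept.
    relabeled = LEAF_ID.sub(lambda m: id_to_name.get(m.group(0), m.group(0)), newick)

    with open(out_path, "w") as out:
        out.write(relabeled + "\n")
    return relabeled


# Hypothetical usage:
# relabel_files("tree.nwk", "species.txt", "tree_named.nwk")
```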

Iterating over regular expressions in Python
