String = n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l
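I want the script to look at one pair at a time, meaning:
Evaluate n76a + q80a. If abs(76-80) < 10, replace the '+' with '_'; otherwise change nothing. Then evaluate q80a + l83a and do the same.
The desired output should be: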
n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l
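What I tried is:
import re

def aa_dist(x):
    if abs(int(x[1:3]) - int(x[6:8])) < 10:
        print(re.sub(r'\+', '_', x))

with open(input_file, 'r') as alex:
    oligos_list = alex.read()
    aa_dist(oligos_list)
This is what I have so far. I know my code just replaces every '+' with '_', because it only evaluates the first pair and then replaces them all. How should I do this?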
Answer 0 (score: 2)
import itertools
import re

my_string = "n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l"
# first extract the numbers
my_numbers = [int(n) for n in re.findall(r"[0-9]+", my_string)]
# split the string on + (useless comment)
parts = my_string.split("+")

def get_filler(pair):
    '''this function decides on the joiner'''
    a, b = pair
    return "_" if abs(a - b) < 10 else "+"

# figure out what fillers we need
fillers = [get_filler(pair) for pair in zip(my_numbers, my_numbers[1:])]
# zip() stops one short, so the last part has to be added back
print("".join(itertools.chain.from_iterable(zip(parts, fillers))) + parts[-1])
is one way you could accomplish this... and also an example of a worthless comment.
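The same pairing idea can also be written as a plain loop over adjacent tokens, which some readers may find easier to follow. This is only a minimal sketch: the function name `join_by_distance` and the `threshold` parameter are made up for illustration, and it assumes every '+'-separated token contains exactly one run of digits.

```python
import re

def join_by_distance(s, threshold=10):
    """Join '+'-separated tokens with '_' when adjacent positions are close."""
    parts = s.split("+")
    # Assumption: each token holds one number, e.g. "n76a" -> 76
    positions = [int(re.search(r"\d+", p).group()) for p in parts]
    out = [parts[0]]
    # Walk the tokens pairwise: (previous position, current position, current token)
    for prev, cur, token in zip(positions, positions[1:], parts[1:]):
        out.append("_" if abs(prev - cur) < threshold else "+")
        out.append(token)
    return "".join(out)

print(join_by_distance("n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l"))
# n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l
```

Because the first token is placed in the output before the loop starts, a string with a single token and no '+' is returned unchanged.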
Answer 1 (score: 1)
Using only the re module:
>>> import re
>>> s = 'n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l'
>>> m = re.findall(r'(?=\b([^+]+\+[^+]+))', s)  # The lookahead lets the matches overlap. See the demo (https://regex101.com/r/jO6zT2/13)
>>> m
['n76a+q80a', 'q80a+l83a', 'l83a+i153a', 'i153a+l203f', 'l203f+r207a', 'r207a+s211a', 's211a+s215w', 's215w+f216a', 'f216a+e283l']
>>> l = []
>>> for i in m:
...     if abs(int(re.search(r'^\D*(\d+)', i).group(1)) - int(re.search(r'^\D*\d+\D*(\d+)', i).group(1))) < 10:
...         l.append(i.replace('+', '_'))
...     else:
...         l.append(i)
...
>>> re.sub(r'([a-z0-9]+)\1', r'\1', ''.join(l))
'n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l'
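To see why the pattern is wrapped in a lookahead, it may help to compare it with the same pattern without one. The sketch below uses a shortened input string purely for illustration.

```python
import re

s = "n76a+q80a+l83a+i153a"

# Without a lookahead each match consumes its text, so two pairs can never
# share a token and 'q80a+l83a' is skipped.
plain = re.findall(r"[^+]+\+[^+]+", s)

# Inside a zero-width lookahead nothing is consumed, so the scan resumes one
# character later and every token can start a pair. The \b keeps matches from
# starting in the middle of a token.
overlapping = re.findall(r"(?=\b([^+]+\+[^+]+))", s)

print(plain)        # ['n76a+q80a', 'l83a+i153a']
print(overlapping)  # ['n76a+q80a', 'q80a+l83a', 'l83a+i153a']
```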
By defining a separate function:
import re

def aa_dist(x):
    l = []
    # Overlapping pairs of adjacent tokens, e.g. 'n76a+q80a', 'q80a+l83a'
    m = re.findall(r'(?=\b([^+]+\+[^+]+))', x)
    for i in m:
        if abs(int(re.search(r'^\D*(\d+)', i).group(1)) - int(re.search(r'^\D*\d+\D*(\d+)', i).group(1))) < 10:
            l.append(i.replace('+', '_'))
        else:
            l.append(i)
    # Each inner token now appears twice in a row; collapse the duplicates
    return re.sub(r'([a-z0-9]+)\1', r'\1', ''.join(l))

string = 'n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l'
print(aa_dist(string))
Output:
n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l
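One caveat with the de-duplication step `re.sub(r'([a-z0-9]+)\1', r'\1', ...)`: the first and last tokens appear only once in the joined string, so a repeated character inside them gets collapsed too (for example, 'n77a+q80a' comes back as 'n7a_q80a'). A possible way around this is to skip the duplication entirely and decide each '+' in a single `re.sub` pass with a replacement function. This is a rough sketch, not part of the original answer: the name `aa_dist_sub` and the `threshold` parameter are hypothetical, and it assumes every token has the shape lowercase letters, digits, lowercase letters.

```python
import re

# A number, its trailing letters and the '+', with the NEXT token's number
# captured inside a lookahead so that it is not consumed by this match.
PAIR = re.compile(r"(\d+)([a-z]*)\+(?=[a-z]*(\d+))")

def aa_dist_sub(s, threshold=10):
    def repl(m):
        # group(1) is this token's number, group(3) is the next token's number
        close = abs(int(m.group(1)) - int(m.group(3))) < threshold
        return m.group(1) + m.group(2) + ("_" if close else "+")
    return PAIR.sub(repl, s)

print(aa_dist_sub("n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l"))
# n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l
print(aa_dist_sub("n77a+q80a"))
# n77a_q80a
```

Since the next number sits in a lookahead, it is still available as the left-hand number of the following pair, which is what makes the one-pass approach work.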