My input file looks like this:
>seq_1
ATTAGACCTG
>seq_2
CCTGCCGGAA
>seq_3
AGACCTGCCG
>seq_4
GCCGGAATAC
I wrote this code to find the shortest common superstring:
from itertools import permutations

def parse_fasta(lines):
    """Split FASTA lines into header lines and their sequences."""
    descs = []
    seqs = []
    data = ''
    for line in lines:
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if data:
                seqs.append(data)
                data = ''
            descs.append(line)
        else:
            data += line.rstrip('\r\n')
    seqs.append(data)
    return descs, seqs

descriptions, sequences = parse_fasta(open(r'D:\python\input.fasta', 'r').read().split('\n'))

def solve(*strings):
    """
    Given a list of strings, return the shortest string that contains them all.
    """
    return min((simplify(p) for p in permutations(strings)), key=len)

def prefixes(s):
    """
    Return a list of all the prefixes of the given string (including itself),
    in ascending order (from shortest to longest).
    """
    return [s[:i+1] for i in range(len(s))]

def simplify(strings):
    """
    Given a list of strings, concatenate them while removing overlaps between
    successive elements.
    """
    ret = ''
    for s in strings:
        if s in ret:
            continue
        # Try the longest prefix first so the largest overlap is removed.
        for i, prefix in reversed(list(enumerate(prefixes(s)))):
            if ret.endswith(prefix):
                ret += s[i+1:]
                break
        else:
            # No overlap with the end of ret: append the whole string.
            ret += s
    return ret

print solve(sequences)
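I get the error shown in the title. When I call it like this instead:
print solve('ATTAGACCTG','CCTGCCGGAA','AGACCTGCCG','GCCGGAATAC')
I get the correct output. But I want to read my input from the file, as shown in the program, and get this result: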
ATTAGACCTGCCGGAATAC
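
One possible explanation, offered as a sketch rather than a confirmed diagnosis: `solve` is declared as `solve(*strings)`, so `solve(sequences)` passes the whole list as a single argument, and `simplify` then ends up testing `s in ret` with `s` being a list instead of a string. Unpacking the list with `*sequences` makes the file-based call equivalent to the hand-written four-argument call. In the snippet below, `count_args` is a hypothetical helper added only to show the difference, the hard-coded `sequences` list stands in for the output of `parse_fasta`, and `simplify` is a condensed version of the function above. It uses `print(...)` with a single argument so that it runs on both Python 2 and Python 3.

```python
from itertools import permutations

def count_args(*strings):
    # Hypothetical helper: reports how many positional arguments arrived.
    return len(strings)

def simplify(strings):
    # Condensed version of the question's simplify(): concatenate the
    # strings in order, dropping the longest overlap with what is built so far.
    ret = ''
    for s in strings:
        if s in ret:
            continue
        for i in range(len(s), 0, -1):  # longest prefix first
            if ret.endswith(s[:i]):
                ret += s[i:]
                break
        else:
            ret += s
    return ret

def solve(*strings):
    # Brute force over every ordering; fine for a handful of short reads.
    return min((simplify(p) for p in permutations(strings)), key=len)

# Stand-in for the list returned by parse_fasta (assumed, not read from disk).
sequences = ['ATTAGACCTG', 'CCTGCCGGAA', 'AGACCTGCCG', 'GCCGGAATAC']

print(count_args(sequences))   # 1: the whole list is one argument
print(count_args(*sequences))  # 4: one argument per sequence

result = solve(*sequences)     # note the * that unpacks the list
print(result)                  # ATTAGACCTGCCGGAATAC
```

If that is indeed the cause, changing the last line of the original program to `print solve(*sequences)` should be enough. Alternatively, `solve` could take a single list parameter (`def solve(strings)`), but then the four-argument call would need to pass a list as well.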