The sample file looks like this:
['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
'>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
'>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
'>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n','\n',
'$$$\n', '\n',
'>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
'>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
'>B5\n', 'TTCGTGGGTATT\n', '>B6\n','TTCGGGGGTATC\n',
'>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
'>B9\n', 'TTCGGGGGTATC\n','>B10\n', 'TTCGGGGGTATC\n',
'>B42\n', 'TT-GTGGGTATC\n']
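"$$$" separates the two groups. I need to use the .strip function to remove the \n characters and all the "headers" (the lines starting with ">").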
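To illustrate the two operations mentioned above: str.strip() removes the trailing newline, and str.startswith('>') identifies a header line. This is a minimal sketch that uses a few items copied from the sample list. Note that the "$$$" separator and blank lines are not headers, so they need their own check.

```python
# A few lines copied from the sample list above
sample = ['>1\n', 'TCCGGGGGTATC\n', '\n', '$$$\n', '>B2\n', 'TT-GTGGGAATC\n']

classified = []
for line in sample:
    text = line.strip()                          # drop the trailing '\n' and other surrounding whitespace
    classified.append((text, text.startswith('>')))  # True for header lines such as '>1'

print(classified)
```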
I need to build the list below and then replace "-" with "Z":
['TCCGGGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC',
 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC',
 'ATCGGGGGTATT', 'TT-GTGGGAATC', 'TTCGTGGGAATC', 'TT-GTGGGTATC',
 'TTCGTGGGTATT', 'TTCGGGGGTATC', 'TT-GTGGGTATC', 'TTCGGGGGAATC',
 'TTCGGGGGTATC', 'TTCGGGGGTATC', 'TT-GTGGGTATC']
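Here is a link to code (https://stackoverflow.com/a/39965048/6820344) from an answer to a similar question. I tried to modify that code to get the output shown above. However, it does not handle the "$$$" line. Also, I need a single flat list, not a list of lists.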
seq_list = []
for x in lst:
    if x.startswith('>'):
        seq_list.append([])
        continue
    x = x.strip()
    if x:
        seq_list[-1].append(x.replace("-", "Z"))
print(seq_list)
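The question's own loop can be adapted with two changes: append every sequence to one list instead of opening a new sub-list at each header, and explicitly skip the "$$$" separator and blank lines. This is a minimal sketch on a shortened copy of the sample data. The name lst is the one the question's code already assumes for the input lines.

```python
# Shortened copy of the sample data; `lst` is the input name used in the question
lst = ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n', '\n',
       '$$$\n', '\n',
       '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n']

seq_list = []                          # one flat list instead of a list of lists
for x in lst:
    x = x.strip()                      # remove '\n' and surrounding whitespace
    # skip blank lines, header lines and the '$$$' group separator
    if not x or x.startswith('>') or x == '$$$':
        continue
    seq_list.append(x.replace('-', 'Z'))

print(seq_list)
```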
Answer (score: 1)
input = ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
'>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
'>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
'>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n', '\n',
'$$$\n', '\n',
'>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
'>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
'>B5\n', 'TTCGTGGGTATT\n', '>B6\n', 'TTCGGGGGTATC\n',
'>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
'>B9\n', 'TTCGGGGGTATC\n', '>B10\n', 'TTCGGGGGTATC\n',
'>B42\n', 'TT-GTGGGTATC\n']
from pprint import pprint

output = []
for elem in input:
    # skip header lines, the '$$$' separator and blank lines
    if elem.startswith('>') or \
       elem.startswith('$') or \
       elem.isspace():
        continue
    output.append(elem.replace('-', 'Z').strip())

pprint(output, compact=True)
Running the code above produces the following result:
['TCCGGGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC', 'TCCGTGGGTATC',
'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC', 'ATCGGGGGTATT', 'TTZGTGGGAATC',
'TTCGTGGGAATC', 'TTZGTGGGTATC', 'TTCGTGGGTATT', 'TTCGGGGGTATC', 'TTZGTGGGTATC',
'TTCGGGGGAATC', 'TTCGGGGGTATC', 'TTCGGGGGTATC', 'TTZGTGGGTATC']
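The same filtering can also be written as a single list comprehension. This sketch is intended to behave like the loop above. It passes a tuple of prefixes to str.startswith, which the method accepts, and it treats lines that are empty after stripping as blank. It is shown here on a shortened copy of the data.

```python
# Shortened copy of the sample data (lines as read from the file)
lines = ['>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n', '\n',
         '$$$\n', '\n',
         '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n']

# Keep only sequence lines: drop blanks, '>' headers and the '$$$' separator
sequences = [
    line.strip().replace('-', 'Z')
    for line in lines
    if line.strip() and not line.startswith(('>', '$'))
]

print(sequences)
```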