Input file:
>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=3 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY
>sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
RIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVF
Expected output:
>sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
RIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVF
Code written so far:
#!/usr/bin/python
import re
fh = open("test_seq")
for line in fh:
    if line.startswith('>'):
        if re.search('PE=1', line):
            print(line)
Answer 0 (score: 0)
Append the lines of each record together so that the header and its sequence can be handled as a single string.
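A minimal Python 3 sketch of this idea, assuming a record runs from one `>` header line up to the next. The helper names `read_records` and `filter_records` are hypothetical, and the short in-memory sample reuses the two headers from the question with only the first sequence line of each:

```python
import re

def read_records(lines):
    """Yield each FASTA record (header plus its sequence lines) as one string."""
    record = []
    for line in lines:
        # A new header closes the record collected so far
        if line.startswith('>') and record:
            yield ''.join(record)
            record = []
        record.append(line)
    # Emit the last record, which has no following header to close it
    if record:
        yield ''.join(record)

def filter_records(lines, pattern='PE=1'):
    """Return the records whose header (first line) matches the pattern."""
    return [rec for rec in read_records(lines)
            if re.search(pattern, rec.split('\n', 1)[0])]

# Small in-memory sample; a file opened in text mode can be passed the same way
sample = [
    ">sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=3 SV=3\n",
    "MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS\n",
    ">sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1\n",
    "MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW\n",
]
for rec in filter_records(sample):
    print(rec, end='')
```

Because each record is a single string, the pattern is tested once per record against the header only, and the sequence lines travel with it. For the question's file, the same call would be `filter_records(open("test_seq"))`.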
Answer 1 (score: 0)
How about:
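import re

# Text mode, so each line is a str and can be compared with '>'
with open("test_seq") as fh:
    print_line = False
    for line in fh:
        if line.startswith('>'):
            # A header starts a new record; decide whether to keep it
            if re.search('PE=1', line):
                print_line = True
            else:
                print_line = False
        if print_line:
            # Drop the line's own newline so print does not double-space
            print(line.rstrip('\n'))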