Code:
def readFasta(filename):
    """ Reads a sequence in Fasta format """
    fp = open(filename, 'rb')
    header = ""
    seq = ""
    while True:
        line = fp.readline()
        if (line == ""):
            break
        if (line.startswith(b'>')):
            header = line[1:].strip()
        else:
            seq = fp.read().replace(b'\n',b'')
            seq = seq.replace(b'\r',b'') # for windows
            break
    fp.close()
    return (header, seq)

FASTAsequence = readFasta("MusChr01.fa")
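The b prefix on the string literals is necessary because the file is opened in bytes mode. The problem is that, when I run this, fp.read().replace and seq.replace remove everything from the string. I know the read itself works, because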
def readFasta(filename):
    """ Reads a sequence in Fasta format """
    fp = open(filename, 'rb')
    header = ""
    seq = ""
    while True:
        line = fp.readline()
        if (line == ""):
            break
        if (line.startswith(b'>')):
            header = line[1:].strip()
        else:
            seq = fp.read()
            break
    fp.close()
    return (header, seq)

FASTAsequence = readFasta("MusChr01.fa")
works perfectly. What is going on here?
Answer 0 (score: 0)
Here is a more concise way to write the function. I am not sure why yours does not work for you, but:
def readFasta(filename):
    """ Reads a sequence in Fasta format """
    header = seq = b""
    with open(filename, 'rb') as fp:
        for line in fp:
            if not line:
                break
            if line.startswith(b'>'):
                header = line[1:].strip()
            else:
                # Prepend line: it already holds the first row of the sequence.
                seq = (line + fp.read()).translate(None, b'\r\n')
                break
    return (header, seq)
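As a quick check of the translate call used above, here is a minimal sketch showing that bytes.translate(None, b'\r\n') gives the same result as the two chained replace calls from the question. The helper names and the sample bytes are made up for illustration.

```python
def strip_newlines_translate(data):
    # A table of None means "no remapping"; the second argument lists bytes to delete.
    return data.translate(None, b'\r\n')


def strip_newlines_replace(data):
    # The approach from the question: one replace call per line-ending byte.
    return data.replace(b'\n', b'').replace(b'\r', b'')


# Hypothetical sample mixing Windows and Unix line endings.
sample = b'ACGT\r\nTTGA\nCCAT\r\n'

print(strip_newlines_translate(sample))
print(strip_newlines_replace(sample))
```

Both calls should print the same bytes, so the choice between them is mostly about doing the cleanup in a single pass.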
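Answer 1 (score: 0)
In the else block, the code does not take line into account, even though line has already been read and holds the first row of the sequence. Try the following.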
def readFasta(filename):
    header = b""
    seq = b""
    with open(filename, 'rb') as fp:
        while True:
            line = fp.readline()
            if not line:
                break
            if line.startswith(b'>'):
                header = line[1:].strip()
            else:
                seq = line + fp.read()  # <--- without `line +`, you lose a line.
                seq = seq.translate(None, b'\r\n')
                break
    return header, seq
Example, using a sample sequence from Wikipedia:
>>> with open('mchu.fasta', 'rb') as f: print(f.read().decode('ascii'))
...
>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
DIDGDGQVNYEEFVQMMTAK*
>>> readFasta('mchu.fasta')
(b'MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken', b'ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK*')
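To see the lost line without needing a file on disk, here is a small sketch that runs the same loop over an in-memory io.BytesIO buffer, once with and once without the line that readline() already consumed. The function name read_rest, the keep_first_line flag, and the sample records are hypothetical and only serve the demonstration.

```python
import io

# Hypothetical two-line and one-line FASTA records.
TWO_LINES = b">seq1 demo\nACGT\nTTGA\n"
ONE_LINE = b">seq2 demo\nACGT\n"


def read_rest(fp, keep_first_line):
    header = b""
    seq = b""
    while True:
        line = fp.readline()
        if not line:
            break
        if line.startswith(b">"):
            header = line[1:].strip()
        else:
            # readline() has already consumed the first sequence row,
            # so fp.read() only returns what comes after it.
            rest = fp.read()
            seq = (line + rest) if keep_first_line else rest
            seq = seq.translate(None, b"\r\n")
            break
    return header, seq


print(read_rest(io.BytesIO(TWO_LINES), keep_first_line=False))  # (b'seq1 demo', b'TTGA')
print(read_rest(io.BytesIO(TWO_LINES), keep_first_line=True))   # (b'seq1 demo', b'ACGTTTGA')
print(read_rest(io.BytesIO(ONE_LINE), keep_first_line=False))   # (b'seq2 demo', b'')
```

Note the last case: when the whole sequence sits on a single line, dropping that line leaves an empty result. That may or may not be what happened with the file in the question.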