I need to read from a file until a certain character is encountered, without storing the whole line.

I tried:

    def read_one_fasta_entry(fStream) :
        s = '' # temp var
        while (s != '>') : # '>' is the char to read until and then discard/skip
            fStream.read(1)

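However, this just sends the program into an infinite loop with the given input:

    >fig|100226.1.peg.1 SCEND.02c, unknown, doubtful CDS, len: 225aa [Streptomyces coelicolor A3(2)]
    MTGHHESTGPGTALSSDSTCRVTQYQTAGVNARLRLFALLERRACPRARRTTWWPGRSAR
    WWSWTAWRRLLGVCCVRGRLGRRRDGGERGPGGHRGPGLATARRRSGGATELAVHCADVR
    QRERADLVRLEGFVRESVLPRAHPHTTARRRVLEVLGEAGSLCTARTVNSDEDYILCTLG
    VGHYDPDDQPPFKDGKPGWQRAGASIWNGSGAACIPHAAIEGPRK

The file contains more entries than the one above. I need to store the ID (fig|100226.1.peg.1) and the sequence (MTGHHE...), and I want to process the file one char at a time with the approach above, since the format is deterministic ('>' precedes the ID, ' ' ends the ID, ']' precedes the sequence). But it doesn't work. Any suggestions?
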
**EDIT** I have now updated the program and it seems to work for the most part, but it looks like I'm off by one '>'. My module:

    def read_one_fasta_entry(fStream) :
        while (True) :
            s = ''
            while (s != '>') : # Discard first char/extra chars further in the file
                s = fStream.read(1)
            pegid = ''
            while (s != ' ') : # read one char at a time and append to pegid until whitespace
                s = fStream.read(1)
                pegid += s
            protseq = ''
            while (s != ']') : # read one char at a time and append to protseq until close square bracket
                s = fStream.read(1)
            while(s != '>') :
                s = fStream.read(1)
                protseq += s
            yield (pegid, protseq)

Driver:

    #!/usr/bin/env python3
    import sys
    import p3mod
    f = open(sys.argv[1])
    for (pegid,protseq) in p3mod.read_one_fasta_entry(f):
        print(pegid,protseq)
    f.close()

Any ideas on how to skip the first '>'? I'm new to Python; is there an equivalent of a 'do ... while()' loop? That seems like it would work well here.

Answer 0 (score: 0)

Update: I figured it out! I had to offset the first skip of the '>' char (the initial read before the loop) and check whether I have reached EOF (the `s == ''` test after the sequence is read). Here is my updated module (the driver is the same as in the original post):

    def read_one_fasta_entry(fStream) : # Return iterable two-tuples of (pegid, protseq) as long as eof is not reached
        s = fStream.read(1) # Offset skipping '>' char
        while (True) : # Loop to eof
            s = ''
            pegid = ''
            while (s != ' ') : # Read one char at a time and append to pegid until whitespace
                s = fStream.read(1)
                pegid += s
            protseq = ''
            while (s != ']') : # Skip the description, up to and including the close square bracket
                s = fStream.read(1)
            while(s != '>' and s != '') : # Read until next entry (s == '>') or eof (s == '')
                s = fStream.read(1)
                if(s != '>') :
                    protseq += s
            if(s == '') : # Check for eof
                yield (pegid, protseq)
                return # Close generator; raising StopIteration here is a RuntimeError on Python 3.7+
            yield (pegid, protseq)
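
For anyone who wants the same char-by-char approach with less repetition, here is a rough sketch that factors the "read until this character" step into a helper. The names `read_until` and `read_fasta_entries` are made up for this example and are not part of the module above. It assumes every entry has the `>ID description [organism]` shape from the question, and unlike the code above it strips the trailing space from the ID and the newlines from the sequence. The `while True: ... break/return` pattern is the usual Python stand-in for a do...while loop.

```python
import io

def read_until(stream, stop):
    """Read one char at a time until `stop` or EOF.

    Returns (text, hit_eof). The stop char is consumed but not returned.
    """
    chars = []
    while True:  # Python has no do...while: loop forever and exit from inside
        c = stream.read(1)
        if c == '' or c == stop:  # read(1) returns '' at EOF
            return ''.join(chars), c == ''
        chars.append(c)

def read_fasta_entries(stream):
    """Yield (pegid, protseq) tuples from a text stream (hypothetical helper)."""
    _, eof = read_until(stream, '>')  # discard everything up to the first '>'
    while not eof:
        pegid, eof = read_until(stream, ' ')  # ID ends at the first space
        if eof:
            return  # truncated entry: no description or sequence
        _, eof = read_until(stream, ']')  # skip the description
        if eof:
            return
        seq, eof = read_until(stream, '>')  # sequence runs to next '>' or EOF
        yield pegid, ''.join(seq.split())  # drop newlines and other whitespace

# Small made-up sample in the same layout as the question's file
sample = ">fig|1.peg.1 demo [Org one]\nMTG\nHHE\n>fig|1.peg.2 demo [Org two]\nAAA\n"
for pegid, protseq in read_fasta_entries(io.StringIO(sample)):
    print(pegid, protseq)
```

Since `read_until` reports EOF explicitly, the generator just ends when the input runs out, so there is no need for a special-case `yield` followed by `return`, and an empty file yields nothing instead of looping forever.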