Question

I need to read from a file until a certain character is encountered, without storing the entire line.

I tried:

def read_one_fasta_entry(fStream) :
    s = '' # temp var
    while (s != '>') : # '>' is the char to read until and then discard/skip
        fStream.read(1)

However, this just sends the program into an infinite loop with the given input:

>fig|100226.1.peg.1 SCEND.02c, unknown, doubtful CDS, len: 225aa [Streptomyces coelicolor A3(2)]
MTGHHESTGPGTALSSDSTCRVTQYQTAGVNARLRLFALLERRACPRARRTTWWPGRSAR
WWSWTAWRRLLGVCCVRGRLGRRRDGGERGPGGHRGPGLATARRRSGGATELAVHCADVR
QRERADLVRLEGFVRESVLPRAHPHTTARRRVLEVLGEAGSLCTARTVNSDEDYILCTLG
VGHYDPDDQPPFKDGKPGWQRAGASIWNGSGAACIPHAAIEGPRK

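There are more entries than the one above. I need to store the ID (fig|100226.1.peg.1) and the sequence (MTGHHE...), and I was going to process the file one char at a time with the approach above because the format is deterministic ('>' before the ID, ' ' to end the ID, ']' before the sequence), but it doesn't work. Any suggestions?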

**Edit: I have now updated the program and it seems to work for the most part, but it looks like I am being tripped up by a '>'. My module:

def read_one_fasta_entry(fStream) :
    while (True) :
        s = ''
        while (s != '>') : # Discard first char/extra chars further in the file
            s = fStream.read(1)

        pegid = ''
        while (s != ' ') : # read one char at a time and append to pegid until whitespace
            s = fStream.read(1)
            pegid += s

        protseq = ''
        while (s != ']') : # read one char at a time and append to protseq until close square bracket
            s = fStream.read(1)

        while(s != '>') :
            s = fStream.read(1)
            protseq += s

        yield (pegid, protseq)

Driver:

#!/usr/bin/env python3

import sys

import p3mod


f = open(sys.argv[1])
for (pegid,protseq) in p3mod.read_one_fasta_entry(f):
    print(pegid,protseq)
f.close()

Any ideas on how to skip the first '>'? I'm new to Python, but is there an equivalent of a 'do ... while()' loop? That seems like it would be very useful here.

Answer 1

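Update: I got it! I had to offset the first skip of the '>' char (line 2) and check whether I had reached eof (line 19). Here is my updated module (the driver is the same as in the original post):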

def read_one_fasta_entry(fStream) : # Return iterable two-tuples of (pegid, protseq) as long as eof is not reached
    s = fStream.read(1) # Offset skipping '>' char
    while (True) : # Loop to eof
        s = ''
        pegid = ''

        while (s != ' ') : # Read one char at a time and append to pegid until whitespace
            s = fStream.read(1)
            pegid += s

        protseq = ''
        while (s != ']') : # Read one char at a time and append to protseq until close square bracket
            s = fStream.read(1)

        while(s != '>' and s != '') : # Read until next entry (s != '>') or eof (s != '')
            s = fStream.read(1)
            if(s != '>') :
                protseq += s

        if(s == '') : # Check for eof
            yield (pegid, protseq)
            return # Close generator (raising StopIteration inside a generator is a RuntimeError since Python 3.7, PEP 479)

        yield (pegid, protseq)
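
For comparison, here is a minimal sketch of the same character-at-a-time idea with the repeated loops factored into one helper. It is not the module above: `read_until` and `read_fasta_entries` are hypothetical names, the sample records are made up, and it assumes every entry has the shape '>' ID ' ' description ']' sequence. The `while True` loop with an early `return` is the usual Python stand-in for do ... while: read first, test afterwards.

```python
import io


def read_until(stream, stop):
    """Read one character at a time until `stop` or end of file.

    Returns (text, found). `text` excludes the stop character, and
    `found` is False when end of file was reached first.
    """
    chars = []
    while True:  # do ... while substitute: read first, test afterwards
        ch = stream.read(1)
        if ch == '':            # read(1) returns '' at end of file
            return ''.join(chars), False
        if ch == stop:
            return ''.join(chars), True
        chars.append(ch)


def read_fasta_entries(stream):
    """Yield (pegid, protseq) tuples, one per entry in the stream."""
    _, found = read_until(stream, '>')        # discard up to the first '>'
    while found:
        pegid, ok = read_until(stream, ' ')   # the ID ends at the first space
        if not ok:
            return                            # truncated entry: stop quietly
        _, ok = read_until(stream, ']')       # skip the description
        if not ok:
            return
        raw, found = read_until(stream, '>')  # sequence runs to next '>' or eof
        yield pegid, ''.join(raw.split())     # drop spaces and line breaks


# Made-up sample data, only to show the call pattern
sample = io.StringIO(
    ">fig|1.peg.1 first entry [Organism one]\nMTGH\nHEST\n"
    ">fig|1.peg.2 second entry [Organism two]\nGPGT\n"
)
for pegid, protseq in read_fasta_entries(sample):
    print(pegid, protseq)
```

Unlike the module above, this sketch returns the ID without the trailing space and strips line breaks from the sequence. It also stops instead of looping forever on an empty or truncated file, since every read checks for eof.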

Python: processing a file one character at a time
