Python: processing a file one character at a time

Time: 2016-06-15 20:52:39

Tags: file python-3.x io

I need to read from a file until I hit a certain character, without storing the whole line.

I tried:

def read_one_fasta_entry(fStream) :
    s = '' # temp var
    while (s != '>') : # '>' is the char to read until and then discard/skip
        fStream.read(1)

However, this just sends the program into an infinite loop with the given input:

>fig|100226.1.peg.1 SCEND.02c, unknown, doubtful CDS, len: 225aa [Streptomyces coelicolor A3(2)]
MTGHHESTGPGTALSSDSTCRVTQYQTAGVNARLRLFALLERRACPRARRTTWWPGRSAR
WWSWTAWRRLLGVCCVRGRLGRRRDGGERGPGGHRGPGLATARRRSGGATELAVHCADVR
QRERADLVRLEGFVRESVLPRAHPHTTARRRVLEVLGEAGSLCTARTVNSDEDYILCTLG
VGHYDPDDQPPFKDGKPGWQRAGASIWNGSGAACIPHAAIEGPRK

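There are more entries than the one above. I need to store the ID (fig|100226.1.peg.1) and the sequence (MTGHHE...), and I want to process the file one char at a time with the approach above, because the format is deterministic ('>' precedes the ID, ' ' ends the ID, ']' precedes the sequence). But it isn't working. Any suggestions?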
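One way to read up to a delimiter without holding a whole line is a small helper that pulls one character at a time and stops at the delimiter or at end of file. The sketch below is only an illustration: `read_until` is a hypothetical helper name, not something from the post, and `io.StringIO` with a shortened header stands in for the real file. Note that the result of `read(1)` has to be stored in the variable the loop tests; a loop whose test variable is never updated cannot terminate.

```python
import io

def read_until(stream, stop):
    """Read one character at a time until `stop` or end of file.

    Returns the characters read before `stop`; `stop` itself is consumed
    and discarded. (Hypothetical helper, for illustration only.)
    """
    chars = []
    while True:
        c = stream.read(1)        # store the result, or the loop never ends
        if c == '' or c == stop:  # '' means end of file for a text stream
            break
        chars.append(c)
    return ''.join(chars)

# Stand-in for an open file, using the header layout from the question.
sample = io.StringIO(
    ">fig|100226.1.peg.1 SCEND.02c [Streptomyces coelicolor A3(2)]\nMTGHHE\n"
)
read_until(sample, '>')             # discard everything up to and including '>'
entry_id = read_until(sample, ' ')  # the ID ends at the first space
read_until(sample, ']')             # skip the rest of the header
print(entry_id)
```

Because the helper also stops on the empty string, it cannot spin forever when the delimiter never appears.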

Edit: I have now updated the program and it seems to work for the most part, but it looks like I am off by one '>'. My module:

def read_one_fasta_entry(fStream) :
    while (True) :
        s = ''
        while (s != '>') : # Discard first char/extra chars further in the file
            s = fStream.read(1)

        pegid = ''
        while (s != ' ') : # read one char at a time and append to pegid until whitespace
            s = fStream.read(1)
            pegid += s

        protseq = ''
        while (s != ']') : # read one char at a time and append to protseq until close square bracket
            s = fStream.read(1)

        while(s != '>') :
            s = fStream.read(1)
            protseq += s

        yield (pegid, protseq)

Driver:

#!/usr/bin/env python3

import sys

import p3mod


f = open(sys.argv[1])
for (pegid,protseq) in p3mod.read_one_fasta_entry(f):
    print(pegid,protseq)
f.close()

Any ideas on how to skip the first '>'? I'm new to Python; is there an equivalent of a 'do ... while()' loop? That seems like it would work well here.
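Python has no `do ... while` statement. The usual substitute is `while True:` with the body first and a `break` on the exit condition, so the body always runs at least once. A minimal sketch, with `skip_past` as a hypothetical name and `io.StringIO` standing in for the file:

```python
import io

def skip_past(stream, stop):
    """Consume characters up to and including `stop`; return how many were read."""
    count = 0
    while True:          # body runs first, like `do { ... }`
        c = stream.read(1)
        if c == '':      # end of file: nothing more to read
            break
        count += 1
        if c == stop:    # exit test at the bottom, like `while (c != stop)`
            break
    return count

stream = io.StringIO("junk>fig|100226.1.peg.1 rest")
skipped = skip_past(stream, '>')  # reads "junk>" and stops
after = stream.read(3)            # the stream now sits just past the '>'
print(skipped, after)
```

The two-argument built-in `iter(callable, sentinel)` gives a similar effect: `iter(lambda: f.read(1), '')` yields characters until `read` returns the empty string.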

1 answer:

Answer 0: (score: 0)

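Update: I figured it out! I had to offset the first skip of the '>' char (line 2) and check whether I had reached EOF (line 19). Here is my updated module (the driver is the same as in the original post):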

def read_one_fasta_entry(fStream) : # Return iterable two-tuples of (pegid, protseq) as long as eof is not reached
    s = fStream.read(1) # Offset skipping '>' char
    while (True) : # Loop to eof
        s = ''
        pegid = ''

        while (s != ' ') : # Read one char at a time and append to pegid until whitespace
            s = fStream.read(1)
            pegid += s

        protseq = ''
        while (s != ']') : # Read one char at a time and discard the rest of the header until close square bracket
            s = fStream.read(1)

        while(s != '>' and s != '') : # Read until next entry (s != '>') or eof (s != '')
            s = fStream.read(1)
            if(s != '>') :
                protseq += s

        if(s == '') : # Check for eof
            yield (pegid, protseq)
            return # Close generator; raising StopIteration inside a generator is a RuntimeError since Python 3.7 (PEP 479)

        yield (pegid, protseq)
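For comparison, here is a sketch of the same generator that ends with a plain `return` and also stops if the file ends in the middle of a header. It assumes the layout described in the question ('>' before the ID, a space after it, ']' at the end of the header) and it drops newlines from the sequence, which the module above does not do. The names `read_fasta_entries` and `_take_until`, and the shortened sample IDs, are hypothetical.

```python
import io

def _take_until(stream, stops):
    """Read chars until one in `stops` or EOF; return (text, terminator)."""
    out = []
    while True:
        c = stream.read(1)
        if c == '' or c in stops:  # '' signals end of file
            return ''.join(out), c
        out.append(c)

def read_fasta_entries(stream):
    """Yield (pegid, protseq) tuples, reading one character at a time."""
    _, term = _take_until(stream, '>')          # skip anything before the first '>'
    while term == '>':
        pegid, term = _take_until(stream, ' ')  # ID ends at the first space
        if term == '':
            return                              # EOF inside a header: stop cleanly
        _, term = _take_until(stream, ']')      # discard the rest of the header
        if term == '':
            return
        seq, term = _take_until(stream, '>')    # sequence runs to next '>' or EOF
        yield pegid, ''.join(seq.split())       # drop newlines and other whitespace

# Made-up two-entry input in the same layout as the question's file.
demo = io.StringIO(
    ">fig|1.peg.1 first [Org one]\nMTGH\nHEST\n"
    ">fig|1.peg.2 second [Org two]\nGPGT\n"
)
entries = list(read_fasta_entries(demo))
print(entries)
```

The driver from the question would work unchanged with this version apart from the function name, since it only iterates over the yielded tuples.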