How to use string replace in bytes mode in Python

Time: 2013-11-07 06:12:18

Tags: python

Code:

def readFasta(filename):
    """ Reads a sequence in Fasta format """
    fp = open(filename, 'rb')
    header = ""
    seq = ""
    while True:
        line = fp.readline()
        if (line == ""):
            break
        if (line.startswith(b'>')):
            header = line[1:].strip()
        else:
            seq = fp.read().replace(b'\n',b'')
            seq = seq.replace(b'\r',b'')          # for windows
            break
    fp.close()
    return (header, seq)

FASTAsequence = readFasta("MusChr01.fa")

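The b before each '' is necessary because I am in bytes mode. The problem is that, when this runs, fp.read().replace and seq.replace remove everything in the string. I know the read itself works, because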

def readFasta(filename):
    """ Reads a sequence in Fasta format """
    fp = open(filename, 'rb')
    header = ""
    seq = ""
    while True:
        line = fp.readline()
        if (line == ""):
            break
        if (line.startswith(b'>')):
            header = line[1:].strip()
        else:
            seq = fp.read()
            break
    fp.close()
    return (header, seq)

FASTAsequence = readFasta("MusChr01.fa")

works perfectly. What is going on?

2 answers:

Answer 0 (score: 0)

Here is a cleaner way to write the function. I'm not sure why yours isn't working for you, though.

def readFasta(filename):
    """ Reads a sequence in Fasta format """
    header = seq = b""
    with open(filename, 'rb') as fp:
        for line in fp:
            if not line:
                break
            if line.startswith(b'>'):
                header = line[1:].strip()
            else:
                seq = fp.read().translate(None, b'\r\n')
                break
    return (header, seq)
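
As a small sketch of the translate call above (not part of the original answer): in Python 3, bytes.translate(None, delete) removes every byte listed in delete, so one call does the work of the two chained replace calls from the question. The sample bytes are made up for illustration.

```python
# Hypothetical sample data: two sequence lines with Windows line endings.
raw = b"ACGT\r\nTTGA\r\n"

# Two chained replace calls, as in the question.
via_replace = raw.replace(b"\n", b"").replace(b"\r", b"")

# One translate call: the first argument (None) means "no byte remapping",
# the second argument lists the bytes to delete.
via_translate = raw.translate(None, b"\r\n")

print(via_replace)    # b'ACGTTTGA'
print(via_translate)  # b'ACGTTTGA'
```

Both forms return a new bytes object; neither modifies raw in place.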

Answer 1 (score: 0)

In the else block, the code does not take line into account. Try the following.

def readFasta(filename):
    header = b""
    seq = b""
    with open(filename, 'rb') as fp:
        while True:
            line = fp.readline()
            if not line:
                break
            if line.startswith(b'>'):
                header = line[1:].strip()
            else:
                seq = line + fp.read() # <--- without `line +`, you lose a line.
                seq = seq.translate(None, b'\r\n')
                break
    return header, seq
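
One detail worth spelling out, as a sketch rather than part of the original answer: the switch from line == "" to not line matters in binary mode. In Python 3 a bytes object never compares equal to a str, so the question's end-of-file test cannot succeed on a file opened with 'rb'. Here io.BytesIO stands in for a real file.

```python
import io

fp = io.BytesIO(b">header\n")   # in-memory stand-in for a file opened with 'rb'
fp.readline()                   # consume the only line
eof_line = fp.readline()        # at end of file, readline() returns b""

str_check = (eof_line == "")    # bytes compared with str: False in Python 3
truthy_check = (not eof_line)   # empty bytes are falsy, so this detects EOF

print(eof_line, str_check, truthy_check)
```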

Example using a sample sequence from Wikipedia:

>>> with open('mchu.fasta', 'rb') as f: print(f.read().decode('ascii'))
... 
>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
DIDGDGQVNYEEFVQMMTAK*

>>> readFasta('mchu.fasta')
(b'MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken', b'ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK*')
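
To make the lost-line issue concrete, here is a minimal sketch (an illustration, not from the original answer) using a made-up record held in io.BytesIO. By the time the else branch runs, readline() has already consumed the first sequence line, so read() alone returns only what follows it.

```python
import io

# Hypothetical record: a header plus two sequence lines.
data = b">demo\nAAAA\nCCCC\n"

# Variant without `line +`: the first sequence line is dropped.
fp = io.BytesIO(data)
fp.readline()                    # header line
line = fp.readline()             # first sequence line, already consumed
rest_only = fp.read().translate(None, b"\r\n")

# Variant with `line +`: the consumed line is put back in front.
fp = io.BytesIO(data)
fp.readline()
line = fp.readline()
with_line = (line + fp.read()).translate(None, b"\r\n")

print(rest_only)   # b'CCCC'
print(with_line)   # b'AAAACCCC'
```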