Python - Using seek

Time: 2017-07-25 21:57:10

Tags: python string seek

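I'm a Python beginner, and I'm playing with various ways of doing the simple task of reverse complementing a DNA or RNA sequence, to learn some string functions and so on. My latest approach almost works, except for a minor irritation that I can't find an answer to, probably because I'm using something I don't properly understand. My function is designed to write a blank file (this works!), then open a file containing the sequence, loop through it one character at a time, and write its reverse complement to the new file. Here is the code: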

def func_rev_seq(in_path,out_path):
    """
    Read file one character at a time and return the reverse complement of each nucleotide to a new file
    """
    #  Write a blank file (out_path)
    fb = open(out_path,"w")
    fb.write("")
    fb.close()
    #  Dictionary where the key is the nucleotide and the value is its reverse complement
    base = {"A":"T", "C":"G", "G":"C", "T":"A", "a":"t", "c":"g", "g":"c", "t":"a", "k":"m", "m":"k", "y":"r", "r":"y", "b":"v", "v":"b", "d":"h", "h":"d", "K":"M", "M":"K", "Y":"R", "R":"Y", "B":"V", "V":"B", "D":"H", "H":"D", "U":"A", "u":"a"}
    #  Open the source file (in_path) as fi
    fi=open(in_path,"r")
    i = fi.read(1)
    #  Loop through the source file one character at a time and write the reverse complement to the output file
    while i != "":
        i = fi.read(1)
        if i in base:
            b = base[i]
        else:
            b = i
        with open(out_path, 'r+') as fo:
            body = fo.read()
            fo.seek(0, 0)
            fo.write(b + body)
    fi.close()
    fo.close()

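The problem is that when I run the function, the string in the output file is, first, truncated by a single character and, second, sits below a blank line that I don't want. (Screenshot of input and output file examples.) As far as I understand, seek with (0, 0) should refer to the beginning of the file, but I may be misunderstanding. Any help is greatly appreciated, thanks!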
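Both symptoms can be reproduced without touching the disk. The following is a minimal sketch, not the original function: `prepend_complement` is a hypothetical name, `io.StringIO` stands in for the input file, a plain string stands in for the output file, and the complement table is trimmed to four bases for brevity.

```python
import io

def prepend_complement(text):
    """Mimic the loop above on in-memory text and return what the output file would hold."""
    base = {"A": "T", "C": "G", "G": "C", "T": "A"}  # trimmed table, for illustration only
    fi = io.StringIO(text)  # stand-in for the input file
    out = ""                # stand-in for the output file's contents
    i = fi.read(1)          # the first character is read here...
    while i != "":
        i = fi.read(1)      # ...and replaced before it is ever used
        # Same effect as reading the body, seek(0, 0), then write(b + body)
        out = base.get(i, i) + out
    return out

print(repr(prepend_complement("ACGT\n")))  # prints '\nACG'
```

The result is one nucleotide short and begins with the input's trailing newline, which suggests that `seek(0, 0)` is doing what was expected and that the read loop is where the trouble lies.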

2 Answers:

Answer 0 (score: 0)

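When you put i = fi.read(1), i equals the first character in the file, but at the start of the while loop you assign the second character to i with the same statement, without ever doing anything with the first character. If you want to iterate over every character in the file without this problem, it's best to use a for loop. Iterating character by character in reverse is a bit more challenging, but this works: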

def nucleo_complement(ifilename, ofilename):
    """Reads a file one character at a time and returns the reverse
    complement of each nucleotide."""
    complements = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    ifile = open(ifilename)
    ofile = open(ofilename, 'w')
    for pos in range(ifile.seek(0, 2) + 1, 0, -1):
        ifile.seek(pos - 1)
        char = ifile.read(1)
        ofile.write(complements.get(char.upper(), char))
    ifile.close()
    ofile.close()

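seek returns the new file position, and seek(0, 2) moves to the end of the file. Every time you call read(1), the position in the file advances by one character, so I start pos one past the value seek returns and stop my loop at 1 rather than 0. On each iteration I use ifile.seek(pos - 1) to step back one character and then read the (original) character at that position. The first pass therefore reads at the end of the file, gets an empty string, and writes nothing. As a beginner, you may find this example a bit much, so feel free to ask if you have any questions. Really, all you need to think about are the first two statements in the for loop and the fact that I open both files at the same time.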
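To watch seek and read(1) interact in isolation, here is a small sketch. It is not the code from this answer: `read_backwards` is a hypothetical helper, and it opens the file in binary mode (an assumption made here because byte offsets are well defined there), whereas the answer uses text mode.

```python
import os
import tempfile

def read_backwards(path):
    """Return the bytes of a file in reverse order, one seek/read pair per byte."""
    out = bytearray()
    with open(path, "rb") as f:
        size = f.seek(0, 2)       # whence=2 means "relative to the end"; returns the new offset
        for pos in range(size - 1, -1, -1):
            f.seek(pos)           # absolute offset of the byte to fetch
            out += f.read(1)      # read(1) advances the position to pos + 1
    return bytes(out)

# Demo on a throwaway file with made-up contents
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"ACGT\n")
print(read_backwards(tmp.name))   # prints b'\nTGCA'
os.remove(tmp.name)
```

Starting at size - 1 and seeking straight to pos visits the same offsets as the loop in the answer, minus its initial empty read at the end of the file.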

Answer 1 (score: 0)

Here is the working code, thanks to Issac. It solves both of the problems I was having.

def func_rev_seq(in_path,out_path):
    """Read file one character at a time and return the reverse complement of each nucleotide to a new file"""

    #  Write a blank file (out_path)
    fb = open(out_path,"w")
    fb.write("")
    fb.close()
    #  Dictionary where the key is the nucleotide and the value is its reverse complement
    base = {"A":"T", "C":"G", "G":"C", "T":"A", "a":"t", "c":"g", "g":"c", "t":"a", "k":"m", "m":"k", "y":"r", "r":"y", "b":"v", "v":"b", "d":"h", "h":"d", "K":"M", "M":"K", "Y":"R", "R":"Y", "B":"V", "V":"B", "D":"H", "H":"D", "U":"A", "u":"a"}
    fi = open(in_path)
    fo = open(out_path, 'w')

    #  Walk backwards through the file. Starting at size - 1 skips the final
    #  character, which is assumed here to be the input's trailing newline.
    for pos in range(fi.seek(0, 2) - 1,  0, -1):
        fi.seek(pos - 1)
        b = fi.read(1)
        #  Characters that are not in the dictionary are written unchanged
        fo.write(base.get(b, b))
    fi.close()
    fo.close()
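For sequences that fit comfortably in memory, the same result can be had without seek at all. This is a different technique from the one used in the answers, offered only as a sketch: `reverse_complement` and `reverse_complement_file` are hypothetical names, the table is copied from the dictionary above, and the input is assumed to be a single sequence with at most one trailing newline.

```python
# Complement table copied from the dictionary used in the answers above
BASE = {"A":"T", "C":"G", "G":"C", "T":"A", "a":"t", "c":"g", "g":"c", "t":"a",
        "k":"m", "m":"k", "y":"r", "r":"y", "b":"v", "v":"b", "d":"h", "h":"d",
        "K":"M", "M":"K", "Y":"R", "R":"Y", "B":"V", "V":"B", "D":"H", "H":"D",
        "U":"A", "u":"a"}
TABLE = str.maketrans(BASE)  # str.maketrans accepts a dict of single characters

def reverse_complement(seq):
    """Complement every nucleotide, then reverse the string with a slice."""
    return seq.translate(TABLE)[::-1]

def reverse_complement_file(in_path, out_path):
    """Read the whole input, drop a trailing newline, write the reverse complement."""
    with open(in_path) as fi, open(out_path, "w") as fo:
        fo.write(reverse_complement(fi.read().rstrip("\n")))

print(reverse_complement("AACGT"))  # prints ACGTT
```

Characters missing from the table pass through translate unchanged, which matches the base.get(b, b) fallback above. The with statement closes both files even if an error occurs partway through.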