How do I run code on multiple fastq files?

Time: 2015-10-14 15:41:45

Tags: python biopython

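I would like to run the code below on multiple fastq files in a folder. The folder contains several fastq files. First I need to read one file, perform the required operation, and store the result in a separate fastq file. Then I need to read the second file, perform the same operation, and save the result in a new, second fastq file. The same steps should be repeated for every file in the folder.

How can I do this? Can anyone suggest an approach?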

from Bio.SeqIO.QualityIO import FastqGeneralIterator

fout = open("prova_FiltraN_CE_filt.fastq", "w")
maxN = 0  # keep only reads with exactly this many N bases
countall = 0  # reads seen
countincl = 0  # reads written
with open("prova_FiltraN_CE.fastq", "r") as handle:
    for (title, sequence, quality) in FastqGeneralIterator(handle):
        countN = sequence.count("N")
        countall += 1
        if countN == maxN:
            fout.write("@%s\n%s\n+\n%s\n" % (title, sequence, quality))
            countincl += 1
fout.close()
print(countall, countincl)

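1 answer:

Answer 0 (score: 2)

I think the following will do what you want. I turned your code into a function (and modified it into what I believe is a more correct version), then call that function for each .fastq file found in the specified folder. The output file name is generated from the name of the input file that was found.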

from Bio.SeqIO.QualityIO import FastqGeneralIterator
import glob
import os

def process(in_filepath, out_filepath):
    """Copy reads with exactly maxN "N" bases from in_filepath to out_filepath."""
    maxN = 0
    countall = 0  # reads seen
    countincl = 0  # reads written
    # Both files are closed automatically when the with block exits.
    with open(in_filepath, "r") as fin, open(out_filepath, "w") as fout:
        for (title, sequence, quality) in FastqGeneralIterator(fin):
            countN = sequence.count("N")
            countall += 1
            if countN == maxN:
                fout.write("@%s\n%s\n+\n%s\n" % (title, sequence, quality))
                countincl += 1
    print(os.path.basename(in_filepath), countall, countincl)

folder = "/path/to/folder"  # folder to process
for in_filepath in glob.glob(os.path.join(folder, "*.fastq")):
    root, ext = os.path.splitext(in_filepath)
    if not root.endswith("_filt"):  # avoid processing existing output files
        out_filepath = root + "_filt" + ext
        process(in_filepath, out_filepath)
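
The file name handling in the loop above can also be pulled into small helpers, which makes it possible to check which files would be processed before anything is written. The following is only a minimal sketch and not part of the original answer: `output_path_for` and `plan_jobs` are hypothetical names introduced for illustration, and the code uses the standard library only, so no Biopython call is involved.

```python
import glob
import os


def output_path_for(in_filepath, suffix="_filt"):
    """Return the output path for an input file.

    Returns None when the input already carries the suffix, i.e. when it
    looks like the output of an earlier run.
    """
    root, ext = os.path.splitext(in_filepath)
    if root.endswith(suffix):
        return None
    return root + suffix + ext


def plan_jobs(folder, pattern="*.fastq"):
    """List (input, output) path pairs for every matching file in folder."""
    jobs = []
    # sorted() only makes the order predictable; glob itself gives no ordering guarantee.
    for in_filepath in sorted(glob.glob(os.path.join(folder, pattern))):
        out_filepath = output_path_for(in_filepath)
        if out_filepath is not None:
            jobs.append((in_filepath, out_filepath))
    return jobs
```

Each pair returned by `plan_jobs(folder)` could then be passed to `process(in_filepath, out_filepath)`. Printing the list first acts as a dry run.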