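I want to run the following code on multiple FASTQ files in a folder. The folder contains several FASTQ files. I need to read the first file, perform the required operations, and store the result in a separate .fastq file. Then I need to read the second file, perform the same operations, and save the result in a second new .fastq file, and so on for every file in the folder.
How should I do this? Can anyone suggest an approach?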
```python
from Bio.SeqIO.QualityIO import FastqGeneralIterator

maxN = 0        # maximum number of "N" bases allowed in a kept read
countall = 0    # reads seen
countincl = 0   # reads written to the output file
# "with" closes both files automatically (the original fin.close / fout.close
# lacked parentheses, so the files were never actually closed)
with open("prova_FiltraN_CE.fastq") as fin, \
        open("prova_FiltraN_CE_filt.fastq", "w") as fout:
    for title, sequence, quality in FastqGeneralIterator(fin):
        countall += 1
        if sequence.count("N") <= maxN:  # same as == when maxN is 0
            fout.write("@%s\n%s\n+\n%s\n" % (title, sequence, quality))
            countincl += 1
print(countall, countincl)
```
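The per-read test in that loop can be pulled out into a small function, which makes the rule easy to check on toy data before running it over real files. This is a minimal sketch. `keep_read`, `filter_records` and `max_n` are hypothetical names, not part of Biopython, and the `<=` comparison assumes that `maxN` is meant as an upper bound on the number of N bases.

```python
def keep_read(sequence, max_n=0):
    """Return True if `sequence` contains at most `max_n` 'N' bases.

    Like the loop above, this counts only uppercase 'N'; a lowercase 'n'
    is not counted.
    """
    return sequence.count("N") <= max_n


def filter_records(records, max_n=0):
    """Yield the (title, sequence, quality) tuples that pass keep_read.

    `records` can be any iterable of 3-tuples, for example the output of
    Bio.SeqIO.QualityIO.FastqGeneralIterator.
    """
    for title, sequence, quality in records:
        if keep_read(sequence, max_n):
            yield title, sequence, quality


reads = [
    ("r1", "ACGT", "IIII"),
    ("r2", "ACNT", "IIII"),
    ("r3", "NNGT", "IIII"),
]
print([t for t, _, _ in filter_records(reads)])           # ['r1']
print([t for t, _, _ in filter_records(reads, max_n=1)])  # ['r1', 'r2']
```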
Answer (score: 2)
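I think the following does what you want. I turned your code into a function (and changed it into what I think is a more correct version), then called that function for each .fastq file found in the specified folder. Each output file name is derived from its input file name by inserting `_filt` before the extension. Files whose names already end in `_filt` are skipped so that existing output files are not reprocessed.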
```python
from Bio.SeqIO.QualityIO import FastqGeneralIterator
import glob
import os

def process(in_filepath, out_filepath, maxN=0):
    """Copy reads with at most maxN 'N' bases from in_filepath to out_filepath."""
    countall = 0
    countincl = 0
    with open(in_filepath) as fin, open(out_filepath, "w") as fout:
        for title, sequence, quality in FastqGeneralIterator(fin):
            countall += 1
            if sequence.count("N") <= maxN:
                fout.write("@%s\n%s\n+\n%s\n" % (title, sequence, quality))
                countincl += 1
    # report per file: name, reads seen, reads kept
    print(os.path.basename(in_filepath), countall, countincl)

folder = "/path/to/folder"  # folder to process
for in_filepath in glob.glob(os.path.join(folder, "*.fastq")):
    root, ext = os.path.splitext(in_filepath)
    if not root.endswith("_filt"):  # avoid processing existing output files
        out_filepath = root + "_filt" + ext
        process(in_filepath, out_filepath)
```
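If you would rather not mix input and output files in the same folder, the same loop can write into a separate output directory. Reruns then never see the `_filt` files, so the name check is unnecessary. The sketch below also replaces Biopython's `FastqGeneralIterator` with a small hand-written parser so that it runs with the standard library alone. That parser is an assumption-laden stand-in: it only handles the plain four-line FASTQ layout, without wrapped sequence or quality lines. `read_fastq`, `filter_folder`, `max_n` and the output-directory layout are hypothetical names chosen for illustration.

```python
from pathlib import Path


def read_fastq(handle):
    """Yield (title, sequence, quality) tuples from a FASTQ text handle.

    Assumes the simple 4-line record layout; unlike FastqGeneralIterator,
    this sketch does NOT handle sequences or qualities wrapped over lines.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between or after records
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("unexpected FASTQ layout near %r" % header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in %r" % header)
        yield header[1:].rstrip("\n"), sequence, quality


def filter_folder(in_dir, out_dir, max_n=0):
    """Filter every *.fastq in in_dir into out_dir.

    Returns {input file name: (reads seen, reads kept)}.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stats = {}
    # sorted() lists the inputs up front, in a stable order
    for in_path in sorted(in_dir.glob("*.fastq")):
        out_path = out_dir / (in_path.stem + "_filt" + in_path.suffix)
        total = kept = 0
        with in_path.open() as fin, out_path.open("w") as fout:
            for title, sequence, quality in read_fastq(fin):
                total += 1
                if sequence.count("N") <= max_n:
                    fout.write("@%s\n%s\n+\n%s\n" % (title, sequence, quality))
                    kept += 1
        stats[in_path.name] = (total, kept)
        print(in_path.name, total, kept)
    return stats


# Example: filter_folder("/path/to/folder", "/path/to/folder/filtered")
```

For real data, using `FastqGeneralIterator(fin)` in place of `read_fastq(fin)` is the safer choice, since only the folder handling differs from the code above.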