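I have 30 fastq files in a folder, and I want to know which file contains a specific adaptor (so I can work out which sample it actually is).
I wrote a small Biopython script, but it only lets me look at one file at a time; I would like to count the occurrences in every file at once. Can anyone help me improve the script?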
from Bio import SeqIO
# Keep only the reads whose sequence starts with the adaptor
adaptor = (rec for rec in
           SeqIO.parse("file.fastq", "fastq")
           if rec.seq.startswith("TGA"))
count = SeqIO.write(adaptor, "adaptor.fastq", "fastq")
print("Saved %i adaptor" % count)
Answer 0 (score: 1)
from Bio import SeqIO
fnames = ["file.fastq", "file1.fastq", "file2.fastq"]
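for fname in fnames:
    # Keep only the reads whose sequence starts with the adaptor
    adaptor = (rec for rec in
               SeqIO.parse(fname, "fastq")
               if rec.seq.startswith("TGA"))
    # Write one output file per input so earlier results are not overwritten
    count = SeqIO.write(adaptor, "adaptor_" + fname, "fastq")
    print("Saved %i adaptor in file %s" % (count, fname))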
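The answer lists the file names by hand, which gets tedious with 30 files. If only the per-file counts are needed, a minimal sketch along the following lines might do. Note that it swaps Biopython for plain standard-library file reading, and it assumes every FASTQ record is exactly four lines long (no wrapped sequence lines). The function names `count_adaptor_reads` and `count_in_folder`, and the `*.fastq` pattern, are hypothetical choices for illustration, not part of the original script.

```python
from pathlib import Path


def count_adaptor_reads(fastq_path, adaptor="TGA"):
    """Count reads in one FASTQ file whose sequence starts with `adaptor`.

    Assumption: each record is exactly 4 lines (header, sequence, '+', quality).
    """
    count = 0
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            # With 4-line records, the sequence is the 2nd line of each record
            if line_number % 4 == 1 and line.startswith(adaptor):
                count += 1
    return count


def count_in_folder(folder, adaptor="TGA", pattern="*.fastq"):
    """Return {file name: count} for every file in `folder` matching `pattern`."""
    return {
        path.name: count_adaptor_reads(path, adaptor)
        for path in sorted(Path(folder).glob(pattern))
    }
```

Calling `count_in_folder(".")` in the folder that holds the fastq files returns a dictionary mapping each file name to its count, so the file with a non-zero (or the highest) count is the likely sample. If the files may contain wrapped sequence lines, parsing with `SeqIO.parse` as in the answer is the safer route.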