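I'm new to Python. I have a Python script that processes one specific file (input1.txt) and produces one output (output1.fasta), but I'd like to run it on several files, e.g. input2.txt, input3.txt, ..., and produce the corresponding outputs: output2.fasta, output3.fasta, ...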
I tried adding the glob function, but I don't know how to handle the output file names.
from Bio import SeqIO

fasta_file = "sequences.txt"
wanted_file = "input1.txt"
result_file = "output1.fasta"

wanted = set()
with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)

fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
with open(result_file, "w") as f:
    for seq in fasta_sequences:
        if seq.id in wanted:
            SeqIO.write([seq], f, "fasta")
The error message is: NameError: name 'result_file' is not defined
Answer 0: (score: 3)
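Your glob is picking up your "sequences" file along with the inputs, because *.txt also matches sequences.txt. If the "fasta" file is always the same and you only want to iterate over the input files, then you need: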
for filename in glob.glob('input*.txt'):
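To see why the narrower pattern matters, here is a minimal sketch using fnmatch, which applies the same shell-style matching rules that glob relies on. The list of names is only an assumption based on the files mentioned in the question; no real directory is read.

```python
import fnmatch

# File names taken from the question (assumed to sit in one directory).
names = ["sequences.txt", "input1.txt", "input2.txt", "input3.txt"]

# "*.txt" matches every .txt file, including the sequences file.
all_txt = fnmatch.filter(names, "*.txt")

# "input*.txt" matches only the files whose names start with "input".
inputs_only = fnmatch.filter(names, "input*.txt")

print(all_txt)
print(inputs_only)
```

With the broader pattern, sequences.txt would be treated as a list of wanted IDs, which is not what you want.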
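Also, to loop over the whole process, you probably want to put it in a function. If the output file name is always derived from the input file name, you can build it dynamically.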
import glob

from Bio import SeqIO


def create_fasta_outputs(fasta_file, wanted_file):
    # Derive the output name from the input name: input1.txt -> output1.fasta
    result_file = wanted_file.replace("input","output").replace(".txt",".fasta")

    # Collect the non-empty IDs listed in the input file
    wanted = set()
    with open(wanted_file) as f:
        for line in f:
            line = line.strip()
            if line != "":
                wanted.add(line)

    # SeqIO.parse accepts a file name, so no file handle is left open
    fasta_sequences = SeqIO.parse(fasta_file, 'fasta')
    with open(result_file, "w") as f:
        for seq in fasta_sequences:
            if seq.id in wanted:
                SeqIO.write([seq], f, "fasta")


fasta_file = "sequences.txt"
for wanted_file in glob.glob('input*.txt'):
    create_fasta_outputs(fasta_file, wanted_file)
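One caveat with the chained replace: it rewrites every occurrence of "input" and ".txt" anywhere in the path, including directory names. If that could matter for your files, a stricter variant might look like the sketch below. The helper name output_name_for is hypothetical and not part of the code above; it only touches the file name itself.

```python
from pathlib import Path


def output_name_for(wanted_file):
    """Map e.g. 'input2.txt' to 'output2.fasta' (hypothetical helper)."""
    path = Path(wanted_file)
    stem = path.stem  # file name without its extension, e.g. "input2"
    # Swap only a leading "input" in the file name; directories stay untouched.
    if stem.startswith("input"):
        stem = "output" + stem[len("input"):]
    return str(path.with_name(stem + ".fasta"))


print(output_name_for("input2.txt"))
```

Inside create_fasta_outputs you could then write result_file = output_name_for(wanted_file) in place of the replace chain.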