Question

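I am trying to run BLASTN searches for multiple sequences from a single FASTA file. I can easily query a single sequence from the file, but I am struggling to query all of the sequences in the file. Since these are relatively short reads, I would rather not split the file into individual sequences and query each one separately.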

Here is what I have tried so far:

from Bio import SeqIO
from Bio.Blast import NCBIWWW

f_iterator = SeqIO.parse("file.fasta", "fasta")
f_record = next(f_iterator)  # first record only; next() also works on Python 3
result_handle = NCBIWWW.qblast("blastn", "nt", f_record)
save_result = open("blast_result.xml", "w")
save_result.write(result_handle.read())
save_result.close()
result_handle.close()

Does anyone have any ideas?

Answer 1

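If your file is already in FASTA format, you can simply open it and read its contents. This is taken straight from the Biopython Tutorial and Cookbook: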

http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc92

fasta_string = open("m_cold.fasta").read()

I run a simple script like this all the time:

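from Bio.Blast import NCBIWWW

# Read the whole multi-sequence FASTA file as one string
fasta_string = open("file.fasta").read()

# qblast lives in the NCBIWWW module, so call it as NCBIWWW.qblast
result_handle = NCBIWWW.qblast("blastn", "nt", fasta_string)

# Save the XML results returned by NCBI
save_file = open("out.xml", "w")
save_file.write(result_handle.read())
save_file.close()
result_handle.close()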

If this does not work, check that your FASTA file is correctly formatted. There is a format converter here:

https://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html
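Before sending the file to NCBI, a quick local sanity check can catch the most common formatting problems. The following is only a minimal sketch using the standard library; `count_fasta_records` is a hypothetical helper name, and it assumes a "valid" file simply means every record has a `>` header line followed by at least one sequence line.

```python
def count_fasta_records(text):
    """Count records in FASTA text; raise ValueError on obvious format problems.

    Hypothetical helper for illustration, not part of Biopython.
    """
    count = 0
    has_sequence = True  # did the current record get at least one sequence line?
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are tolerated
        if line.startswith(">"):
            if count and not has_sequence:
                raise ValueError(f"line {lineno}: previous record has no sequence")
            count += 1
            has_sequence = False
        else:
            if count == 0:
                raise ValueError(f"line {lineno}: sequence data before first '>' header")
            has_sequence = True
    if count and not has_sequence:
        raise ValueError("last record has no sequence")
    return count
```

Calling `count_fasta_records(open("file.fasta").read())` should return the number of sequences you expect BLAST to receive; a `ValueError` points at the first line that looks wrong.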

Answer 2

Can't you simply pass the entire contents of the multi-sequence FASTA file (read directly from the file) instead of a single record?

    from Bio.Blast import NCBIWWW

    with open("file.fasta", "r") as fasta_file:
        # The with block closes the file automatically
        sequences = fasta_file.read()

    result_handle = NCBIWWW.qblast("blastn", "nt", sequences)
    save_result = open("blast_result.xml", "w")
    save_result.write(result_handle.read())
    save_result.close()
    result_handle.close()
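When several sequences are submitted at once, the saved XML should contain one `Iteration` element per query, so the results can be separated again afterwards. Here is a rough sketch using only the standard library; `hits_per_query` is a hypothetical helper name, and it assumes the default BLAST XML layout (`Iteration_query-def` and `Hit_def` elements). Biopython's own `Bio.Blast.NCBIXML.parse` is probably the more usual route for this.

```python
import xml.etree.ElementTree as ET


def hits_per_query(xml_text):
    """Map each query description to the list of its hit descriptions.

    Hypothetical helper; assumes the classic BLAST XML element names.
    """
    root = ET.fromstring(xml_text)
    results = {}
    # Each <Iteration> holds the results for one query sequence
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        results[query] = [hit.findtext("Hit_def") for hit in iteration.iter("Hit")]
    return results
```

For example, `hits_per_query(open("blast_result.xml").read())` would give a dictionary keyed by the FASTA header of each read; a read with no hits maps to an empty list.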

How can I upload multiple sequences to BLAST using Biopython?
