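How do I convert qblast XML output to NCBI BLAST -outfmt 17?

Time: 2016-08-10 23:53:08

Tags: format biopython blast

I started my project with NCBI standalone BLAST and used the -outfmt 17 option. That format is very useful for my purposes. However, I had to switch to Biopython, and I am now using qblast to align my sequences against the NCBI NT database. Can I save or convert the qblast XML into a format equivalent to -outfmt 17 from standalone NCBI BLAST?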

Thank you very much for your help!

Cheers, Philip

1 answer:

Answer 0: (score: 0)

I assume you mean -outfmt 7, that is, you want tabular output with one column per field.

from Bio.Blast import NCBIWWW, NCBIXML

# Run the BLASTN query; qblast returns a file-like handle to the XML results.
# "nt" is the nucleotide collection mentioned in the question.
r = NCBIWWW.qblast(
    "blastn",
    "nt",
    "ACGGGGTCTCGAAAAAAGGAGAATGGGATGAGAAGGATATATGGGTAGTGTCATTTTTTAACTTGCAGAT" +
    "TTCATCCTAGTCTTCCAGTTATCGTTTCCTAGCACTCCATGTTCCCAAGATAGTGTCACCACCCCAAGGA" +
    "CTCTCTCTCATTTTCTTTGCCTGGGCCCTCTTTCTACTGAGGAGTCGTGGCCTTCCATCAGTAGAAGCCG",
    expect=1E-5)

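# Parse the XML and print one tab-separated row per HSP

for record in NCBIXML.parse(r):
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            # Ten fields, each followed by a tab
            cols = "{}\t" * 10
            # First field: fraction of positives. Under Python 2 this is
            # integer division, so it prints 0 or 1 rather than a percentage.
            print(cols.format(hsp.positives / hsp.align_length,
                              hsp.align_length,                  # alignment length
                              hsp.align_length - hsp.positives,  # non-positive columns (mismatches plus gaps)
                              hsp.gaps,                          # gaps
                              hsp.query_start,                   # query start
                              hsp.query_end,                     # query end
                              hsp.sbjct_start,                   # subject start
                              hsp.sbjct_end,                     # subject end
                              hsp.expect,                        # E-value
                              hsp.score))                        # raw score (hsp.bits holds the bit score)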

The output looks something like this:

1   210 0   0   1   210 89250   89459   8.73028e-102    420.0   
0   206 19  2   5   210 46259   46462   5.16461e-73 314.0   
1   210 0   0   1   210 68822   69031   8.73028e-102    420.0   
0   206 19  2   5   210 25825   26028   5.16461e-73 314.0   
1   210 0   0   1   210 65887   66096   8.73028e-102    420.0   
...
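
If you want rows closer to the twelve default tabular columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), something along these lines might work. It is only a sketch: `hsp_to_row` and `FIELDS` are names made up for this example, the number formatting is my own choice rather than BLAST's, and `hsp.gaps` is used as a stand-in for the gap-open count, which it is not exactly (it counts gap characters). It relies only on HSP attributes that Biopython's parsed records provide (`identities`, `align_length`, `gaps`, `bits` and so on).

```python
# Default column specifiers of BLAST+ tabular output (-outfmt 6 / 7).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def hsp_to_row(query_id, subject_id, hsp):
    """Format one HSP as a tab-separated row (hypothetical helper).

    `hsp` is any object with the attributes of a Biopython HSP record.
    """
    # Assumption: treat a missing or non-integer gap count as zero.
    gaps = hsp.gaps if isinstance(hsp.gaps, int) else 0
    # Percent identity uses identities, not positives.
    pident = 100.0 * hsp.identities / hsp.align_length
    # Mismatches are the aligned columns that are neither identical nor gaps.
    mismatches = hsp.align_length - hsp.identities - gaps
    values = (
        query_id,
        subject_id,
        "{:.3f}".format(pident),
        hsp.align_length,
        mismatches,
        gaps,  # approximation: gap characters, not gap openings
        hsp.query_start,
        hsp.query_end,
        hsp.sbjct_start,
        hsp.sbjct_end,
        "{:.3g}".format(hsp.expect),
        "{:.1f}".format(hsp.bits),  # bit score rather than raw score
    )
    return "\t".join(str(v) for v in values)
```

Inside the parsing loop you would then call `print(hsp_to_row(record.query, alignment.hit_id, hsp))`, and `"\t".join(FIELDS)` gives a header line if you need one.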