Question

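Edit 2/18: I found the problem. It was not directly the code, although it was pointed out that I did not present this sample the way I should have. My apologies! The problem was the blastx results: they did not meet the threshold I had set, and the code creates the output file before it finds out that there are no satisfactory results to write to it, so the file is left empty. Thank you for your consideration.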
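One way to avoid the empty files is to collect the trimmed sequences first and create the output file only when at least one result passed the threshold. The sketch below is a minimal illustration, not the original script: `write_if_any` is a hypothetical helper name, and the example strings stand in for the lines accumulated in `parsed_data`.

```python
import os
import tempfile


def write_if_any(path, lines):
    """Write lines to path only when there is something to write.

    Returns True if the file was created, False otherwise.
    Hypothetical helper, not part of Biopython.
    """
    # Drop empty or whitespace-only entries before deciding.
    lines = [line for line in lines if line.strip()]
    if not lines:
        return False
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return True


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    hit_path = os.path.join(out_dir, "with_hit.trim.fasta")
    empty_path = os.path.join(out_dir, "no_hit.trim.fasta")
    print(write_if_any(hit_path, ["query1 ATGGCC"]))
    print(write_if_any(empty_path, []))
    print(os.path.exists(empty_path))
```

With this approach, a missing output file directly signals that no result met the cutoff, instead of an empty file that looks like a skipped sequence.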

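I have been using Biopython to run some blastx searches locally, trim each DNA query to the ORF it finds, and then save the new sequence to a FASTA file. In a batch of 450 sequences, it seems to skip 43 of them, and it is always the same 43. The skipped sequences are not just at the beginning or end of the list, they are not particularly short, and all sequence inputs are in FASTA format. I have checked the BLAST XML output files for several of the sequences that end up with no output sequence written to file, and matches are listed and displayed there. The part of the code that should find and write the trimmed sequence based on the blastx XML output is shown below.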

import os

from Bio import SeqIO
from Bio.Blast import NCBIXML
from Bio.SeqFeature import SeqFeature, FeatureLocation

id_list = ".../ORFS.txt"
# Read one ORF accession per line, ignoring blank lines.
with open(id_list) as text_file:
    list2 = [line.strip() for line in text_file if line.strip()]

# Map each accession (version suffix removed) to its full ORF accession.
access_to_orf = {}
for orf_accession in list2:
    access_split = orf_accession.split(".")[0]
    access_to_orf[access_split] = orf_accession

# The rest runs once per query; out_path, out_name, accession, trim_out,
# seq_path and ORF are defined elsewhere in the script (not shown).
parse_in = os.path.join(out_path, out_name)
parse_name = accession + ".trim.fasta"
parse_out = os.path.join(trim_out, parse_name)
parsed_data = ""
with open(parse_in) as result_handle:
    blast_records = NCBIXML.parse(result_handle)
    blast_record = next(blast_records)
q_record = SeqIO.read(seq_path, "fasta")
q_parse = q_record.seq
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        # Keep HSPs whose positives cover at least 95% of the subject length.
        if (hsp.positives / alignment.length) * 100 >= 95:
            # For blastx, the first element of hsp.frame is the query frame.
            x, y = hsp.frame
            dna_str = -1 if x < 0 else 1
            feature = SeqFeature(
                FeatureLocation(hsp.query_start - 1, hsp.query_end, strand=dna_str)
            )
            q_feature = feature.extract(q_parse)
            parsed_data += blast_record.query + " " + str(q_feature) + "\n"
# The output file is created even when parsed_data is empty.
with open(parse_out, "w") as out_file:
    out_file.write(parsed_data)
print("Finished BLAST parse for " + ORF)
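
To see why a particular query produces no output, it can help to report the best percentage each query reaches against the 95% cutoff. The following is a rough sketch under stated assumptions: `best_positive_percent` and `report_failures` are hypothetical helpers, and the records are mocked with `SimpleNamespace` objects that expose only the attributes used above (`query`, `alignments`, `length`, `hsps`, `positives`), so the example runs without any BLAST output. Note that `alignment.length` is the length of the full subject sequence, so an HSP that covers only part of a long hit scores low even when most aligned positions are positives.

```python
from types import SimpleNamespace


def best_positive_percent(record):
    """Return the highest positives / subject-length percentage, or None if no HSPs."""
    best = None
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            percent = hsp.positives / alignment.length * 100
            if best is None or percent > best:
                best = percent
    return best


def report_failures(records, cutoff=95.0):
    """Return (query, best_percent) for every record with no HSP at or above cutoff."""
    failures = []
    for record in records:
        best = best_positive_percent(record)
        if best is None or best < cutoff:
            failures.append((record.query, best))
    return failures


def mock_record(query, hits):
    """Build a stand-in record from (subject_length, positives) pairs (illustrative only)."""
    alignments = [
        SimpleNamespace(length=length, hsps=[SimpleNamespace(positives=positives)])
        for length, positives in hits
    ]
    return SimpleNamespace(query=query, alignments=alignments)


if __name__ == "__main__":
    records = [
        mock_record("seq_ok", [(100, 98)]),
        mock_record("seq_partial", [(200, 50)]),
        mock_record("seq_none", []),
    ]
    for query, best in report_failures(records):
        print(query, best)
```

Running a report like this over all 450 result files would list the queries that never reach the cutoff, which can then be compared with the 43 sequences that end up with empty files.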

Can anyone help me figure out why it is skipping sequences?

Edit 2/12: Added the part of the sample showing where accession and ORF are defined.

Python script skips writing trimmed DNA sequences to file

0 answers: