Question

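I would like to find open reading frames (ORFs) in a set of large sequences. For this I use an ORF_finder function built on BioPython. It works well: I can print the nucleotide sequences that contain an ORF larger than a certain size, and I can also print the protein sequences.

The script looks like this: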

from Bio import SeqIO

def ORF_Finder(fasta_file, out_path, min_pro_len=100, min_nuc_len=4000, table=11):
    # Translate all six frames of each record and write every stop-to-stop
    # stretch of at least min_pro_len residues to out_path in FASTA format.
    with open(out_path, "w") as outfile:
        for record in SeqIO.parse(fasta_file, "fasta"):
            print(record)
            if len(record) < min_nuc_len:  # skip records that are too short
                continue
            for strand, nuc in [(+1, record.seq), (-1, record.seq.reverse_complement())]:
                for frame in range(3):
                    length = 3 * ((len(record) - frame) // 3)  # Multiple of three
                    # The translation loop must sit inside the frame loop,
                    # otherwise only the last frame is ever translated.
                    for pro in nuc[frame:frame + length].translate(table).split("*"):
                        if len(pro) >= min_pro_len:
                            outfile.write(">" + str(record.id) + "\n" + str(pro) + "\n")
                            print("%s...%s - length %i, strand %i, frame %i"
                                  % (pro[:30], pro[-3:], len(pro), strand, frame))

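If I print record.seq I get the whole sequence, but what I want is the nucleotide sequence that encodes each specific protein.

How can I get these sequences?

Kind regards,

BAS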

To clarify, I use nucleotide sequences as input, for example:

TAATAATAGTAGTAATAGATGATGATGATGATGCGACGACGA

Then I run the ORF finder script, which gives me the following amino acid sequence:

 MMMMMRRR

However, I am not interested in the amino acid sequence itself, but in the nucleotide sequence that encodes it, for example:

ATGATGATGATGATGCGACGACGA

and I do not know how to solve this.
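To make the goal concrete, here is a minimal sketch of one way the coding nucleotides could be kept: instead of translating first and splitting the protein on "*", walk each frame codon by codon and cut the nucleotide string at stop codons. The names orf_nucleotides, reverse_complement and STOP_CODONS are hypothetical helpers introduced only for this illustration, and the sketch works on plain strings rather than Biopython Seq objects. Like the script above, it reports stop-to-stop stretches, so a hit does not necessarily begin with ATG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons in NCBI translation table 11


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq))


def orf_nucleotides(seq, min_pro_len=1):
    """Yield (strand, frame, start, end, nt_seq) for each stop-to-stop stretch.

    start and end are 0-based offsets into the strand being scanned, so for
    strand -1 they refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    for strand, nuc in [(+1, seq), (-1, reverse_complement(seq))]:
        for frame in range(3):
            last = frame + 3 * ((len(nuc) - frame) // 3)  # end of last full codon
            orf_start = frame
            for pos in range(frame, last, 3):
                if nuc[pos:pos + 3] in STOP_CODONS:
                    if (pos - orf_start) // 3 >= min_pro_len:
                        yield strand, frame, orf_start, pos, nuc[orf_start:pos]
                    orf_start = pos + 3  # next stretch begins after the stop
            # Trailing stretch that runs to the end of the frame without a stop
            if (last - orf_start) // 3 >= min_pro_len:
                yield strand, frame, orf_start, last, nuc[orf_start:last]


example = "TAATAATAGTAGTAATAGATGATGATGATGATGCGACGACGA"
for strand, frame, start, end, nt in orf_nucleotides(example, min_pro_len=8):
    print(strand, frame, start, end, nt)
```

For the example input, one of the reported stretches is strand +1, frame 0, offsets 18 to 42, which is the ATGATGATGATGATGCGACGACGA sequence shown above. Other frames can also produce stretches of eight or more codons, so the output is not limited to that single hit. If the protein is needed as well, each nucleotide stretch could presumably be wrapped in a Biopython Seq and translated with table 11.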

Finding open reading frames in Python

0 answers: