How do I fix this warning: "BiopythonWarning: Partial codon, len(sequence) not a multiple of three"?

Asked: 2018-12-22 09:41:55

Tags: python biopython

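For an assignment, I need to write code that translates the RNA sequences in a FASTA file into amino acid sequences. However, I keep getting the following warning: "BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future."

I tried adding trailing N's, but it still doesn't seem to work. I think there may be a mistake in my code, but I'm not sure where.

Here is my code: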

from Bio.Seq import Seq
from Bio import SeqIO
seq_records = SeqIO.parse('rna.fasta', 'fasta')
amino_acids1 = []
amino_acids2 = []
amino_acids3 = []

for record in seq_records:

    # starting from nucleotide 1
    if len(record) % 3 == 0:
        amino_acids1.append(record.translate())
    elif (len(record) + 1) % 3 == 0:
        recordN = record + Seq('N')
        amino_acids1.append(recordN.translate())
    elif (len(record) + 2) % 3 == 0:
        recordNN = record + Seq('N') + Seq('N')
        amino_acids1.append(recordNN.translate())
    print("FIRST")
    print(amino_acids1)
    with open('rna_out.fasta', 'w') as p_file:
        SeqIO.write(amino_acids1, p_file, 'fasta')


    # starting from nucleotide 2
    record2 = record[1:]
    if len(record2) % 3 == 0:
        amino_acids2.append(record2.translate())
    elif (len(record2) + 1) % 3 == 0:
        record2N = record + Seq('N')
        amino_acids2.append(record2N.translate())
    elif (len(record2) + 2) % 3 == 0:
        record2NN = record + Seq('N') + Seq('N')
        amino_acids2.append(record2NN.translate())
    print("SECOND")
    print(amino_acids2)
    with open('rna_out.fasta', 'w') as p_file:
        SeqIO.write(amino_acids2, p_file, 'fasta')


    # starting from nucleotide 3
    record3 = record[2:]
    if len(record3) % 3 == 0:
        amino_acids3.append(record3.translate())
    elif (len(record3) + 1) % 3 == 0:
        record3N = record + Seq('N')
        amino_acids3.append(record3N.translate())
    elif (len(record3) + 2) % 3 == 0:
        record3NN = record + Seq('N') + Seq('N')
        amino_acids3.append(record3NN.translate())
    print("THIRD")
    print(amino_acids3)
    with open('rna_out.fasta', 'w') as p_file:
        SeqIO.write(amino_acids3, p_file, 'fasta')

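Normally, this should give 3 possible translations for each sequence in the FASTA file. However, the output doesn't seem right.

These are the first 3 lines, which should be the 3 different translations of the first sequence in the FASTA file: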

FIRST
[SeqRecord(seq=Seq('GAKRTDRT S VINKLSLLYTSCETIDCYIFFL', HasStopCodon(ExtendedIUPACProtein(), '')), id='', name='', description='', dbxrefs=[])]
SECOND
[SeqRecord(seq=Seq('GAKRTDRT S VINKLSLLYTSCETIDCYIFFL', HasStopCodon(ExtendedIUPACProtein(), '')), id='', name='', description='', dbxrefs=[])]
THIRD
[SeqRecord(seq=Seq('CQKN SDVVVGH QTVVALHVMRND LLYLFP', HasStopCodon(ExtendedIUPACProtein(), '')), id='', name='', description='', dbxrefs=[])]

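I don't know what went wrong, but this is definitely not the correct translation. If you can see where I made a mistake, I'd really appreciate your help!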

1 Answer:

Answer 0 (score: 0)

Your approach could work, but your code contains a copy-and-paste error:

record2 = record[1:]
if len(record2) % 3 == 0:
    amino_acids2.append(record2.translate())
elif (len(record2) + 1) % 3 == 0:
    record2N = record + Seq('N')

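Note that record in the last line should be record2. You make this mistake at least four times. I believe @Chris_Rands' code can take you deeper into the problem, for example by also translating the reverse complement, but I wouldn't recommend using the pad_seq() function from that code.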
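To see why the warning persists even though N's are being added, here is a quick check of the arithmetic using plain integers instead of Biopython objects. This is only a sketch: the helper name buggy_padded_length is hypothetical (it comes from neither the question nor Biopython) and simply mirrors the branches of the if/elif chain in the question, where the padding is applied to the full record rather than to the sliced frame.

```python
def buggy_padded_length(record_length, offset):
    """Length of the sequence the question's code ends up translating
    for the frame that starts `offset` nucleotides in (0, 1 or 2)."""
    frame_length = record_length - offset
    if frame_length % 3 == 0:
        # No padding needed: the sliced frame itself is translated.
        return frame_length
    elif (frame_length + 1) % 3 == 0:
        # Copy-and-paste bug: one N is added to the full record, not the slice.
        return record_length + 1
    else:
        # Same bug: two N's are added to the full record.
        return record_length + 2


# Collect every (length, offset) pair whose result is not a multiple of three.
bad_cases = [
    (length, offset)
    for length in range(3, 31)
    for offset in (1, 2)
    if buggy_padded_length(length, offset) % 3 != 0
]

for length, offset in bad_cases[:4]:
    print(length, offset, buggy_padded_length(length, offset))
```

Whenever the second or third frame needs padding, the buggy branches produce a length that is not a multiple of three, so the warning still fires. The sequence being translated also starts at nucleotide 1 instead of at the intended offset.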

Below is a rework of pad_seq() integrated into your code:

from Bio.Seq import Seq
from Bio import SeqIO

def pad_seq(sequence):
    """ Pad sequence to multiple of 3 with N """

    remainder = len(sequence) % 3

    return sequence if remainder == 0 else sequence + Seq('N' * (3 - remainder))

seq_records = SeqIO.parse('rna.fasta', 'fasta')

amino_acids1 = []
amino_acids2 = []
amino_acids3 = []

for record in seq_records:

    # starting from nucleotide 1
    amino_acids1.append(pad_seq(record).translate())
    print("FIRST")
    print(amino_acids1)
    # ...

    # starting from nucleotide 2
    record2 = record[1:]
    amino_acids2.append(pad_seq(record2).translate())
    print("SECOND")
    print(amino_acids2)
    # ...

    # starting from nucleotide 3
    record3 = record[2:]
    amino_acids3.append(pad_seq(record3).translate())
    print("THIRD")
    print(amino_acids3)
    # ...
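The padding rule in pad_seq() can also be checked on plain strings, without Biopython installed. This is a minimal sketch under that assumption: pad_to_codons and three_frames are illustrative names, the example sequence is made up, and a str stands in for the Seq/SeqRecord objects used above.

```python
def pad_to_codons(sequence):
    """Pad a nucleotide string with trailing N's to a multiple of three."""
    remainder = len(sequence) % 3
    return sequence if remainder == 0 else sequence + "N" * (3 - remainder)


def three_frames(sequence):
    """Return the three forward reading frames, each padded after slicing."""
    return [pad_to_codons(sequence[offset:]) for offset in range(3)]


# Made-up 11-nucleotide example: each frame needs a different amount of padding.
frames = three_frames("AUGGCCAUUGU")
for number, frame in enumerate(frames, start=1):
    print(number, frame, len(frame))
```

The point of the design is that slicing happens first and padding second, so the N's always land on the frame that is actually translated.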