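For an assignment, I need to write code that converts RNA sequences from a FASTA file into amino acid sequences. However, I keep getting the following warning: "BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future."
I tried adding trailing N's, but it still doesn't seem to work. I think there may be a bug in my code, but I'm not sure where.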
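The warning only means that the length of the sequence being translated is not divisible by three. As a minimal, Biopython-free sketch on plain strings (the helper name pad_to_codons is hypothetical), the number of trailing N's needed can be computed in one expression instead of an if/elif chain:

```python
def pad_to_codons(seq: str) -> str:
    """Append 'N' until len(seq) is a multiple of 3 (hypothetical helper)."""
    # In Python, -n % 3 is 0, 2 or 1 when n % 3 is 0, 1 or 2 respectively,
    # which is exactly how many bases the last partial codon is missing.
    return seq + "N" * (-len(seq) % 3)


for s in ["AUG", "AUGG", "AUGGC"]:
    print(s, "->", pad_to_codons(s))
```

Whatever form the padding takes, it has to be computed from, and appended to, the exact sequence that is then translated.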
Here is my code:
from Bio.Seq import Seq
from Bio import SeqIO

seq_records = SeqIO.parse('rna.fasta', 'fasta')
amino_acids1 = []
amino_acids2 = []
amino_acids3 = []

for record in seq_records:
    # starting from nucleotide 1
    if len(record) %3 ==0:
        amino_acids1.append(record.translate())
    elif (len(record)+1) %3 ==0:
        recordN = record + Seq('N')
        amino_acids1.append(recordN.translate())
    elif (len(record)+2) %3 ==0:
        recordNN = record + Seq('N') + Seq('N')
        amino_acids1.append(recordNN.translate())
    print("FIRST")
    print(amino_acids1)
    with open('rna_out.fasta', 'w') as p_file:
        SeqIO.write(amino_acids1, p_file, 'fasta')

    # starting from nucleotide 2
    record2 = record[1:]
    if len(record2) %3 ==0:
        amino_acids2.append(record2.translate())
    elif (len(record2)+1) %3 ==0:
        record2N = record + Seq('N')
        amino_acids2.append(record2N.translate())
    elif (len(record2)+2) %3 ==0:
        record2NN = record + Seq('N') + Seq('N')
        amino_acids2.append(record2NN.translate() )
    print("SECOND")
    print(amino_acids2)
    with open('rna_out.fasta', 'w') as p_file:
        SeqIO.write(amino_acids2, p_file, 'fasta')

    # starting from nucleotide 3
    record3 = record[2:]
    if len(record3) %3 ==0:
        amino_acids3.append(record3.translate())
    elif (len(record3)+1) %3 ==0:
        record3N = record + Seq('N')
        amino_acids3.append(record3N.translate())
    elif (len(record3)+2) %3 ==0:
        record3NN = record + Seq('N') + Seq('N')
        amino_acids3.append(record3NN.translate())
    print("THIRD")
    print(amino_acids3)
    with open('rna_out.fasta', 'w') as p_file:
        SeqIO.write(amino_acids3, p_file, 'fasta')
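Normally, this should give three possible translations (one per reading frame) for each sequence in the FASTA file. However, the output doesn't look right.
These are the first three lines, which should be three different translations of the first sequence in the FASTA file: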
FIRST
[SeqRecord(seq=Seq('GAKRTDRT S VINKLSLLYTSCETIDCYIFFL', HasStopCodon(ExtendedIUPACProtein(), '')), id='', name='', description='', dbxrefs=[])]
SECOND
[SeqRecord(seq=Seq('GAKRTDRT S VINKLSLLYTSCETIDCYIFFL', HasStopCodon(ExtendedIUPACProtein(), '')), id='', name='', description='', dbxrefs=[])]
THIRD
[SeqRecord(seq=Seq('CQKN SDVVVGH QTVVALHVMRND LLYLFP', HasStopCodon(ExtendedIUPACProtein(), '')), id='', name='', description='', dbxrefs=[])]
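To see what three reading-frame translations of one sequence should look like, here is a small Biopython-free sketch. It assumes the standard genetic code (NCBI table 1, which is also Biopython's default table) and, as a simplification, turns every codon containing N into 'X'; Biopython resolves some ambiguous codons (for example CUN to L), so its output can differ at those positions. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1) laid out in UCAG order;
# '*' marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_rna(rna: str) -> str:
    """Translate codon by codon; codons not in the table (e.g. with N) become 'X'."""
    rna = rna.upper() + "N" * (-len(rna) % 3)  # pad the partial last codon
    return "".join(CODON_TABLE.get(rna[i:i + 3], "X") for i in range(0, len(rna), 3))


def three_frames(rna: str) -> list[str]:
    """Translate the frames starting at nucleotides 1, 2 and 3."""
    return [translate_rna(rna[offset:]) for offset in range(3)]


print(three_frames("AUGGCCAUUGUAAUGG"))
```

The three frames give clearly different proteins, whereas FIRST and SECOND in the output above are identical.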
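I don't know what went wrong, but this is definitely not the correct translation. If you can spot the mistake I made, I would really appreciate your help!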
Answer:
Your approach can work, but there are copy-and-paste errors in your code:
record2 = record[1:]
if len(record2) %3 ==0:
    amino_acids2.append(record2.translate())
elif (len(record2)+1) %3 ==0:
    record2N = record + Seq('N')
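Note that record on the last line should be record2. You make this mistake at least four times (record2N, record2NN, record3N and record3NN are all built from record). Because the unsliced record is padded, those branches translate from nucleotide 1 again, which is why FIRST and SECOND are identical in your output, and the padded length is still not a multiple of three, so the warning persists. I believe @Chris_Rands' code can take you further into the problem, e.g. it also translates the reverse complement, but I would not recommend the pad_seq() function in that code.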
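One way to rule out this kind of copy-and-paste error is to derive every frame from a single offset, slicing once and then padding that same object. The following plain-string sketch (no Biopython; the function names are hypothetical) contrasts the padding pattern from the question with the fixed one, using an arbitrary 15-nucleotide example:

```python
def pad_to_codons(seq: str) -> str:
    """Append 'N' until len(seq) is a multiple of 3."""
    return seq + "N" * (-len(seq) % 3)


def buggy_frame(record: str, offset: int) -> str:
    # Mirrors the question: the padding is computed from the sliced
    # sequence but appended to the unsliced record.
    sliced = record[offset:]
    return record + "N" * (-len(sliced) % 3)


def fixed_frame(record: str, offset: int) -> str:
    # Slice once, then pad that same sliced sequence.
    return pad_to_codons(record[offset:])


rna = "AUGGCCAUUGUAAUG"  # 15 nt, arbitrary example
for offset in range(3):
    bad, good = buggy_frame(rna, offset), fixed_frame(rna, offset)
    print(f"frame {offset + 1}: buggy={bad} (len % 3 = {len(bad) % 3}), "
          f"fixed={good} (len % 3 = {len(good) % 3})")
```

The buggy variant always starts at nucleotide 1 and, for frames 2 and 3, still has a length that is not a multiple of three, matching both symptoms in the question.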
Below is a reworked pad_seq() integrated into your code:
from Bio.Seq import Seq
from Bio import SeqIO

def pad_seq(sequence):
    """ Pad sequence to multiple of 3 with N """
    remainder = len(sequence) % 3
    return sequence if remainder == 0 else sequence + Seq('N' * (3 - remainder))

seq_records = SeqIO.parse('rna.fasta', 'fasta')
amino_acids1 = []
amino_acids2 = []
amino_acids3 = []

for record in seq_records:
    # starting from nucleotide 1
    amino_acids1.append(pad_seq(record).translate())
    print("FIRST")
    print(amino_acids1)
    # ...

    # starting from nucleotide 2
    record2 = record[1:]
    amino_acids2.append(pad_seq(record2).translate())
    print("SECOND")
    print(amino_acids2)
    # ...

    # starting from nucleotide 3
    record3 = record[2:]
    amino_acids3.append(pad_seq(record3).translate())
    print("THIRD")
    print(amino_acids3)
    # ...
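The answer also mentions translating the reverse complement. Biopython's Seq objects have a reverse_complement() method, but how RNA is handled depends on the Biopython version, so check its documentation. As a hedged, Biopython-free sketch (the helper names are hypothetical), the reverse complement of an RNA string and the six sliced starting points could be produced like this; each slice would then be padded and translated as in the code above:

```python
# Watson-Crick pairing for RNA; N (unknown base) pairs with N.
RNA_COMPLEMENT = str.maketrans("ACGUNacgun", "UGCANugcan")


def rna_reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (hypothetical helper)."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def six_frame_starts(rna: str) -> list[str]:
    """Forward slices for frames 1-3, then the same three slices of the reverse complement."""
    rc = rna_reverse_complement(rna)
    return [rna[k:] for k in range(3)] + [rc[k:] for k in range(3)]


print(rna_reverse_complement("AUGGCC"))  # GGCCAU
```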