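I wrote some Python code for a project that translates DNA into amino acids, but it doesn't seem to display correctly (read: at all, at least for the amino acid part). Does anyone know where my mistake is? By the way, I'm new to Python, so if the code looks a bit unruly, that's because I lack experience with Python conventions. Thanks!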
#DNA strand - does not need to be transcribed to RNA
dna = raw_input("What is the DNA strand that you need to be transcribed?")
#Start and Stop codons
start = "ATG"
stop1 = "TAA"
stop2 = "TAG"
stop3 = "TGA"
#Number of codons
divide = len(dna)
codon_number = divide / 3
total_codons = codon_number
#Now, to split the DNA!
codon_groups = []
multiplier = 1
if(total_codons > 0):
    codon_groups.append(dna[0:3])
while (codon_number > 0):
    first = multiplier * 3
    second = first + 3
    up_next = dna[first:second]
    codon_groups.append(up_next)
    codon_number = codon_number - 1
    multiplier = multiplier + 1
print(codon_groups)
#The fun part is up next!
amino_acids = []
traverse = 0
up = 1
started = 0
stopped = 0
for codon in codon_groups:
    if(stopped == 0):
        acid = codon_groups[traverse:up]
        if (started == 0):
            if(acid == start):
                started = 1
                amino_acids.append("Start Start - ATG")
                print(amino_acids)
        else:
            if(acid == "ATT" or acid == "ATC" or acid == "ATA"):
                amino_acids.append("Isoleucine Ile I ATT ATC ATA")
            if(acid == "CTT" or acid == "CTC" or acid == "CTA" or acid == "CTG" or acid == "TTA" or acid == "TTG"):
                amino_acids.append("Leucine Leu L CTT CTC CTA CTG TTA TTG")
            if(acid == "GTT" or acid == "GTC" or acid == "GTA" or acid == "GTG"):
                amino_acids.append("Valine Val V GTT GTC GTA GTG")
            if(acid == "TTT" or acid == "TTC"):
                amino_acids.append("Phenylalanine Phe F TTT TTC")
            if(acid == "ATG"):
                amino_acids.append("Methionine Met M ATG")
            if(acid == "TGT" or acid == "TGC"):
                amino_acids.append("Cysteine Cys C TGT TGC")
            if(acid == "GCT" or acid == "GCC" or acid == "GCA" or acid == "GCG"):
                amino_acids.append("Alanine Ala A GCT GCC GCA GCG")
            if(acid == "GGT" or acid == "GGA" or acid == "GGC" or acid == "GGG"):
                amino_acids.append("Glycine Gly G GGT GGA GGC GGG")
            if(acid == "CCT" or acid == "CCA" or acid == "CCG" or acid == "CCC"):
                amino_acids.append("Proline Pro P CCT CCA CCG CCC")
            if(acid == "ACT" or acid == "ACG" or acid == "ACC" or acid == "ACA"):
                amino_acids.append("Threonine Thr T ACT ACG ACC ACA")
            if(acid == "TCT" or acid == "TCC" or acid == "TCA" or acid == "TCG" or acid == "AGT" or acid == "AGC"):
                amino_acids.append("Serine Ser S TCT TCC TCA TCG AGT AGC")
            if(acid == "TAT" or acid == "TAC"):
                amino_acids.append("Tyrosine Tyr Y TAT TAC")
            if(acid == "TGG"):
                amino_acids.append("Tryptophan Trp W TGG")
            if(acid == "CAA" or acid == "CAG"):
                amino_acids.append("Glutamine Gln Q CAA CAG")
            if(acid == "AAT" or acid == "AAC"):
                amino_acids.append("Asparagine Asn N AAT AAC")
            if(acid == "CAT" or acid == "CAC"):
                amino_acids.append("Histidine His H CAT CAC")
            if(acid == "GAA" or acid == "GAG"):
                amino_acids.append("GlutamicAcid Glu E GAA GAG")
            if(acid == "GAT" or acid == "GAC"):
                amino_acids.append("AsparticAcid Asp D GAT GAC")
            if(acid == "AAA" or acid == "AAG"):
                amino_acids.append("Lysine Lys K AAA AAG")
            if(acid == "CGT" or acid == "CGC" or acid == "CGA" or acid == "CGG" or acid == "AGA" or acid == "AGG"):
                amino_acids.append("Arginine Arg R CGT CGC CGA CGG AGA AGG")
            if(acid == stop1 or acid == stop2 or acid == stop3):
                amino_acids.append("Stop Stop + TAA TAG TGA")
                stopped = 1
    traverse = traverse + 1
    up = up + 1
#Now it's display time
go = 0
gadget = 1
for amino in amino_acids:
    print(amino_acids[go:gadget])
    go = go + 1
    gadget = gadget + 1
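The comparison problem is separate from a second issue. The splitting loop in the question runs once more than the number of complete codons, so codon_groups always ends with one extra entry. That entry is an empty string, or the leftover partial codon when the length is not a multiple of 3. The extra entry is harmless for the comparisons, because it matches no codon, but it does show up in print(codon_groups). Below is a minimal Python 3 sketch that reproduces this behaviour and shows the usual range-with-step-3 idiom. Here question_style_split and split_codons are hypothetical names, and dropping the trailing partial codon is an assumption about what you want.

```python
def question_style_split(dna):
    """Mirror of the question's splitting loop (// is Python 3 integer division)."""
    codon_number = len(dna) // 3
    groups = []
    multiplier = 1
    if codon_number > 0:
        groups.append(dna[0:3])
    # Runs codon_number more times, so one slice too many is taken.
    while codon_number > 0:
        first = multiplier * 3
        groups.append(dna[first:first + 3])
        codon_number -= 1
        multiplier += 1
    return groups


def split_codons(dna):
    """Split DNA into non-overlapping 3-letter codons.

    Assumption: 1 or 2 leftover bases at the end are dropped rather than
    kept as a partial codon.
    """
    dna = dna.strip().upper()
    # Stop before any incomplete trailing triplet.
    end = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, end, 3)]


print(question_style_split("ATGTTTTAA"))  # ['ATG', 'TTT', 'TAA', '']
print(split_codons("ATGTTTTAA"))          # ['ATG', 'TTT', 'TAA']
print(split_codons("ATGTTTTA"))           # ['ATG', 'TTT']
```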
Answer 0 (score: 2)
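MattDMO's answer may be more convenient. However, I think I have fixed your existing code with minimal changes. I believe you misunderstood what for codon in codon_groups means. It simply iterates over the list of codons produced by splitting your string. On each pass it assigns one item to the variable codon so you can use it directly, then moves on to the next codon in the list.
So: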
for codon in codon_groups:
    print "codon", codon, "is made up of the following:"
    for nuc in codon:
        print(nuc)
    print "nucleotides"
gives the output:
codon ATG is made up of the following:
A
T
G
nucleotides
codon TTT is made up of the following:
T
T
T
nucleotides
codon TAA is made up of the following:
T
A
A
nucleotides
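This is why the question's comparisons never matched. The line acid = codon_groups[traverse:up] slices the list, and slicing a list always returns a list. So acid is a one-element list such as ['ATG'], and it is never equal to a string like "ATG". Below is a short Python 3 illustration; the sample codons are made up.

```python
codon_groups = ["ATG", "TTT", "TAA"]  # made-up sample codons

traverse, up = 0, 1
acid = codon_groups[traverse:up]  # slicing a list returns a *list*
print(acid)            # ['ATG']
print(acid == "ATG")   # False: a list never equals a string

acid = codon_groups[traverse]     # indexing returns the element itself
print(acid == "ATG")   # True

# Iterating directly yields the strings themselves, with no counters needed.
for codon in codon_groups:
    print(codon, codon == "ATG")
```

The display loop at the end of the question has the same effect: amino_acids[go:gadget] is also a slice, so each line is printed as a one-element list.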
Changing just this part makes it work:
for codon in codon_groups:
    print codon
    if(stopped == 0):
        acid = codon # CHANGED HERE
        if (started == 0):
            if(acid == start):
                started = 1
                amino_acids.append("Start Start - ATG")
                print(amino_acids)
There is no real need for acid = codon. If the output is what you expect, you can drop acid and use codon directly, or rename the loop variable with for acid in codon_groups.
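For comparison, here is a minimal Python 3 sketch that replaces the long chain of if statements with a dictionary lookup while keeping the question's start/stop behaviour. It is not the answerer's code. It assumes the standard genetic code (NCBI translation table 1), reading frame 0 only and one-letter amino acid codes. It emits the start ATG as M rather than "Start" and maps unrecognised codons to "X". The names CODON_TABLE and translate_orf are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are listed in TCAG order:
# the first base varies slowest and the third base fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna):
    """Translate from the first ATG codon in frame 0 up to, but not
    including, the first stop codon after it. Returns one-letter codes."""
    dna = dna.strip().upper()
    end = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    protein = []
    started = False
    for i in range(0, end, 3):
        codon = dna[i:i + 3]
        if not started:
            if codon != "ATG":
                continue  # still waiting for the start codon
            started = True
        aa = CODON_TABLE.get(codon, "X")  # "X" for codons with non-ACGT letters
        if aa == "*":
            break  # stop codon ends translation
        protein.append(aa)
    return "".join(protein)


print(translate_orf("CCCATGTTTTAAGGG"))  # MF
```

Generating the table from the 64-character string avoids writing 64 comparisons by hand. A single lookup also makes it harder to mislabel or miss a codon than a long chain of separate if branches.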
Answer 1 (score: 1)
You would be better off using Biopython. You can read the tutorial to get started. Here is an example of translating DNA into a protein sequence:
from Bio.Seq import Seq
dna_seq = Seq("ATGCTTCGGTCTGGGCCAGCCTCTGGGCCGTCCGTCCCCACTGGCCGGGCCATGCCGAGTCGCCGCGTCTAA")
protein_seq = dna_seq.translate()
print(repr(protein_seq))
gives
Seq('MLRSGPASGPSVPTGRAMPSRRV*')
and
print(str(protein_seq))
gives the plain sequence:
MLRSGPASGPSVPTGRAMPSRRV*
Note that the Bio.Alphabet module (and IUPAC.unambiguous_dna) was removed in Biopython 1.78, so Seq takes just the sequence string.