Python display for DNA transcription

Time: 2016-05-18 16:29:00

Tags: python python-2.7 bioinformatics

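I wrote some Python code for a project that converts DNA into amino acids, but it doesn't seem to display properly (read: it doesn't display at all, at least the amino-acid part). Does anyone know where my mistake is? By the way, I'm new to Python, so if the code looks a bit unruly, that's because I don't have much experience with Python conventions. Thanks!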

#DNA strand - does not need to be transcribed to RNA
dna = raw_input("What is the DNA strand that you need to be transcribed?")
#Start and Stop codons
start = "ATG"
stop1 = "TAA"
stop2 = "TAG"
stop3 = "TGA"
#Number of codons
divide = len(dna)
codon_number = divide / 3
total_codons = codon_number
#Now, to split the DNA!
codon_groups = []
multiplier = 1
if(total_codons > 0):
    codon_groups.append(dna[0:3])
    while (codon_number > 0):
        first = multiplier * 3
        second = first + 3
        up_next = dna[first:second]
        codon_groups.append(up_next)
        codon_number = codon_number - 1
        multiplier = multiplier + 1
print(codon_groups)
#The fun part is up next!
amino_acids = []
traverse = 0
up = 1
started = 0
stopped = 0
for codon in codon_groups:
    if(stopped == 0):
        acid = codon_groups[traverse:up]
        if (started == 0):
            if(acid == start):
                started = 1
                amino_acids.append("Start        Start - ATG")
                print(amino_acids)
        else:
            if(acid == "ATT" or acid == "ATC" or acid == "ATA"):
                amino_acids.append("Isoleucine  Ile I ATT ATC ATA")
            if(acid == "CTT" or acid == "CTC" or acid == "CTA" or acid == "CTG" or acid == "TTA" or acid == "TTG"):
                amino_acids.append("Leucine     Leu L CTT CTC CTA CTG TTA TTG")
            if(acid == "GTT" or acid == "GTC" or acid == "GTA" or acid == "GTG"):
                amino_acids.append("Valine      Val V GTT GTC GTA GTG")
            if(acid == "TTT" or acid == "TTC"):
                amino_acids.append("Phenylalanine Phe F TTT TTC")
            if(acid == "ATG"):
                amino_acids.append("Methionine  Met M ATG")
            if(acid == "TGT" or acid == "TGC"):
                amino_acids.append("Cysteine    Cys C TGT TGC")
            if(acid == "GCT" or acid == "GCC" or acid == "GCA" or acid == "GCG"):
                amino_acids.append("Alanine     Ala A GCT GCC GCA GCG")
            if(acid == "GGT" or acid == "GGA" or acid == "GGC" or acid == "GGG"):
                amino_acids.append("Glycine     Gly G GGT GGA GGC GGG")
            if(acid == "CCT" or acid == "CCA" or acid == "CCG" or acid == "CCC"):
                amino_acids.append("Proline     Pro P CCT CCA CCG CCC")
            if(acid == "ACT" or acid == "ACG" or acid == "ACC" or acid == "ACA"):
                amino_acids.append("Threonine   Thr T ACT ACG ACC ACA")
            if(acid == "TCT" or acid == "TCC" or acid == "TCA" or acid == "TCG" or acid == "AGT" or acid == "AGC"):
                amino_acids.append("Serine      Ser S TCT TCC TCA TCG AGT AGC")
            if(acid == "TAT" or acid == "TAC"):
                amino_acids.append("Tyrosine     Tyr Y TAT TAC")
            if(acid == "TGG"):
                amino_acids.append("Tryptophan   Trp W TGG")
            if(acid == "CAA" or acid == "CAG"):
                amino_acids.append("Glutamine    Gln Q CAA CAG")
            if(acid == "AAT" or acid == "AAC"):
                amino_acids.append("Asparagine   Asn N AAT AAC")
            if(acid == "CAT" or acid == "CAC"):
                amino_acids.append("Histidine   His H CAT CAC")
            if(acid == "GAA" or acid == "GAG"):
                amino_acids.append("GlutamicAcid Glu E GAA GAG")
            if(acid == "GAT" or acid == "GAC"):
                amino_acids.append("AsparticAcid Asp D GAT GAC")
            if(acid == "AAA" or acid == "AAG"):
                amino_acids.append("Lysine       Lys K AAA AAG")
            if(acid == "CGT" or acid == "CGC" or acid == "CGA" or acid == "CGG" or acid == "AGA" or acid == "AGG"):
                amino_acids.append("Arginine     Arg R CGT CGC CGA CGG AGA AGG")
            if(acid == stop1 or acid == stop2 or acid == stop3):
                amino_acids.append("Stop         Stop + TAA TAG TGA")
                stopped = 1
        traverse = traverse + 1
        up = up + 1

#Now it's display time
go = 0
gadget = 1
for amino in amino_acids:
    print(amino_acids[go:gadget])
    go = go + 1
    gadget = gadget + 1

2 Answers:

Answer 0 (score: 2)

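MattDMO's answer may be more convenient, but I think I've fixed your existing code with minimal changes. I believe you misunderstood what for codon in codon_groups does: it simply iterates over the list of codons from your split-up string, assigns one item to the variable codon for you to use directly on each pass, and then moves on to the next codon in the list. Your line acid = codon_groups[traverse:up] slices the list instead, so acid ends up as a one-element list such as ['ATG'] rather than the string 'ATG', and none of your comparisons ever match.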

So:

for codon in codon_groups:
    print "codon", codon, "is made up of the following:"
    for nuc in codon:
        print(nuc)
    print "nucleotides"

gives the output:

codon ATG is made up of the following:
A
T
G
nucleotides
codon TTT is made up of the following:
T
T
T
nucleotides
codon TAA is made up of the following:
T
A
A
nucleotides
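The difference between slicing and taking the loop variable can be checked in isolation. A minimal sketch, using a hypothetical three-codon list in place of the split-up input:

```python
codon_groups = ["ATG", "TTT", "TAA"]  # hypothetical example input

# Slicing a list always returns a list, even when it holds a single element
sliced = codon_groups[0:1]
print(sliced)           # ['ATG']
print(sliced == "ATG")  # False: a list never compares equal to a string

# Indexing, or using the loop variable directly, gives the string itself
print(codon_groups[0] == "ATG")  # True
matches = [codon == "ATG" for codon in codon_groups]
print(matches)  # [True, False, False]
```

The same applies to the display loop at the end of the question: amino_acids[go:gadget] prints one-element lists, whereas print(amino) would print each string.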

Just change this part and it runs:

for codon in codon_groups:
    print codon
    if(stopped == 0):
        acid = codon  # CHANGED HERE
        if (started == 0):
            if(acid == start):
                started = 1
                amino_acids.append("Start        Start - ATG")
                print(amino_acids)

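There is no real need for acid = codon; once the output matches what you expect, either stop using acid and work with codon directly, or rename the loop to for acid in codon_groups.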
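One more thing worth fixing: the question's while loop also appends a trailing empty (or partial) string after the last full codon. A minimal sketch of the splitting and start/stop scanning, written as two small functions; split_codons and reading_frame are hypothetical names, and the start/stop codons are the ones defined in the question:

```python
def split_codons(dna):
    """Split a DNA string into complete 3-base codons, dropping leftover bases."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def reading_frame(codons, start="ATG", stops=("TAA", "TAG", "TGA")):
    """Return codons from the first start codon through the first stop after it."""
    frame = []
    started = False
    for codon in codons:  # codon is a plain string such as "ATG"
        if not started:
            if codon == start:
                started = True
                frame.append(codon)
            continue
        frame.append(codon)
        if codon in stops:
            break
    return frame


print(split_codons("ATGTTTTAAG"))                     # ['ATG', 'TTT', 'TAA']
print(reading_frame(split_codons("GGGATGCTTTGAAAA")))  # ['ATG', 'CTT', 'TGA']
```

Every element of the returned frame is a string, so it can go straight into the existing if chain (or a dictionary lookup) without any slicing.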

Answer 1 (score: 1)

You'd be better off using Biopython. You can read the Biopython tutorial to get started. Here is an example of translating DNA into a protein sequence:

from Bio.Seq import Seq

# Bio.Alphabet was removed in Biopython 1.78, so no alphabet argument is passed
dna_seq = Seq("ATGCTTCGGTCTGGGCCAGCCTCTGGGCCGTCCGTCCCCACTGGCCGGGCCATGCCGAGTCGCCGCGTCTAA")
protein_seq = dna_seq.translate()  # standard genetic code; stop codons become '*'
print(repr(protein_seq))

gives

Seq('MLRSGPASGPSVPTGRAMPSRRV*')

while

print(str(protein_seq))

gives the plain sequence:

MLRSGPASGPSVPTGRAMPSRRV*
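If installing Biopython isn't an option, the same translation can be sketched in plain Python. This is not Biopython's implementation, only a minimal sketch assuming the standard genetic code and an upper-case input containing only A, C, G and T; CODON_TABLE and translate are hypothetical names. Listing each codon position in T, C, A, G order lets one 64-character string cover the whole table:

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acids for all 64 codons in TTT, TTC, TTA, ... GGG order;
# '*' marks the three stop codons (TAA, TAG, TGA)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(p) for p in product(BASES, repeat=3)), AMINO_ACIDS))


def translate(dna):
    """Translate the complete codons of a DNA string into one-letter codes."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


protein = translate(
    "ATGCTTCGGTCTGGGCCAGCCTCTGGGCCGTCCGTCCCCACTGGCCGGGCCATGCCGAGTCGCCGCGTCTAA"
)
print(protein)  # MLRSGPASGPSVPTGRAMPSRRV*
```

A dictionary lookup replaces the long if chain from the question, and an unexpected character raises a KeyError instead of being silently skipped.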