Problem calculating amino acid frequencies

Time: 2015-01-28 21:58:06

Tags: python

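I'm writing a program to determine the percentage of each amino acid in a given sequence. I want it to work for any sequence by printing the percentage of each amino acid and telling me which of the amino acids in my reference list are absent. I've run into some trouble here and would really appreciate some guidance.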

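So I'm trying to make the output more detailed, so that it shows the percentage of every amino acid for the supplied string, including those that do not occur in it.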

Here is my current code:

protein = """MKLFWLLFTIGFCWAQYSSNTQQGRTSIVHLFEWRWVDIALECERYLAPKGFGGVQVSPPNENVAIHNPFRPWWERYQPVSYKLCTRSGNEDEFRNMVTRCNNVGVRIYVDAVINHMCGNAVSAGTSSTCGSYFNPGSRDFPAVPYSGWDFNDGKCKTGSGDIENYNDATQVRDCRLSGLLDLALGKDYVRSKIAEYMNHLIDIGVAGFRIDASKHMWPGDIKAILDKLHNLNSNWFPEGSKPFIYQEVIDLGGEPIKSSDYFGNGRVTEFKYGAKLGTVIRKWNGEKMSYLKNWGEGWGFMPSDRALVFVDNHDNQRGHGAGGASILTFWDARLYKMAVGFMLAHPYGFTRVMSSYRWPRYFENGKDVNDWVGPPNDNGVTKEVTINPDTTCGNDWVCEHRWRQIRNMVNFRNVVDGQPFTNWYDNGSNQVAFGRGNRGFIVFNNDDWTFSLTLQTGLPAGTYCDVISGDKINGNCTGIKIYVSDDGKAHFSISNSAEDPFIAIHAESKL""" #exchange sequence for unique analysis
amino_acid = ['C', 'D', 'S', 'Q', 'K', 'P', 'T', 'F', 'A', 'X', 'G', 'I', 'E', 'L', 'H', 'R', 'W', 'M', 'N', 'Y', 'V']
for a in amino_acid:
    if a in protein:
        print "Percentage of" + amino_acid[a] + "is" + ((protein.count(a)) * 100 / len(protein))
    else:
        print amino_acid[a] + "is not in sequence"

This is what I had working earlier, but it leaves out amino acids that do not occur at all (0%):

from collections import Counter
sequence = "MKLFWLLFTIGFCWAQYSSNTQQGRTSIVHLFEWRWVDIALECERYLAPKGFGGVQVSPPNENVAIHNPFRPWWERYQPVSYKLCTRSGNEDEFRNMVTRCNNVGVRIYVDAVINHMCGNAVSAGTSSTCGSYFNPGSRDFPAVPYSGWDFNDGKCKTGSGDIENYNDATQVRDCRLSGLLDLALGKDYVRSKIAEYMNHLIDIGVAGFRIDASKHMWPGDIKAILDKLHNLNSNWFPEGSKPFIYQEVIDLGGEPIKSSDYFGNGRVTEFKYGAKLGTVIRKWNGEKMSYLKNWGEGWGFMPSDRALVFVDNHDNQRGHGAGGASILTFWDARLYKMAVGFMLAHPYGFTRVMSSYRWPRYFENGKDVNDWVGPPNDNGVTKEVTINPDTTCGNDWVCEHRWRQIRNMVNFRNVVDGQPFTNWYDNGSNQVAFGRGNRGFIVFNNDDWTFSLTLQTGLPAGTYCDVISGDKINGNCTGIKIYVSDDGKAHFSISNSAEDPFIAIHAESKL" #exchange sequence for unique analysis
counts = Counter(sequence)
length = len(sequence)
dictionary = dict(counts)
amino_acids = list(dictionary)
freq = dictionary.values()
percentage = []
for item in freq:
    percentage.append(((item)/float(length))*100)
print "The percentage of each amino acid in the provided sequence are shown below:"
print str(zip(amino_acids, percentage))

2 Answers:

Answer 0 (score: 2)

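It would help to know more about the problem you're trying to solve, but based on the points you gave above, I'm fairly sure what you're looking for is a Counter object.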

Specifically:

>>> from collections import Counter
>>> test = Counter("""MKLFWLLFTIGFCWAQYSSNTQQGRTSIVHLFEWRWVDIALECERYLAPKGFGGVQVSPPNENVAIHNPFRPWWERYQPVSYKLCTRSGNEDEFRNMVTRCNNVGVRIYVDAVINHMCGNAVSAGTSSTCGSYFNPGSRDFPAVPYSGWDFNDGKCKTGSGDIENYNDATQVRDCRLSGLLDLALGKDYVRSKIAEYMNHLIDIGVAGFRIDASKHMWPGDIKAILDKLHNLNSNWFPEGSKPFIYQEVIDLGGEPIKSSDYFGNGRVTEFKYGAKLGTVIRKWNGEKMSYLKNWGEGWGFMPSDRALVFVDNHDNQRGHGAGGASILTFWDARLYKMAVGFMLAHPYGFTRVMSSYRWPRYFENGKDVNDWVGPPNDNGVTKEVTINPDTTCGNDWVCEHRWRQIRNMVNFRNVVDGQPFTNWYDNGSNQVAFGRGNRGFIVFNNDDWTFSLTLQTGLPAGTYCDVISGDKINGNCTGIKIYVSDDGKAHFSISNSAEDPFIAIHAESKL""")
>>> test
Counter({'G': 52, 'N': 41, 'D': 35, 'V': 35, 'S': 33, 'F': 29, 'I': 28, 'R': 28, 'A': 27, 'L': 27, 'K': 24, 'T': 23, 'P': 22, 'Y': 21, 'E': 20, 'W': 19, 'C': 12, 'H': 12, 'Q': 12, 'M': 11})

This should be enough to get you going. Let me know if you have any questions. I'm trying not to spoon-feed you the whole answer.
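One possible way to carry the Counter idea through to percentages, including amino acids that never occur, is sketched below. It relies on the fact that a Counter returns 0 for a key it has not seen. The function name `amino_acid_percentages`, the constant `AMINO_ACIDS`, and the short demo sequence are assumptions made for illustration, and the sketch uses Python 3 syntax rather than the Python 2 of the question.

```python
from collections import Counter

# The 20 standard amino acids plus 'X', matching the list in the question.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWYX"


def amino_acid_percentages(sequence, alphabet=AMINO_ACIDS):
    """Return {residue: percentage} for every residue in `alphabet`."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    total = len(sequence)
    # A Counter yields 0 for keys it has never seen, so residues that are
    # absent from the sequence come out as 0.0 instead of being skipped.
    return {aa: 100.0 * counts[aa] / total for aa in alphabet}


if __name__ == "__main__":
    demo = "MKLFWLLF"  # short stand-in sequence, not the protein above
    for aa, pct in sorted(amino_acid_percentages(demo).items()):
        print("%s: %.2f%%" % (aa, pct))
```

Iterating over the fixed alphabet rather than over the Counter's own keys is what makes the zero-percent entries appear.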

Answer 1 (score: 0)

protein = """MKLFWLLFTIGFCWAQYSSNTQQGRTSIVHLFEWRWVDIALECERYLAPKGFGGVQVSPPNENVAIHNPFRPWWERYQPVSYKLCTRSGNEDEFRNMVTRCNNVGVRIYVDAVINHMCGNAVSAGTSSTCGSYFNPGSRDFPAVPYSGWDFNDGKCKTGSGDIENYNDATQVRDCRLSGLLDLALGKDYVRSKIAEYMNHLIDIGVAGFRIDASKHMWPGDIKAILDKLHNLNSNWFPEGSKPFIYQEVIDLGGEPIKSSDYFGNGRVTEFKYGAKLGTVIRKWNGEKMSYLKNWGEGWGFMPSDRALVFVDNHDNQRGHGAGGASILTFWDARLYKMAVGFMLAHPYGFTRVMSSYRWPRYFENGKDVNDWVGPPNDNGVTKEVTINPDTTCGNDWVCEHRWRQIRNMVNFRNVVDGQPFTNWYDNGSNQVAFGRGNRGFIVFNNDDWTFSLTLQTGLPAGTYCDVISGDKINGNCTGIKIYVSDDGKAHFSISNSAEDPFIAIHAESKL""" #exchange sequence for unique analysis
amino_acid = ['C', 'D', 'S', 'Q', 'K', 'P', 'T', 'F', 'A', 'X', 'G', 'I', 'E', 'L', 'H', 'R', 'W', 'M', 'N', 'Y', 'V']
counts = {}
for amino in amino_acid: counts[amino] = 0
total = 0
for a in amino_acid:
    if a in protein:
        counts[a] = protein.count(a)
        fraction = float(counts[a]) / float(len(protein))
        percent = fraction * 100
        print "Percentage of " + a + " is:  %.2f%%" % percent
        total += percent
    else:
        print "Percentage of " + a + " is:  0.0%" 
print 'Total: ', str(total) + '%'
print 'Amino Acid Counts: ', counts
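For reference, the loop in the question fails for two separate reasons: `amino_acid[a]` indexes a list with a string (`a` is already the letter, so no lookup is needed), and `+` cannot join a string to a number. A minimal sketch of that loop with both problems fixed might look like the following. The function name `report` and the short demo sequence are hypothetical, and the code uses Python 3 syntax.

```python
def report(protein, amino_acids):
    """Build one output line per amino acid, including absent ones."""
    lines = []
    for a in amino_acids:
        if a in protein:
            # Multiply by 100.0 so the division is floating-point
            # under Python 2 as well as Python 3.
            percent = protein.count(a) * 100.0 / len(protein)
            # String formatting avoids concatenating str and float.
            lines.append("Percentage of %s is %.2f%%" % (a, percent))
        else:
            lines.append("%s is not in sequence" % a)
    return lines


for line in report("MKLFWLLF", ["L", "F", "X"]):
    print(line)
```

Returning the lines instead of printing inside the loop keeps the calculation easy to check separately from the output.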