Computing the GC content for a specific ID

Time: 2014-02-12 10:16:20

Tags: python python-2.7

I have a dictionary named self.__sequences whose entries read like "ID: DNA sequence". Here is part of that dictionary:

{
 '1111758': ('TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA\n', ''), 
 '1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''),
 '1111763': ('AGAGTTTGATCCTGGCCTT\n', '')
}

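I want to compute the GC content for a specific sequence ID (some_id). That is, if some_id is in the dictionary, return the GC content of the DNA sequence for that ID; if some_id does not exist, return an error message.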

P.S. GC content = (G + C)/(A + T + G + C) of the DNA sequence.
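For reference, the formula above can be sketched as a small standalone function. This is only a minimal sketch: the name gc_content is hypothetical, it assumes the sequence contains only the letters A, T, G and C (so the denominator equals the stripped length), and returning 0.0 for an empty sequence is an assumption rather than something stated in the question.

```python
def gc_content(seq):
    """Return (G + C) / (A + T + G + C) for a DNA string."""
    # Drop the trailing newline seen in the dictionary values and
    # normalise case so 'g'/'c' are counted as well.
    seq = seq.strip().upper()
    if not seq:
        # Assumption: treat an empty sequence as GC content 0.0
        # instead of dividing by zero.
        return 0.0
    # Assumes only A, T, G, C occur, so len(seq) == A + T + G + C.
    return (seq.count('G') + seq.count('C')) / float(len(seq))


print(gc_content('AGAGTTTGATCCTGGCCTT\n'))
```

The float() call keeps the division from truncating to an integer under Python 2.7.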

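I wrote the following code (the function sits inside a class), but it gives me an error message. I would be grateful if anyone could help me improve it.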

def compute_gc_content(self, some_id=''):
    """compute the gc content for sequence ID (some_id). If some_id is in the
    dictionary, return the gc content of the DNA sequence for that ID; if some_id
    does not exist, return an error message"""

    self.some_id = some_id
    for i in range(len(self.__sequences)):
        if self.some_id in self.__sequences.keys():
            return (self.some_id.values['G']+self.some_id.values['C'])/float(len(self.__sequences))
        else:
            return "This ID does not exist"

So if I print compute_gc_content('1111758'), I want it to print the value of the GC content, for example 0.23.

2 answers:

Answer 0 (score: 0)

I'm not sure I understood you correctly.

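def compute_gc_content(self, some_id=''):
    # `in` on a dict tests its keys directly
    if some_id in self.__sequences:
        # index with the variable some_id, not the literal string 'some_id',
        # and strip the trailing newline so it is not counted in the length
        seq = self.__sequences[some_id][0].strip()
        return (seq.count('G')+seq.count('C'))/float(len(seq))
    else:
        return "This ID does not exist"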

There is no need to use in self.__sequences.keys(); in self.__sequences does the same thing.
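A quick way to convince yourself of this is the sketch below. It uses a plain module-level dictionary named sequences (a stand-in for self.__sequences, copied from the question) and two example IDs, one present and one made up.

```python
# Stand-in for self.__sequences, using two entries from the question.
sequences = {
    '1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''),
    '1111763': ('AGAGTTTGATCCTGGCCTT\n', ''),
}

for some_id in ('1111762', 'no_such_id'):  # 'no_such_id' is a made-up ID
    # Both expressions test membership among the dictionary's keys.
    direct = some_id in sequences
    via_keys = some_id in sequences.keys()
    print(some_id, direct, via_keys)
```

In Python 2.7, keys() builds a full list first and the membership test then scans it, so testing the dictionary directly is also the cheaper of the two.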

Answer 1 (score: 0)

This is what you are looking for:

class gc:
    def __init__(self):
        self.__sequences = {'1111758': ('TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA\n', ''), '1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''), '1111763': ('AGAGTTTGATCCTGGCCTT\n', '')}

    # The method must be defined inside the class so that self.__sequences resolves.
    def compute_gc_content(self, some_id=''):
        """compute the gc content for sequence ID (some_id). If some_id is in the
        dictionary, return the gc content of the DNA sequence for that ID; if some_id
        does not exist, return an error message"""

        # A single lookup is enough; no loop over the dictionary is needed.
        if some_id in self.__sequences:
            # Strip the trailing newline so it is not counted in the length.
            seq = self.__sequences[some_id][0].strip()
            return float(seq.count('G') + seq.count('C')) / len(seq)
        else:
            return "This ID does not exist"

# Parenthesised form works as a print statement in Python 2.7 as well.
print(gc().compute_gc_content('1111758'))
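One detail worth keeping in mind with an attribute named self.__sequences: Python mangles double-underscore names that appear inside a class body, so a method has to be defined within the class to see that attribute under its short name. The sketch below illustrates this; the class name SequenceStore and the helper has_id are hypothetical and only there for illustration.

```python
class SequenceStore(object):  # hypothetical class name
    def __init__(self):
        # Stored on the instance as _SequenceStore__sequences.
        self.__sequences = {'1111763': ('AGAGTTTGATCCTGGCCTT\n', '')}

    def has_id(self, some_id):  # hypothetical helper
        # Inside the class body, self.__sequences is rewritten to
        # self._SequenceStore__sequences, so this lookup works.
        return some_id in self.__sequences


store = SequenceStore()
print(store.has_id('1111763'))
# The unmangled name does not exist on the instance...
print(hasattr(store, '__sequences'))
# ...but the mangled one does.
print(hasattr(store, '_SequenceStore__sequences'))
```

A function written at module level and called with such an object as self would look up the unmangled name and fail with an AttributeError.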