Python - Counting nucleotides

Date: 2014-04-06 22:29:59

Tags: python bioinformatics

So I am supposed to design a program that processes a DNA sequence and counts the individual base pairs. This is what I have so far:

class dnaString (str): 
    def __new__(self,s): 
        return str.__new__(self,s.upper()) 
    def length (self): 
        return (len(self)) 
    def getATCG (self,num_A,num_T,num_C,num_G): 
        num_A = self.count("A") 
        num_T = self.count("T") 
        num_C = self.count ("C") 
        num_G = self.count ("G") 
        return ( (self.length(), num_A, num_T, num_G, num_C) ) 

    def printnum_A (self): 
        print ("Adenine base content: {0}".format(self.count("A"))) 

dna = input("Enter a dna sequence: ") 
x=dnaString(dna) 

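The program doesn't really do anything yet. I've only just started with Python, and I don't know how to fix it so that it works. What else should I add? I know it isn't finished.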

4 answers:

Answer 0 (score: 1)

I'm not sure what the question is, but nothing is printed because you never call the method `printnum_A`. If you call it like this, it works:

dna = input("Enter a dna sequence: ") 
x=dnaString(dna) 
x.printnum_A()

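Update based on the comments

Declaring a method on the class is not enough; you also need to call it when you need it. For example, with printnum_T:
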
class dnaString (str): 
    def __new__(self,s): 
        return str.__new__(self,s.upper()) 
    def length (self): 
        return (len(self)) 
    def getATCG (self,num_A,num_T,num_C,num_G): 
        num_A = self.count("A") 
        num_T = self.count("T") 
        num_C = self.count ("C") 
        num_G = self.count ("G") 
        return ( (self.length(), num_A, num_T, num_G, num_C) ) 

    def printnum_A (self): 
        print ("Adenine base content: {0}".format(self.count("A"))) 

    # here the method is declared
    def printnum_T (self): 
        print ("Thymine base content: {0}".format(self.count("T")))

dna = input("Enter a dna sequence: ") 
x=dnaString(dna) 
x.printnum_A()
# Here I call my method on `x`
x.printnum_T()
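
The same point applies to getATCG in the question: it is defined but never called, and as written it demands four arguments that it immediately overwrites. What follows is only a minimal sketch, assuming those parameters can simply be dropped. The names BASE_NAMES, counts and print_count are hypothetical and not part of the question. It also uses one print method that takes the base as an argument instead of one method per base.

```python
# Hypothetical names: BASE_NAMES, counts and print_count are illustrative only.
BASE_NAMES = {"A": "Adenine", "T": "Thymine", "C": "Cytosine", "G": "Guanine"}


class dnaString(str):
    def __new__(cls, s):
        # str is immutable, so the upper-casing has to happen in __new__
        return str.__new__(cls, s.upper())

    def counts(self):
        # Count each known base; no parameters are needed for this
        return {base: self.count(base) for base in BASE_NAMES}

    def print_count(self, base):
        # One method for any base instead of printnum_A, printnum_T, ...
        print("{0} base content: {1}".format(BASE_NAMES[base], self.count(base)))


x = dnaString("ccTAgt")
for base in BASE_NAMES:
    x.print_count(base)  # nothing is printed unless the method is called
```

A fixed sequence is used here instead of input() so the sketch runs without typing anything; swapping in input("Enter a dna sequence: ") gives the interactive version.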

Answer 1 (score: 0)

Does this help? It works with both Python 2.7.3 and 3.2.3, which I happen to have installed.

import itertools
import sys

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    if sys.version_info[0] > 2:
        return zip(a,b)
    return itertools.izip(a, b)

class DnaSequence():
    Names = {
        'A' : 'adenine',
        'C' : 'cytosine',
        'G' : 'guanine',
        'T' : 'thymine'
    }
    Bases = Names.keys()


    def __init__(self, seq):
        self._string = seq
        self.bases = { x:0 for x in DnaSequence.Bases }
        self.pairs = { x+y:0 for x in DnaSequence.Bases 
                             for y in DnaSequence.Bases }

        for base in seq:
            if base in self.bases:
                self.bases[base] += 1

        for x,y in pairwise(seq):
            pair = x+y
            if pair in self.pairs:
                self.pairs[pair] += 1


    def printCount(self, base):
        if base in DnaSequence.Names:
            print(DnaSequence.Names[base].capitalize() + 
                  " base content: " + str(self.bases[base]))
        else:
            sys.stderr.write('No such base ("%s")\n' % base)


    def __repr__(self):
        return self._string


d = DnaSequence("CCTAGTGTTAGCTAGTCTAGGGAT")
for base in DnaSequence.Bases:
    d.printCount(base)

# Further:
print(d)
print(d.bases)
print(d.pairs)

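It is a complete example that counts all occurrences of the bases (A, C, G, T) and of adjacent pairs (for example, in ACCGTA the pairs AC, CC, CG, GT and TA each have a count of 1, and the other 11 combinations in the Cartesian product ACGT x ACGT are all 0).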

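The counting approach used here scans the string for the base counts once, in the constructor, instead of scanning it four times on every call to getATCG().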
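
For comparison, the same single-pass idea can be sketched with collections.Counter from the standard library. This is an alternative to the class above, not what the answer uses, and the function name count_bases_and_pairs is hypothetical. Note that Counter tallies every character it sees, so unlike the class above it does not restrict itself to A, C, G and T.

```python
from collections import Counter


def count_bases_and_pairs(seq):
    # Hypothetical helper: returns (base counts, adjacent-pair counts).
    seq = seq.upper()
    bases = Counter(seq)  # one pass over the sequence
    # zip(seq, seq[1:]) yields overlapping neighbours: (s0, s1), (s1, s2), ...
    pairs = Counter(a + b for a, b in zip(seq, seq[1:]))
    return bases, pairs


bases, pairs = count_bases_and_pairs("ACCGTA")
print(bases["A"], bases["C"], bases["G"], bases["T"])
print(pairs["AC"], pairs["CC"], pairs["AA"])  # a Counter returns 0 for a missing key
```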

Answer 2 (score: 0)

I think the class can be simplified a bit:

class DnaString(str): 
    def __new__(self, s): 
        return str.__new__(self, s.strip().upper())

    def __init__(self, _):
        self.num_A = self.count("A") 
        self.num_C = self.count("C") 
        self.num_G = self.count("G") 
        self.num_T = self.count("T") 

    def stats(self):
        return len(self), self.num_A, self.num_C, self.num_G, self.num_T

Then

dna = input("Enter a dna sequence: ")  # raw_input() on Python 2
d = DnaString(dna)

print(d)
print(d.stats())

gives

Enter a dna sequence: ACGTACGTA
ACGTACGTA
(9, 3, 2, 2, 2)

Answer 3 (score: -1)

You can use a dictionary to organize and retrieve your counts. For example:

    # Python 3, as in the question; on Python 2 use raw_input() instead of input()
    DNASeq = input("Enter a DNA sequence: ")

    SeqLength = len(DNASeq)

    print('Sequence Length:', SeqLength)

    BaseKey = list(set(DNASeq))  # creates a list from the unique characters in DNASeq

    Dict = {}

    # one entry per distinct character found in the input
    for char in BaseKey:
        Dict[char] = DNASeq.count(char)
    print(Dict)
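
One caveat with building the keys from set(DNASeq): a base that never occurs is missing from the dictionary, and any stray character (a space, an N, a lowercase letter) gets its own key. A minimal sketch of one way around that, assuming only the four bases A, C, G and T should be reported and that input should be treated case-insensitively (the function name base_counts and its alphabet parameter are hypothetical):

```python
def base_counts(seq, alphabet="ACGT"):
    # Hypothetical helper: count only the letters in `alphabet`,
    # reporting 0 for any base that does not occur.
    seq = seq.upper()
    return {base: seq.count(base) for base in alphabet}


print(base_counts("acgtacgta"))
print(base_counts("AAxT"))  # 'x' is ignored; C and G are reported as 0
```

This calls count() once per base, so it scans the string four times; for short sequences that is unlikely to matter.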