How to find the longest stem of a stem-loop in DNA

Time: 2017-03-25 15:29:16

Tags: python python-2.7 python-3.x

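I am trying to find the longest stem in a stem-loop of a DNA sequence. This is the code I have so far. Can anyone help? I am new to Python and am working through some practice exercises.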

basepairs = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}

listDNA = 'ATGGGCAT'
listREV = listDNA[::-1]

stem = ''
for i in range(len(listDNA)):
    for j in range(len(listDNA)):
       if listDNA[i] == basepairs[listREV[j]]:
           stem += listDNA[j]
       else:
           break

print(stem)

2 answers:

Answer 0 (score: 0)

I took the example from Wikipedia. I also assumed that the strand cannot overlap itself. Or can it?

basepairs = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}

listDNA = 'GACACGGTGCAACTTAGCACCGTGCA'
listREV = listDNA[::-1]

longest_stem = ''
longest_match = ''
for i in range(len(listDNA)):
    #print('i = {}'.format(i))
    # i: start of the candidate stem on the forward strand
    # strain_start: start of its partner on the reversed strand
    for strain_start in range(len(listDNA)):
        #print(' start = {}'.format(strain_start))
        stem = ''
        match = ''
        # stem cannot overlap with itself (// keeps the bound an integer)
        for n in range((len(listDNA) - i - strain_start) // 2):
            if listDNA[i+n] == basepairs[listREV[strain_start+n]]:
                #print('  added = {}'.format(listDNA[i+n]))
                stem += listDNA[i+n]
                match += listREV[strain_start+n]
            else:
                break
        if len(stem) > len(longest_stem):
            longest_stem = stem
            longest_match = match

print(longest_stem)
print(longest_match)

Output:

CACGGTGC
GTGCCACG
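
As a quick sanity check on the output above, here is a minimal sketch that confirms each base in the first string pairs with the base at the same position in the second. The helper name `is_paired` is hypothetical and not part of the answer's code.

```python
basepairs = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def is_paired(stem, match):
    """Return True if every base in stem pairs with the base at the same index in match."""
    # is_paired is an illustrative helper, not part of the original answer
    return len(stem) == len(match) and all(
        basepairs[a] == b for a, b in zip(stem, match)
    )

print(is_paired('CACGGTGC', 'GTGCCACG'))
```

Because `match` is read from the reversed strand, reversing it (`'GTGCCACG'[::-1]`) gives the partner segment as it appears in the original sequence.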

Answer 1 (score: 0)

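Here is a somewhat more compact version. It starts to exploit the observation that once you reach a length for which no substring of that length has its reverse complement elsewhere in the strand, there is no point in looking at larger lengths: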

basepairs = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}

def reverseComplement(s):
    return ''.join(basepairs[b] for b in s[::-1])

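def longestStem(s):
    n = len(s)
    k = n // 2  # length of longest possible stem
    candidate = ''
    i = 1

    # try lengths 1, 2, ...; stop as soon as the previous length produced no stem
    while i <= k and len(candidate) == i - 1:
        # j: start of a length-i substring that leaves room for its partner
        for j in range(n-2*i+1):
            t = s[j:i+j]
            # the partner must lie entirely after t, so stems never overlap
            if reverseComplement(t) in s[i+j:]:
                candidate = t
                break
        i += 1
    return candidate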

It works on your sample, and it also handles something like this reasonably quickly:

import random
s = ''.join(random.choice('ATGC') for i in range(10**4))
print(longestStem(s))

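But once the strands get much longer, you will need to move to something more sophisticated. Perhaps a dictionary-based approach, to get rid of the O(n) factor hidden in the `in` operator. Perhaps a depth-first search, to explore the tree implicitly formed by the stems.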
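
One way the dictionary-based idea could look is sketched below. This is an assumption about what such an approach might be, not the answerer's implementation. For each length `k` it records the last start position of every substring of that length, so the check "does the reverse complement occur after this substring?" becomes a dictionary lookup instead of a scan. The names `reverse_complement`, `find_stem` and `longest_stem_dict` are hypothetical.

```python
basepairs = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def reverse_complement(s):
    return ''.join(basepairs[b] for b in reversed(s))

def find_stem(s, k):
    """Return some length-k stem of s, or '' if there is none (illustrative helper)."""
    # Map every length-k substring to the LAST index at which it starts.
    last_start = {}
    for j in range(len(s) - k + 1):
        last_start[s[j:j + k]] = j
    for j in range(len(s) - 2 * k + 1):
        t = s[j:j + k]
        # The partner must start at or after j + k so the two halves do not overlap.
        if last_start.get(reverse_complement(t), -1) >= j + k:
            return t
    return ''

def longest_stem_dict(s):
    """Grow the stem length until no stem of that length exists."""
    candidate = ''
    for k in range(1, len(s) // 2 + 1):
        t = find_stem(s, k)
        if not t:
            break
        candidate = t
    return candidate

print(longest_stem_dict('ATGGGCAT'))
print(longest_stem_dict('GACACGGTGCAACTTAGCACCGTGCA'))
```

Each lookup still hashes a string of length `k`, so this sketch removes the scan over the rest of the strand but not the cost of building the substrings themselves.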