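I am trying to find the longest stem in a stem-loop of a DNA sequence. This is my code so far. Can anyone help? I am new to Python and am working through some practice exercises.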
basepairs = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}
listDNA = 'ATGGGCAT'
listREV = listDNA[::-1]
stem = ''
for i in range(len(listDNA)):
    for j in range(len(listDNA)):
        if listDNA[i] == basepairs[listREV[j]]:
            stem += listDNA[j]
        else:
            break
print(stem)
Answer 0 (score: 0)
I used the example from Wikipedia. I also assumed that a strand cannot overlap itself. Or can it?
basepairs = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}
listDNA = 'GACACGGTGCAACTTAGCACCGTGCA'
listREV = listDNA[::-1]
longest_stem = ''
longest_match = ''
for i in range(len(listDNA)):
    #print('i = {}'.format(i))
    # forward strand order
    for strain_start in range(len(listDNA)):
        #print(' start = {}'.format(strain_start))
        stem = ''
        match = ''
        # stem cannot overlap with itself
        # (// keeps the bound an integer on both Python 2 and Python 3)
        for n in range((len(listDNA) - i - strain_start) // 2):
            # compare position i+n with the same position in the reversed sequence
            if listDNA[i+n] == basepairs[listREV[i+n]]:
                #print(' added = {}'.format(listDNA[i+n]))
                stem += listDNA[i+n]
                match += listREV[i+n]
            else:
                break
        if len(stem) > len(longest_stem):
            longest_stem = stem
            longest_match = match
print(longest_stem)
print(longest_match)
Output:
CACGGTGC
GTGCCACG
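As a quick sanity check on the output above, the following sketch confirms that the two printed strings pair base by base and shows where they sit in the sequence. The variable names `paired`, `partner`, and `loop` are made up for this illustration and are not part of the answer's code.

```python
basepairs = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
listDNA = 'GACACGGTGCAACTTAGCACCGTGCA'
stem = 'CACGGTGC'    # first line of the output above
match = 'GTGCCACG'   # second line of the output above

# Each base of the stem should pair with the base at the same position in match.
paired = all(basepairs[a] == b for a, b in zip(stem, match))

# match was read from the reversed sequence, so flip it back before locating it.
partner = match[::-1]
stem_start = listDNA.find(stem)
partner_start = listDNA.find(partner)

# Whatever lies between the two halves of the stem is the loop.
loop = listDNA[stem_start + len(stem):partner_start]
print(paired, stem_start, partner_start, loop)
```

The two halves do not overlap here, which is consistent with the no-overlap assumption stated in the answer.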
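Answer 1 (score: 0)
Here is a somewhat more compact version. It starts to exploit the observation that once you reach a length for which no substrand of that length has a reverse complement elsewhere in the sequence, there is no point in looking at larger lengths: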
basepairs = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}

def reverseComplement(s):
    return ''.join(basepairs[b] for b in s[::-1])

def longestStem(s):
    n = len(s)
    k = n // 2  # length of longest possible stem
    candidate = ''
    i = 1
    # stop as soon as the previous length produced no candidate
    while i <= k and len(candidate) == i - 1:
        for j in range(n - 2*i + 1):
            t = s[j:i+j]
            # the reverse complement must occur after t, without overlapping it
            if reverseComplement(t) in s[i+j:]:
                candidate = t
                break
        i += 1
    return candidate
It works on your sample, and it also handles something like this reasonably quickly:
import random
s = ''.join(random.choice('ATGC') for _ in range(10**4))
print(longestStem(s))
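But once the strands get much longer, you would need to turn to something more sophisticated. Perhaps a dictionary-based approach to get rid of the O(n) factor hidden in the "in" operator. Perhaps a depth-first search to explore the tree implicitly formed by the stems.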
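One possible reading of the dictionary-based idea is sketched below; it is an assumption about what such an approach could look like, not code from the answer. For each length i it records the last start position of every substring of that length, so the non-overlap test becomes a dictionary lookup instead of a substring scan. The names `longest_stem_indexed` and `last_start` are hypothetical. Hashing a substring still costs time proportional to its length, so this only removes the scan, not all of the work.

```python
basepairs = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def reverse_complement(s):
    return ''.join(basepairs[b] for b in reversed(s))

def longest_stem_indexed(s):
    n = len(s)
    candidate = ''
    for i in range(1, n // 2 + 1):
        # Map every length-i substring to the last position where it starts.
        last_start = {}
        for j in range(n - i + 1):
            last_start[s[j:j + i]] = j
        found = ''
        for j in range(n - 2 * i + 1):
            t = s[j:j + i]
            # The partner must start at or after j + i so the halves do not overlap.
            if last_start.get(reverse_complement(t), -1) >= j + i:
                found = t
                break
        if not found:
            # No stem of this length, so longer ones cannot exist either.
            break
        candidate = found
    return candidate

print(longest_stem_indexed('ATGGGCAT'))
print(longest_stem_indexed('GACACGGTGCAACTTAGCACCGTGCA'))
```

Checking the last start position is equivalent to asking whether the reverse complement occurs anywhere in the remainder of the string, which is the same condition the original loop tests.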