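I'm working on an assignment in Python and have a question, if you can answer it. I want to write a function that returns a list containing the position of the first nucleotide of every occurrence of "ATG" in a sequence. For example, say our DNA sequence is AATGCATGC. We can see that ATG starts at index 1, and another occurrence starts at index 5. I tried this to solve the problem: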
dna = "AATGCATGC"
starting_offset = dna.index("ATG")
print(starting_offset)
The result I get is 1, but I want to get [1, 5]. How should I write this function so that it finds all occurrences?
Thanks for your help :)
Answer 0 (score: 2)
With regular expressions, you can use re.finditer to find all occurrences.
You can try this function:
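import re
text = 'AATGCATGC'
pattern = 'ATG'
def getIndexes(text, pattern):
    # search for the pattern argument rather than a hard-coded 'ATG';
    # re.finditer yields the non-overlapping matches from left to right
    indexes = [match.start() for match in re.finditer(pattern, text)]
    return indexes
print(getIndexes(text, pattern))
# [1, 5]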
It gives you the list you are looking for. Hope that helps!
Answer 1 (score: 0)
If you want to think it through step by step, analyze this:
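One caveat: re.finditer only reports non-overlapping matches. That makes no difference for "ATG", which cannot overlap with itself, but it would for a term like "AA". A minimal sketch of one way around it, assuming the term should be matched literally, is a zero-width lookahead. The name find_overlapping is made up for this example and is not part of the answer above.

```python
import re

def find_overlapping(text, pattern):
    # Hypothetical helper, not part of the answer above.
    # re.escape makes the pattern match as literal text. The lookahead
    # (?=...) matches at a position without consuming any characters,
    # so overlapping occurrences are reported as well.
    lookahead = re.compile("(?=" + re.escape(pattern) + ")")
    return [match.start() for match in lookahead.finditer(text)]

print(find_overlapping("AATGCATGC", "ATG"))  # [1, 5]
print(find_overlapping("AAAA", "AA"))        # [0, 1, 2]
```

For the sequence in the question both versions return the same list.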
def GetMultipleInString(dna, term):
    # computing end condition 0
    if term not in dna:
        print(dna + " does not contain the term " + term)
        return []
    # start of list of lists of 2 elements: index, rest
    result = [[None, dna]]
    # we look for the index in the rest, need to keep track how much we
    # shortened the string in total so far to get index in complete string
    totalIdx = 0
    # we look at the last element of the list until its length is shorter
    # than the term we look for (end of computing condition 1)
    termLen = len(term)
    while len(result[-1][1]) >= termLen:
        # get the last element
        last = result[-1][1]
        try:
            # find our term, if not found -> ValueError
            idx = last.index(term)
            # partition "abcdefg" with "c" -> ("ab", "c", "defg")
            # we take only the remaining
            rest = last.partition(term)[2]
            # we compute the total index, and put it in our result
            result.append([idx + totalIdx, rest])
            totalIdx += idx + termLen
        except ValueError:
            result.append([None, last])
            break
    # any results found that are not None?
    if any(x[0] is not None for x in result):
        print(dna + " contains the term " + term + " at positions:", end=" ")
        # get only the indexes from our results (kept as ints)
        rv = [x[0] for x in result if x[0] is not None]
        print(' '.join(str(i) for i in rv))
        return rv
    else:
        print(dna + " does not contain the term " + term)
        return []
print("_----------------------------------_")
myDna = "AATGCATGC"
res1 = GetMultipleInString(myDna,"ATG")
print(res1)
res2 = GetMultipleInString(myDna,"A")
print(res2)
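For comparison, the same idea can be written much more compactly with str.find, which returns -1 instead of raising an exception when the term is missing. This is only a sketch, not the answer's method: find_all is a name made up for this example, and it assumes term is a non-empty string. Unlike the function above, it resumes the search one character after each hit, so it also reports overlapping occurrences.

```python
def find_all(dna, term):
    # Hypothetical helper; assumes term is a non-empty string.
    positions = []
    start = dna.find(term)
    while start != -1:
        positions.append(start)
        # Resume one character after the last hit, so overlapping
        # occurrences are found too.
        start = dna.find(term, start + 1)
    return positions

print(find_all("AATGCATGC", "ATG"))  # [1, 5]
print(find_all("AATGCATGC", "A"))    # [0, 1, 5]
```

It returns a plain list of integer positions and prints nothing itself, which is closer to what the question asks for.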