Question

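I am working on an assignment in Python and have a question, if you can answer it. I want to write a function that returns a list containing the position of the first nucleotide of every occurrence of "ATG" in a sequence. For example, say our DNA sequence is AATGCATGC. ATG starts at index 1, and again at index 5. I tried this to solve the problem: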

dna = "AATGCATGC"
starting_offset = dna.index("ATG")
print(starting_offset)

The result I get is 1, but I want to get [1, 5].

So how should I write this function so that it finds all occurrences?

Thanks for helping me :)

Answer 1

With regular expressions, you can use re.finditer to find all occurrences.

You can try this function:

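import re
text = 'AATGCATGC'
pattern = 'ATG'
def getIndexes(text, pattern):
    # search for the pattern argument (escaped so it is matched literally)
    # rather than a hard-coded 'ATG', and avoid shadowing the built-in list
    return [match.start() for match in re.finditer(re.escape(pattern), text)]
print(getIndexes(text, pattern))
# [1, 5]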

It gives you the list you are looking for. Hope that helps!
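
One caveat: re.finditer only reports non-overlapping matches. That makes no difference for "ATG", which cannot overlap itself, but it would for a motif such as "AA". A minimal sketch, assuming overlapping hits should be counted as well, wraps the pattern in a lookahead. The name get_overlapping_indexes is made up for illustration and is not part of the answer above.

```python
import re

def get_overlapping_indexes(text, pattern):
    # A zero-width lookahead matches at every position where the pattern
    # starts without consuming characters, so overlapping hits are kept.
    lookahead = '(?=' + re.escape(pattern) + ')'
    return [match.start() for match in re.finditer(lookahead, text)]

print(get_overlapping_indexes('AATGCATGC', 'ATG'))  # [1, 5]
print(get_overlapping_indexes('AAAA', 'AA'))        # [0, 1, 2]
```

For the DNA example in the question, both versions return the same list.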

Answer 2

If you want to think it through step by step, analyze this:

def GetMultipleInString(dna, term):
    # computing end condition 0
    if (term not in dna):
        print (dna + " does not contain the term " + term)
        return []

    # start of list of lists of 2 elements: index, rest
    result = [[None,dna]]

    # we look for the index in the rest, need to keep track how much we
    # shortened the string in total so far to get index in complete string
    totalIdx = 0

    # we look at the last element of the list until its length is shorter
    # than the term we look for (end of computing condition 1)
    termLen = len(term)

    while len(result[-1][1]) >= termLen:
        # get the last element
        last = result[-1][1]
        try:
            # find our term, if not found -> exception
            idx = last.index(term) 
            # partition "abcdefg" with "c" -> ("ab","c", "defg")
            # we take only the remaining 
            rest = last.partition(term)[2] 
            # we compute the total index, and put it in our result
            result.append( [idx+totalIdx , rest] ) 
            totalIdx += idx+termLen 
        except ValueError:
            result.append([None,last])
            break

    # any results found that are not none? 
    if any(x[0] is not None for x in result):

        print(dna + " contains the term " + term + " at positions:")
        # get only indexes from our results, kept as integers
        rv = [x[0] for x in result if x[0] is not None]
        print(' '.join(str(i) for i in rv))

        return rv

    else:
        print (dna + " does not contain the term " + term)
        return []

print("_----------------------------------_")
myDna = "AATGCATGC"  
res1 = GetMultipleInString(myDna,"ATG")   
print(res1)

res2 = GetMultipleInString(myDna,"A")
print(res2)
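
The same left-to-right scan can be written more compactly with str.find, which accepts a start offset and returns -1 instead of raising an exception when nothing is found. This is only a sketch, and find_all is a hypothetical name. Like the function above, it jumps past each match, so hits never overlap.

```python
def find_all(dna, term):
    positions = []
    if not term:
        # An empty term would match everywhere; treat it as "no results".
        return positions
    start = dna.find(term)
    while start != -1:
        positions.append(start)
        # Resume the search right after the match that was just recorded.
        start = dna.find(term, start + len(term))
    return positions

print(find_all("AATGCATGC", "ATG"))  # [1, 5]
print(find_all("AATGCATGC", "A"))    # [0, 1, 5]
```

Because the offsets come straight from the full string, no running total has to be tracked.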
