Question

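I'm working through the bioinformatics problems on rosalind.org and have run into a problem: a Python script I wrote works on smaller data sets, but when I apply it to larger ones I get an IndexError: list index out of range message.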

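Basically, I have a shorter motif and a longer DNA sequence, and I have to find the instances of the motif in the DNA sequence. When I feed the sample data set from the problem into my script, it works fine and I get the correct answer. However, using a significantly larger motif and sequence produces the error mentioned above.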

Here's my code:

motif = "<motif around 9 characters>"
cMotif = list(motif)
motifLength = len(cMotif)

dna = "<DNA sequence around 900 characters>"
dnArray = list(dna)

locations = ""

position = 0

for nt in dnArray:
    if (nt == cMotif[0]):
        for x in range(0, (motifLength)):
            if ((x + position) > len(dnArray)):
                break

            if (dnArray[position + x] == cMotif[x]):
                if (x >= (motifLength - 1)):
                    locations += (str(position + 1) + " ")
                    break
            else:
                break
    position += 1

print(locations)

The error occurs on line 18, if (dnArray[position + x] == cMotif[x]):, so I added

if ((x + position) > len(dnArray)):
    break

but that made no difference.

Cheers

Answer 1

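Python's lists are zero-based, so when (x + position) == len(dnArray), trying to access dnArray[x + position] goes one past the last valid index. You should change your test to (x + position) >= len(dnArray) to solve the problem.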
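To make the off-by-one concrete, here is a minimal sketch using a short made-up sequence. The variable names are illustrative only and do not come from the question.

```python
dna_array = list("ACGT")          # made-up 4-character sequence
last_index = len(dna_array) - 1   # valid indices run from 0 to 3

# The index that causes trouble is exactly len(dna_array).
index = len(dna_array)

# The original test only breaks when index > len, so index == len slips through.
original_test_breaks = index > len(dna_array)     # False: the loop carries on
# The corrected test also catches the index == len case.
corrected_test_breaks = index >= len(dna_array)   # True: the loop stops in time

try:
    dna_array[index]
    raised = False
except IndexError:
    raised = True                 # reading index 4 of a 4-element list fails
```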

Answer 2

I'd suggest using Python's regular expressions instead; it is easier.

import re
motif = "abc"
dna = "helloabcheyabckjlkjsabckjetc"

for i in re.finditer(motif,dna):
    print(i.start(), i.end())

This gives the start and end index in the string for each occurrence of motif in dna.
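One caveat: re.finditer reports non-overlapping matches only, and it treats motif as a regular expression pattern rather than a literal string. If overlapping occurrences should be reported too, a zero-width lookahead is one possible way to get them. The sketch below assumes that is what is wanted; the helper name find_overlapping and the sample strings are made up for illustration and are not part of the original answer.

```python
import re

def find_overlapping(motif, dna):
    """Return 1-based start positions of every occurrence of motif, overlaps included."""
    # (?=...) matches at a position without consuming characters,
    # so occurrences that overlap each other are all reported.
    # re.escape keeps the motif literal even if it contains regex metacharacters.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() + 1 for m in pattern.finditer(dna)]

positions = find_overlapping("ATAT", "GATATATGCATATACTT")
print(" ".join(str(p) for p in positions))  # prints: 2 4 10
```

A plain re.finditer("ATAT", ...) on the same strings would skip the occurrence starting at position 4, because it overlaps the one starting at position 2.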

Answer 3

Here is the program that throws the error:

motif = "abcd"
cMotif = list(motif)
motifLength = len(cMotif)

dna = "I am a dna which has abcd in it.a"
dnArray = list(dna)

locations = ""

position = 0

for nt in dnArray:
        if (nt == cMotif[0]):
                for x in range(0, (motifLength)):
                        if ((x + position) > len(dnArray)):
                                break

                        if (dnArray[position + x] == cMotif[x]):
                                if (x >= (motifLength - 1)):
                                    locations += (str(position + 1) + "      ")
                                    break 
                        else:
                                break
        position += 1

print(locations)

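I changed if ((x + position) > len(dnArray)): to if ((x + position) >= len(dnArray)): and the error went away. Your program never reached the break statement because you were not checking the "=" case. Remember that in Python, as in most programming languages, indexing starts at 0, so the last valid index is len(dnArray) - 1.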

Put this line just above the condition if ((x + position) > len(dnArray)): and you will see why:

print("My position is: " + str(x+position) + " and the length is: " + str(len(dnArray)))

The last line printed by this statement will read My position is: 33 and the length is: 33

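Here you can see that you have reached the end of the sequence, yet your existing condition is not satisfied, so the break statement is never executed.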
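Putting the fix together, here is a minimal sketch of the same loop with the corrected >= bound, wrapped in a function so it can be tried on different inputs. The function name find_motif is made up for illustration, and it returns a list of positions instead of building a string; the matching logic otherwise follows the program above.

```python
def find_motif(motif, dna):
    """Return 1-based start positions of motif in dna (assumes motif is non-empty)."""
    locations = []
    for position in range(len(dna)):
        if dna[position] != motif[0]:
            continue
        for x in range(len(motif)):
            # >= rather than >: index len(dna) is already one past the end.
            if x + position >= len(dna):
                break
            if dna[position + x] != motif[x]:
                break
            if x == len(motif) - 1:
                locations.append(position + 1)
    return locations

# Same data as the failing program: the trailing "a" no longer raises IndexError.
print(find_motif("abcd", "I am a dna which has abcd in it.a"))  # prints: [22]
```

Strings can be indexed directly in Python, so the list() conversions from the original program are not needed here.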
