Python查找功能不起作用。我究竟做错了什么?

时间:2015-11-26 13:29:19

标签: python string substring bioinformatics rosalind

我是一个业余爱好者程序员(我的实际主要是生物学),所以如果代码是残酷的,我会道歉。

无论如何,我正在做一个rosalind.info练习(http://rosalind.info/problems/subs/),希望我找到每个索引,其中特定的DNA基序包含在更大的DNA序列中。基本上,我需要在字符串中找到子字符串的索引。应该很容易吧?好吧,也许你可以帮助我。

所以这是我的代码:

with open('rosalind_subs.txt') as f:
    seq = f.readline()
    seq.strip()
    subs = f.readline()
    subs.strip()
    break

def finder(x, y):
    index = x.find(y)
    return index

print("sequence is: " + seq)
print("subs is: " + subs)

print(finder(seq, subs))

这是我的输出:

sequence is:  ACCAGTCTCTTTTTTCTCTTTTCTCTTTTCTCTTTTGACCCTCTTTTCGTCACTCTTTTACCTCTTTTTCTCTTTTACTCTTTTCTCTTTTACTCTTTTACTCTTTTAGCGCAGATCTCTTTTCTCTTTTGGCTCTTTTGTCATCCTCTTTTAGACTCTTTTGGGAAGCGACGCCTCTTTTCTCTTTTCTCTTTTGCCTCTTTTTATAACCTAAAAGACTCTTTTCCCTCTTTTCCGATTTGCCAAGGGCTCTCTTTTCTCTTTTGCTCTTTTCTCTTTTCTCTTTTTACTCTTTTCTCTTTTCGCCCCAAGATTAACTCTTTTTCTCTTTTCTCTCTTTTTTCCTCTTTTCTCTTTTGAATTGACCTCTTTTTCTCTTTTTTTGGGCCGCTCTTTTCTCTTTTACTCTTTTCTCTCTTTTAACAGCTCTTTTCCTTCTCTTTTGTCTCTTTTAGTATACTCTTTTACTCTTTTCTCTTTTCTCTCTTTTACTCTTTTGCTCTTTTCTCTTTTTGTCTCTTTTGCCCTGTCTCTTTTCACGCTTCTCTTTTAGTGTACTCTTTTACTCTTTTTGGCTCTTTTCGAATTTGTTAGCTCTTTTGCTCTTTTCTCTTTTGCTCTTTTGTCTCTTTTGATCAGATTCTCTTTTTCTCTTTTCTCTTTTCCTTAAGCAGATTTCTCTTTTCTCTTTTTCTCTCTTTTGCTCTTTTACTCTTTTACTGCTTTCTCTTTTACAACCTCTTTTACTCTTTTAAGCTCTTTTCTCTTTTGCGCCTCTTTTCCTCCCCTCTTTTTAGCTCTTTTCTCTTTTTCGCTCTTTTCAGCTCTTTTCACTCTTTTGTTTTGAGCTCTTTTCAGACTCTTTTATCCTCTTTTTTCCTCTTTTAGCGCTCTTTTGTAGCCTCTTTT

motif is: CTCTTTTCT

-1

***Repl Closed***

我将***Repl Closed***留在那里,努力不遗余力。也许它与Sublime REPL有关?

无论如何,你可能只能通过观察来判断,但这个主题实际上在DNA序列中发现了很多次,它只是找不到它的发现功能。是什么给了什么?

2 个答案:

答案 0 :(得分:1)

break不适用于范围。请删除并尝试。我测试了以下代码。

with open('rosalind_subs.txt') as f:
    seq = f.readline()
    seq.strip()
    subs = f.readline()
    subs.strip()

def finder(x, y):
    index = x.find(y)
    return index

print("sequence is: " + seq)
print("subs is: " + subs)

print(finder(seq, subs))

输出

>>> 
sequence is: ACCAGTCTCTTTTTTCTCTTTTCTCTTTTCTCTTTTGACCCTCTTTTCGTCACTCTTTTACCTCTTTTTCTCTTTTACTCTTTTCTCTTTTACTCTTTTACTCTTTTAGCGCAGATCTCTTTTCTCTTTTGGCTCTTTTGTCATCCTCTTTTAGACTCTTTTGGGAAGCGACGCCTCTTTTCTCTTTTCTCTTTTGCCTCTTTTTATAACCTAAAAGACTCTTTTCCCTCTTTTCCGATTTGCCAAGGGCTCTCTTTTCTCTTTTGCTCTTTTCTCTTTTCTCTTTTTACTCTTTTCTCTTTTCGCCCCAAGATTAACTCTTTTTCTCTTTTCTCTCTTTTTTCCTCTTTTCTCTTTTGAATTGACCTCTTTTTCTCTTTTTTTGGGCCGCTCTTTTCTCTTTTACTCTTTTCTCTCTTTTAACAGCTCTTTTCCTTCTCTTTTGTCTCTTTTAGTATACTCTTTTACTCTTTTCTCTTTTCTCTCTTTTACTCTTTTGCTCTTTTCTCTTTTTGTCTCTTTTGCCCTGTCTCTTTTCACGCTTCTCTTTTAGTGTACTCTTTTACTCTTTTTGGCTCTTTTCGAATTTGTTAGCTCTTTTGCTCTTTTCTCTTTTGCTCTTTTGTCTCTTTTGATCAGATTCTCTTTTTCTCTTTTCTCTTTTCCTTAAGCAGATTTCTCTTTTCTCTTTTTCTCTCTTTTGCTCTTTTACTCTTTTACTGCTTTCTCTTTTACAACCTCTTTTACTCTTTTAAGCTCTTTTCTCTTTTGCGCCTCTTTTCCTCCCCTCTTTTTAGCTCTTTTCTCTTTTTCGCTCTTTTCAGCTCTTTTCACTCTTTTGTTTTGAGCTCTTTTCAGACTCTTTTATCCTCTTTTTTCCTCTTTTAGCGCTCTTTTGTAGCCTCTTTT

subs is: CTCTTTTCT
15

答案 1 :(得分:0)

同时也是这里的生物学家who has done several rosalind.info exercises

首先,您可以使用splitlines()来改进您在序列和主题中读取的代码,它负责删除换行符。另请注意我如何使用tuple unpacking一次分配seqmotif变量。

with open('rosalind_subs.txt') as f:
    seq, motif = f.read().splitlines()

接下来,您正确地注意到find仅返回主题第一次出现的索引。要查找所有事件,有助于知道find需要另一个可选参数start。如果你提供,它会从该索引位置开始查看。在循环中使用它可以获得所有索引。

另一种方法是使用regular expressions。请注意,图案可以相互重叠,因此您需要使用lookahead assertion