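I'm a hobbyist programmer (my actual background is in biology), so I apologize if the code is atrocious.
Anyway, I'm working on a rosalind.info exercise (http://rosalind.info/problems/subs/) that asks me to find every index at which a particular DNA motif occurs in a larger DNA sequence. Basically, I need to find the indices of a substring within a string. Should be easy, right? Well, maybe you can help me out.
So here is my code: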
with open('rosalind_subs.txt') as f:
    seq = f.readline()
    seq.strip()
    subs = f.readline()
    subs.strip()
    break
def finder(x, y):
    index = x.find(y)
    return index
print("sequence is: " + seq)
print("subs is: " + subs)
print(finder(seq, subs))
This is my output:
sequence is: ACCAGTCTCTTTTTTCTCTTTTCTCTTTTCTCTTTTGACCCTCTTTTCGTCACTCTTTTACCTCTTTTTCTCTTTTACTCTTTTCTCTTTTACTCTTTTACTCTTTTAGCGCAGATCTCTTTTCTCTTTTGGCTCTTTTGTCATCCTCTTTTAGACTCTTTTGGGAAGCGACGCCTCTTTTCTCTTTTCTCTTTTGCCTCTTTTTATAACCTAAAAGACTCTTTTCCCTCTTTTCCGATTTGCCAAGGGCTCTCTTTTCTCTTTTGCTCTTTTCTCTTTTCTCTTTTTACTCTTTTCTCTTTTCGCCCCAAGATTAACTCTTTTTCTCTTTTCTCTCTTTTTTCCTCTTTTCTCTTTTGAATTGACCTCTTTTTCTCTTTTTTTGGGCCGCTCTTTTCTCTTTTACTCTTTTCTCTCTTTTAACAGCTCTTTTCCTTCTCTTTTGTCTCTTTTAGTATACTCTTTTACTCTTTTCTCTTTTCTCTCTTTTACTCTTTTGCTCTTTTCTCTTTTTGTCTCTTTTGCCCTGTCTCTTTTCACGCTTCTCTTTTAGTGTACTCTTTTACTCTTTTTGGCTCTTTTCGAATTTGTTAGCTCTTTTGCTCTTTTCTCTTTTGCTCTTTTGTCTCTTTTGATCAGATTCTCTTTTTCTCTTTTCTCTTTTCCTTAAGCAGATTTCTCTTTTCTCTTTTTCTCTCTTTTGCTCTTTTACTCTTTTACTGCTTTCTCTTTTACAACCTCTTTTACTCTTTTAAGCTCTTTTCTCTTTTGCGCCTCTTTTCCTCCCCTCTTTTTAGCTCTTTTCTCTTTTTCGCTCTTTTCAGCTCTTTTCACTCTTTTGTTTTGAGCTCTTTTCAGACTCTTTTATCCTCTTTTTTCCTCTTTTAGCGCTCTTTTGTAGCCTCTTTT
motif is: CTCTTTTCT
-1
***Repl Closed***
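I left the ***Repl Closed*** in there for the sake of completeness. Maybe it has something to do with Sublime REPL?
Anyway, you can probably tell just by looking that the motif actually occurs many times in the DNA sequence; the find function just isn't finding it. What gives?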
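Answer 0 (score: 1)
break does not apply in that scope: it is only valid inside a loop, not inside a with block. Remove it and try again. I tested the following code.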
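with open('rosalind_subs.txt') as f:
    # strip() returns a new string, so keep its result to drop the newline
    seq = f.readline().strip()
    subs = f.readline().strip()

def finder(x, y):
    # str.find returns the index of the first match, or -1 if there is none
    index = x.find(y)
    return index

print("sequence is: " + seq)
print("subs is: " + subs)
print(finder(seq, subs))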
Output
>>>
sequence is: ACCAGTCTCTTTTTTCTCTTTTCTCTTTTCTCTTTTGACCCTCTTTTCGTCACTCTTTTACCTCTTTTTCTCTTTTACTCTTTTCTCTTTTACTCTTTTACTCTTTTAGCGCAGATCTCTTTTCTCTTTTGGCTCTTTTGTCATCCTCTTTTAGACTCTTTTGGGAAGCGACGCCTCTTTTCTCTTTTCTCTTTTGCCTCTTTTTATAACCTAAAAGACTCTTTTCCCTCTTTTCCGATTTGCCAAGGGCTCTCTTTTCTCTTTTGCTCTTTTCTCTTTTCTCTTTTTACTCTTTTCTCTTTTCGCCCCAAGATTAACTCTTTTTCTCTTTTCTCTCTTTTTTCCTCTTTTCTCTTTTGAATTGACCTCTTTTTCTCTTTTTTTGGGCCGCTCTTTTCTCTTTTACTCTTTTCTCTCTTTTAACAGCTCTTTTCCTTCTCTTTTGTCTCTTTTAGTATACTCTTTTACTCTTTTCTCTTTTCTCTCTTTTACTCTTTTGCTCTTTTCTCTTTTTGTCTCTTTTGCCCTGTCTCTTTTCACGCTTCTCTTTTAGTGTACTCTTTTACTCTTTTTGGCTCTTTTCGAATTTGTTAGCTCTTTTGCTCTTTTCTCTTTTGCTCTTTTGTCTCTTTTGATCAGATTCTCTTTTTCTCTTTTCTCTTTTCCTTAAGCAGATTTCTCTTTTCTCTTTTTCTCTCTTTTGCTCTTTTACTCTTTTACTGCTTTCTCTTTTACAACCTCTTTTACTCTTTTAAGCTCTTTTCTCTTTTGCGCCTCTTTTCCTCCCCTCTTTTTAGCTCTTTTCTCTTTTTCGCTCTTTTCAGCTCTTTTCACTCTTTTGTTTTGAGCTCTTTTCAGACTCTTTTATCCTCTTTTTTCCTCTTTTAGCGCTCTTTTGTAGCCTCTTTT
subs is: CTCTTTTCT
15
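Answer 1 (score: 0)
Fellow biologist here who has also done several rosalind.info exercises.
First, you can improve the code that reads in the sequence and motif by using splitlines(), which takes care of removing the newline characters. Also note how I use tuple unpacking to assign the seq and motif variables in one go.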
with open('rosalind_subs.txt') as f:
    seq, motif = f.read().splitlines()
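To illustrate why the attempt in the question can print -1, here is a minimal sketch that uses an in-memory string in place of the file. The short sequence and the variable names ending in _raw are made up for illustration. str.strip() returns a new string rather than changing the original, so the newline that readline() keeps stays on the motif unless the result is assigned back.

```python
import io

# Hypothetical two-line input standing in for rosalind_subs.txt.
text = "GATATATGCATATACTT\nATAT\n"

# readline() keeps the trailing newline, and strip() does not modify in place.
f = io.StringIO(text)
seq_raw = f.readline()
motif_raw = f.readline()
motif_raw.strip()  # result discarded, so motif_raw still ends with "\n"
not_found = seq_raw.find(motif_raw)

# splitlines() drops the newlines, so find() sees the bare motif.
seq, motif = text.splitlines()
found = seq.find(motif)

print(not_found, found)  # -1 1
```

Note that the tuple unpacking only works when the file contains exactly two lines; any extra line would raise a ValueError.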
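Next, you correctly noticed that find only returns the index of the first occurrence of the motif. To find all occurrences, it helps to know that find takes another optional argument, start. If you supply it, the search begins at that index position. Using this in a loop gets you all the indices.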
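The loop described above could look something like the following sketch. The function name find_all and the short test sequence are placeholders of mine, not part of the exercise file. Advancing by one character after each hit, rather than by the motif length, keeps overlapping matches.

```python
def find_all(seq, motif):
    """Return the 0-based start index of every occurrence, overlaps included."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one character past the last hit so overlapping matches are kept.
        start = seq.find(motif, start + 1)
    return positions


hits = find_all("GATATATGCATATACTT", "ATAT")
print(hits)  # [1, 3, 9]

# The exercise appears to want 1-based positions, so shift before printing.
print(" ".join(str(i + 1) for i in hits))  # 2 4 10
```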
An alternative is to use regular expressions. Note that motifs can overlap each other, so you need to use a lookahead assertion.
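As a rough sketch of the regular-expression route, assuming the motif is a literal string (hence the re.escape call), the pattern can be wrapped in a lookahead so that each match is zero-width and the scanner moves forward only one position at a time. The function name find_all_regex is my own.

```python
import re


def find_all_regex(seq, motif):
    """Return 0-based start indices of all matches, overlapping ones included."""
    # (?=...) asserts that the motif starts here without consuming any
    # characters, so the next search resumes at the following position.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [match.start() for match in pattern.finditer(seq)]


print(find_all_regex("GATATATGCATATACTT", "ATAT"))  # [1, 3, 9]
```

Without the lookahead, a plain pattern would consume each match and skip the overlapping hit at index 3.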