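I'm currently working through a problem set from MIT OCW, where the task is to find matching substrings in DNA sequences.
I'm struggling to write a function that returns subsequences of length k. I can get it to work with strings, but the problem is set up with iterators, and when I use an iterator the function seems to reset each time instead of resuming from where it last yielded.
Here is the working function I wrote for strings: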
def subs(seq, k):
    subseq = ''
    pos = 0
    while pos < len(seq):
        while len(subseq) < k:
            subseq += seq[pos]
            pos += 1
        yield subseq, pos - k
        subseq = subseq[1:]
Correct output:
>>> a = 'hello'
>>> b = subs(a,2)
>>> b.next()
('he', 0)
>>> b.next()
('el', 1)
>>> b.next()
('ll', 2)
>>> b.next()
('lo', 3)
>>> b.next()
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    b.next()
StopIteration
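My problem
The task is set up with a class that creates an iterator from a long string sequence, which I won't go into here, but the given test also creates iterators from strings: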
# This test case may break once you add the argument m (skipping).
class TestExactSubmatches(dnaseq.unittest.TestCase):
    def test_one(self):
        foo = 'yabcabcabcz'
        bar = 'xxabcxxxx'
        matches = list(dnaseq.getExactSubmatches(iter(foo), iter(bar), 3, 1))
        correct = [(1,2), (4,2), (7,2)]
        self.assertTrue(len(matches) == len(correct))
        for x in correct:
            self.assertTrue(x in matches)
And my current solution:
def subsequenceHashes(seq, k):
    subseq = ''
    pos = 0
    print 'Start of subseqHashes'
    try:
        while True:
            while len(subseq) < k:
                subseq += seq.next()
                pos += 1
            print subseq, pos - k
            yield hash(subseq), pos - k
            subseq = subseq[1:]
    except StopIteration:
        return
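The function that calls it takes the hashes of the subsequences, puts them in a dictionary (class Multidict) together with the position where each subsequence starts, and compares substrings with the same hash to check that they are identical. It should then return the pairs of positions of identical substrings. I haven't managed to debug most of this function because I ran into the problem with it first.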
def getExactSubmatches(a, b, k, m):
    # a and b are the strings compared, k is the length of substring, parameter m is unused, need it for later on in the problem set
    ahash, apos = subsequenceHashes(a, k).next()
    bhash, bpos = subsequenceHashes(b, k).next()
    multidict = Multidict()
    print 'starting'
    while ahash:
        print 'iterate'
        multidict.put(ahash, ('a', apos))
        ahash, apos = subsequenceHashes(a, k).next()
        print apos
    while bhash:
        multidict.put(bhash, ('b', bpos))
        bhash, bpos = subsequenceHashes(b, k).next()
    for key in multidict.mydict:
        if len(multidict.get(key)) > 1:
            for t in multidict.get(key):
                if t[0] == 'a':
                    for s in multidict.get(key):
                        if s[0] == 'b':
                            if a[apos:apos+k] == b[bpos:bpos+k]:
                                print apos, bpos
                                yield apos, bpos
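The `Multidict` class itself is not shown in the question. A minimal sketch of what such a class might look like, assuming only the `put`, `get`, and `mydict` names used above (the internals are a guess, and the sketch uses Python 3 syntax rather than the Python 2 of the question):

```python
class Multidict:
    """Hypothetical stand-in for the problem set's Multidict: maps a key to a list of values."""

    def __init__(self):
        self.mydict = {}

    def put(self, key, value):
        # Append to the list stored under this key, creating it on first use.
        self.mydict.setdefault(key, []).append(value)

    def get(self, key):
        # Return every value stored under key, or an empty list if there is none.
        return self.mydict.get(key, [])


md = Multidict()
md.put(hash('abc'), ('a', 1))
md.put(hash('abc'), ('b', 2))
print(md.get(hash('abc')))
```

With a structure like this, any key whose list holds both an `'a'` entry and a `'b'` entry marks a candidate match that still needs the substring comparison, since different substrings can share a hash.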
What happens when I run the test:
Start of subseqHashes
yab 0
Start of subseqHashes
xxa 0
starting
iterate
Start of subseqHashes
cab 0
0
iterate
Start of subseqHashes
cab 0
0
iterate
Start of subseqHashes
F..
======================================================================
FAIL: test_one (__main__.TestExactSubmatches)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\Alex\Desktop\Pythonwork\6.006\ps4\dist\test_dnaseq.py", line 32, in test_one
    self.assertTrue(len(matches) == len(correct))
AssertionError: False is not true
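What seems to go wrong is that the subsequence is reset every time .next() is called when the body works on an iterator, instead of staying inside the loop as it does with a string.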
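The repeated `Start of subseqHashes` and `cab 0` lines are consistent with a new generator being created on every call, rather than one generator resuming. A small sketch of that difference, using a hypothetical `windows` generator and Python 3 syntax (where `gen.next()` is spelled `next(gen)`):

```python
def windows(it, k):
    """Hypothetical generator: yield (window, start) for each length-k window of an iterator."""
    window = ''
    pos = 0
    for ch in it:
        window += ch
        pos += 1
        if len(window) == k:
            yield window, pos - k
            window = window[1:]


source = iter('yabcabcabcz')

# Calling windows() again builds a brand-new generator each time: pos restarts
# at 0, but the shared iterator has already lost the characters consumed before.
first = next(windows(source, 3))
second = next(windows(source, 3))
print(first, second)

# Keeping one generator object and calling next() on it resumes after the yield.
gen = windows(iter('yabcabcabcz'), 3)
resumed = [next(gen), next(gen)]
print(resumed)
```

The first pair reproduces the `yab 0` then `cab 0` pattern from the test output, while the second shows the window sliding by one character as intended.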
Answer 0 (score: 0)
As @jonrsharpe said, my mistake was calling the generator function multiple times instead of actually iterating over it.
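One way the fix could look is to create each generator once and drive it with a `for` loop. This is only a sketch in Python 3 syntax, not the problem set's reference solution: the names `subsequences` and `get_exact_submatches` are hypothetical, and a plain dict keyed on the substring itself stands in for the hash, `Multidict`, and equality check described in the question.

```python
from collections import defaultdict


def subsequences(seq, k):
    """Yield (substring, start) for every length-k window of an iterable of characters."""
    window = ''
    pos = 0
    for ch in seq:
        window += ch
        pos += 1
        if len(window) == k:
            yield window, pos - k
            window = window[1:]


def get_exact_submatches(a, b, k, m=1):
    """Yield (apos, bpos) pairs where a and b share a length-k substring; m is unused."""
    # Assumption: keying on the substring replaces the hash + Multidict + comparison.
    starts_in_a = defaultdict(list)
    for sub, apos in subsequences(a, k):  # one generator, iterated to exhaustion
        starts_in_a[sub].append(apos)
    for sub, bpos in subsequences(b, k):
        for apos in starts_in_a.get(sub, []):
            yield apos, bpos


matches = list(get_exact_submatches(iter('yabcabcabcz'), iter('xxabcxxxx'), 3, 1))
print(sorted(matches))
```

Because the inputs are iterators, the substrings cannot be recovered later by slicing `a` and `b`, so the sketch keeps each window's text alongside its position as it goes.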