Generator functions in Python

Time: 2015-01-05 11:50:49

Tags: python function iterator generator string-matching

I'm currently working through a problem set from MIT OCW; the task is to find matching substrings in DNA sequences.

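I'm struggling to write a function that returns the subsequences of length k. I can get it to work with a string, but the problem is set up to use iterators, and with an iterator the function seems to reset on every call instead of resuming where it last yielded.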

Here is the function I wrote that works correctly with strings:

def subs(seq, k):
    subseq = ''
    pos = 0
    while pos < len(seq):
        while len(subseq) < k:
            subseq += seq[pos]
            pos += 1
        yield subseq, pos - k
        subseq = subseq[1:] 

It gives the correct output:

>>> a = 'hello'
>>> b = subs(a,2)
>>> b.next()
('he', 0)
>>> b.next()
('el', 1)
>>> b.next()
('ll', 2)
>>> b.next()
('lo', 3)
>>> b.next()

Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    b.next()
StopIteration

My problem

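The task is set up with a class that creates an iterator from a long string sequence. I won't go into that class here, but the supplied test also creates its iterators from strings: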

# This test case may break once you add the argument m (skipping).
class TestExactSubmatches(dnaseq.unittest.TestCase):
   def test_one(self):
       foo = 'yabcabcabcz'
       bar = 'xxabcxxxx'
       matches = list(dnaseq.getExactSubmatches(iter(foo), iter(bar), 3, 1))
       correct = [(1,2), (4,2), (7,2)]
       self.assertTrue(len(matches) == len(correct))
       for x in correct:
           self.assertTrue(x in matches)

And my current solution:

def subsequenceHashes(seq, k):
    subseq = ''
    pos = 0
    print 'Start of subseqHashes'
    try:
        while True:
            while len(subseq) < k:
                subseq += seq.next()
                pos += 1
            print subseq, pos - k
            yield hash(subseq), pos - k
            subseq = subseq[1:]
    except StopIteration:
        return

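The function that calls it takes the hashes of the subsequences, stores them in a dictionary (the Multidict class) together with the position where each subsequence starts, and then compares substrings that share a hash to check that they really are identical. It should then yield the position pairs of the identical substrings. I haven't managed to debug most of this function, because I ran into the problem above first.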

def getExactSubmatches(a, b, k, m): 
    # a and b are the strings compared, k is the length of substring, parameter m is unused, need it for later on in the problem set
    ahash, apos = subsequenceHashes(a, k).next()
    bhash, bpos = subsequenceHashes(b, k).next()
    multidict = Multidict()
    print 'starting'
    while ahash:
        print 'iterate'
        multidict.put(ahash, ('a', apos))
        ahash, apos = subsequenceHashes(a, k).next()
        print apos
    while bhash:
        multidict.put(bhash, ('b', bpos))
        bhash, bpos = subsequenceHashes(b, k).next()
    for key in multidict.mydict:
        if len(multidict.get(key)) > 1:
            for t in multidict.get(key):
                if t[0] == 'a':
                    for s in multidict.get(key):
                        if s[0] == 'b':
                            if a[apos:apos+k] == b[bpos:bpos+k]:
                                print apos, bpos
                                yield apos, bpos

What happens when I run the test:

Start of subseqHashes


yab 0
Start of subseqHashes


xxa 0
starting
iterate
Start of subseqHashes


cab 0
0
iterate
Start of subseqHashes


cab 0
0
iterate
Start of subseqHashes


F..
======================================================================
FAIL: test_one (__main__.TestExactSubmatches)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\Alex\Desktop\Pythonwork\6.006\ps4\dist\test_dnaseq.py", line 32, in test_one
    self.assertTrue(len(matches) == len(correct))
AssertionError: False is not true

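What seems to be going wrong is that the subsequence is reset every time .next() is called when the body works on an iterator, instead of staying inside the loop as it does with a string.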

1 answer:

Answer 0 (score: 0)

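As @jonrsharpe pointed out, my mistake was calling the generator function multiple times instead of actually iterating over it. Each call to subsequenceHashes(a, k) creates a brand-new generator whose subseq and pos start from scratch, while the underlying iterator a has already been partly consumed. That is why the output shows 'cab 0' twice after 'yab 0'.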
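For illustration, here is a minimal sketch of what iterating over the generator once could look like. It is not the problem set's reference solution. The names subsequence_hashes and get_exact_submatches are my own, a plain defaultdict of lists stands in for the course's Multidict class (an assumption, since that class isn't shown here), and the unused parameter m is left out. Because an iterator cannot be sliced the way a[apos:apos+k] tries to, the sketch also yields the window text itself so that equal hashes can be checked for real equality. It uses next(gen) and a for loop rather than gen.next(), so it should run on both Python 2.6+ and Python 3.

```python
from collections import defaultdict


def subsequence_hashes(seq, k):
    """Yield (hash, window, start) for every length-k window of an iterator."""
    window = ''
    pos = 0
    for ch in seq:  # the for loop ends cleanly when seq is exhausted
        window += ch
        pos += 1
        if len(window) == k:
            yield hash(window), window, pos - k
            window = window[1:]  # slide the window forward by one


def get_exact_submatches(a, b, k):
    """Yield (apos, bpos) for every pair of equal length-k windows."""
    # Assumption: a dict of lists stands in for the course's Multidict.
    a_windows = defaultdict(list)
    # Create each generator ONCE and let the for loop drive it.
    for h, window, apos in subsequence_hashes(a, k):
        a_windows[h].append((window, apos))
    for h, window, bpos in subsequence_hashes(b, k):
        for a_window, apos in a_windows.get(h, ()):
            if a_window == window:  # guard against hash collisions
                yield apos, bpos


# Reproducing the original mistake: every call builds a fresh generator
# that starts at position 0 but reads from the partly consumed iterator.
it = iter('yabcabcabcz')
first = next(subsequence_hashes(it, 3))   # consumes 'y', 'a', 'b'
second = next(subsequence_hashes(it, 3))  # consumes 'c', 'a', 'b'
print(first[1:])   # ('yab', 0)
print(second[1:])  # ('cab', 0)

# Iterating properly, using the strings from the test case above.
matches = list(get_exact_submatches(iter('yabcabcabcz'), iter('xxabcxxxx'), 3))
print(matches)  # [(1, 2), (4, 2), (7, 2)]
```

Using a for loop over the generator also removes the need for the `while ahash:` test, which would stop early if a hash ever happened to be 0.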