Can I mark an iterator as prematurely finished?

Asked: 2014-12-14 19:57:35

Tags: python

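Is there an idiomatic way to finish an iterator early, so that any further next() raises StopIteration? (I can think of ugly ways, such as abusing itertools.takewhile or discarding values until the thing is exhausted.)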

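Edit

My algorithm takes n iterators of unknown, varying lengths as input. It uses izip_longest() to read one item from each of them into n-tuples until all of them are exhausted. Sometimes I find that I want to stop the input from one of the iterators early, based on some runtime criterion, and replace it with the stream of default values that izip_longest() supplies. The least invasive way I can think of is to somehow "finish" it.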

5 answers:

Answer 0 (score: 3)

From the itertools recipes:

import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
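To connect the recipe back to the question, here is a small sketch (written for Python 3, with `consume` repeated so the snippet stands alone) that skips a few items, then exhausts the iterator and checks that a further next() raises StopIteration. The variable names are only illustrative.

```python
import collections
from itertools import islice

def consume(iterator, n=None):
    """Advance the iterator n steps ahead. If n is None, consume entirely."""
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

it = iter(range(10))
consume(it, 3)                 # skip 0, 1 and 2
first_after_skip = next(it)    # the next item is 3
consume(it)                    # discard everything that is left

try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first_after_skip, exhausted)
```

Keep in mind that this really does pull and discard the remaining items, so it takes time proportional to what is left and never returns for an infinite iterator such as itertools.repeat(0).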

Answer 1 (score: 1)

class MyIter(object):
    def __init__(self, what):
        self.done = False
        self.iter = iter(what)
    def __iter__(self):
        # an iterator returns itself from __iter__
        return self
    def next(self):  # Python 2 iterator protocol (__next__ in Python 3)
        if self.done:
            raise StopIteration
        return next(self.iter)

x = MyIter(range(100))
print next(x)   # prints 0
x.done = True
next(x)         # raises StopIteration

But this sounds like a bad idea.

What you should do instead is:

for my_iterator in all_iterators:
    for element in my_iterator: #iterate over it 
       if check(element): #if whatever condition is true
           break #then we are done with this iterator on to the next

For the example listed in @jme's comment, use something like this:

for i,my_iterator in enumerate(all_iterators):
    for j,element in enumerate(my_iterator): #iterate over it 
       if j > i: #if whatever condition is true
           break #then we are done with this iterator on to the next
       else:
           do_something(element)
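For readers on Python 3, the flag-based wrapper from the top of this answer could be combined with zip_longest (the Python 3 name for izip_longest) roughly as follows. This is only a sketch under that assumption: `Stoppable`, `stop()` and the sample data are hypothetical names chosen for illustration, not part of any library.

```python
from itertools import zip_longest

class Stoppable:
    """Wrap an iterable; after stop() every next() raises StopIteration."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.done = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.done:
            raise StopIteration
        return next(self._it)

    def stop(self):
        self.done = True

letters = Stoppable("abcd")
numbers = Stoppable([1, 2, 3, 4])

rows = []
for row in zip_longest(letters, numbers, fillvalue="n/a"):
    rows.append(row)
    if row[1] == 2:
        letters.stop()   # from here on zip_longest fills in "n/a"

print(rows)
```

Once the wrapper raises StopIteration, zip_longest treats that input as finished and supplies the fill value, which is the behaviour the question's edit asks for.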

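Answer 2 (score: 1)

In your edit you gave your use case: you want something like izip_longest, but one that lets you "disable" iterators prematurely. Here is an iterator class that allows this, as well as "enabling" a previously disabled iterator.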

import collections
import itertools

class TerminableZipper(object):

    def __init__(self, iterators, fill="n/a"):
        self.iterators = collections.OrderedDict((it, True) 
                                                 for it in iterators)
        self.fill = fill
        self.zipper = itertools.izip_longest(*iterators, fillvalue=fill)

    def disable(self, iterator):
        self.iterators[iterator] = False
        self._make_iterators()

    def enable(self, iterator):
        self.iterators[iterator] = True
        self._make_iterators()

    def _make_iterators(self):
        def effective(it):
            iterator, active = it
            return iterator if active else iter([])

        effective_iterators = map(effective, self.iterators.items())                
        self.zipper = itertools.izip_longest(*effective_iterators, 
                                             fillvalue=self.fill)

    def __iter__(self):
        return self

    def next(self):
        return next(self.zipper)

An example:

>>> it_a = itertools.repeat(0)
>>> it_b = iter(["a", "b", "c", "d", "e", "f"])
>>> it_c = iter(["q", "r", "x"])
>>> zipper = TerminableZipper([it_a, it_b, it_c])
>>> next(zipper)
(0, 'a', 'q')
>>> next(zipper)
(0, 'b', 'r')
>>> zipper.disable(it_a)
>>> next(zipper)
('n/a', 'c', 'x')
>>> zipper.enable(it_a)
>>> next(zipper)
(0, 'd', 'n/a')

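Answer 3 (score: 1)

Here is another answer, which I have decided to post separately because it differs from my other one. I think this approach may be preferable: keep the iterators in an ordered dict that maps each iterator to True or False (True if the iterator is active, False otherwise). First, we need a function that takes such a dict and calls next on each active iterator, substituting the default value and updating the iterator's status once it is exhausted: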

import itertools
import collections

def deactivating_zipper(iterators, default):
    while True:
        values = []
        for iterator, active in iterators.items():
            if active:
                try:
                    values.append(next(iterator))
                except StopIteration:
                    values.append(default)
                    iterators[iterator] = False
            else:
                values.append(default)

        if not any(iterators.values()):
            return
        else:  
            yield values

Now if we have three iterators:

it_a = iter(["a", "b", "c", "d", "e"])
it_b = iter([1,2,3,4,5,6,7,8])
it_c = iter(["foo", "bar", "baz", "quux"])

iterators = collections.OrderedDict((it, True) for it in (it_a, it_b, it_c))

We can loop over them like this:

for a,b,c in deactivating_zipper(iterators, "n/a"):

    # deactivate it_a
    if b == 3:
        iterators[it_a] = False

    print a,b,c

This gives the output:

a 1 foo
b 2 bar
c 3 baz
n/a 4 quux
n/a 5 n/a
n/a 6 n/a
n/a 7 n/a
n/a 8 n/a
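The same idea can be sketched for Python 3 with the flags kept in a plain list indexed by position, which avoids using the iterators themselves as dict keys. This is a variation for illustration rather than the code above: `deactivating_zip` and `active` are hypothetical names, and the inputs are the ones from the example.

```python
def deactivating_zip(iterables, active, default=None):
    """Yield tuples until every input is inactive or exhausted.

    `active` is a mutable list of booleans, one per input; the caller may
    set active[i] = False between steps to cut input i short.
    """
    iterators = [iter(it) for it in iterables]
    while True:
        values = []
        for i, it in enumerate(iterators):
            if active[i]:
                try:
                    values.append(next(it))
                    continue
                except StopIteration:
                    active[i] = False  # exhausted inputs deactivate themselves
            values.append(default)
        if not any(active):
            return
        yield tuple(values)

active = [True, True, True]
inputs = (
    ["a", "b", "c", "d", "e"],
    [1, 2, 3, 4, 5, 6, 7, 8],
    ["foo", "bar", "baz", "quux"],
)

rows = []
for a, b, c in deactivating_zip(inputs, active, "n/a"):
    if b == 3:
        active[0] = False  # deactivate the first input
    rows.append((a, b, c))

for row in rows:
    print(*row)
```

As in the dict-based version, a deactivated input is simply no longer read, so nothing is consumed from it after the flag is cleared.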

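Answer 4 (score: 0)

In the end I opted for abusing itertools.takewhile(). It is a bit more concise than the other answers that use a flag to finish an iterator in constant time: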

from itertools import takewhile, izip_longest

def f(seqs):
    done = [False] * len(seqs)
    iters = [ takewhile(lambda _, i=i: not done[i], s) for i, s in enumerate(seqs) ]
    zipped = izip_longest(*iters)

    # for example:
    print next(zipped)
    done[1] = True
    print next(zipped)
    print next(zipped)

f((['a', 'b', 'c'], [1, 2, 3], ['foo', 'bar', 'baz']))

Output:

('a', 1, 'foo')
('b', None, 'bar')
('c', None, 'baz')
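One caveat worth knowing about the takewhile() approach: takewhile only evaluates its predicate after pulling an item from the underlying iterator, so the first item read after the flag is set is consumed and dropped. The Python 3 sketch below illustrates this with made-up data; it matters only if the underlying iterator is used again afterwards.

```python
from itertools import takewhile

done = [False]
source = iter([1, 2, 3, 4])
guarded = takewhile(lambda _: not done[0], source)

first = next(guarded)                  # 1 passes the predicate
done[0] = True                         # flag the iterator as finished

sentinel = object()
after_flag = next(guarded, sentinel)   # takewhile pulls 2, rejects it, stops

leftover = list(source)                # 2 is gone; only 3 and 4 remain
print(first, after_flag is sentinel, leftover)
```

In the example above this is harmless, because the sequences are not read again once their flag is set.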