How to look ahead one element (peek) in a Python generator?

Time: 2010-03-11 13:34:58

Tags: python generator peek

I can't figure out how to look ahead one element in a Python generator. As soon as I look at it, it's gone.

Here is what I mean:

gen = iter([1,2,3])
next_value = gen.next()  # okay, I looked forward and see that next_value = 1
# but now:
list(gen)  # is [2, 3]  -- the first value is gone!

Here is a more realistic example:

gen = element_generator()
if gen.next_value() == 'STOP':
  quit_application()
else:
  process(gen.next())

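Can anyone help me write a generator that lets you look one element ahead?

16 answers:

Answer 0 (score: 62)

For completeness, the more-itertools package (which should probably be part of any Python programmer's toolbox) includes a peekable wrapper that implements this behavior. As the code example in the documentation shows: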

>>> p = peekable(xrange(2))
>>> p.peek()
0
>>> p.next()
0
>>> p.peek()
1
>>> p.next()
1

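The package is compatible with both Python 2 and 3, even though the documentation shows Python 2 syntax.

Answer 1 (score: 50)

The Python generator API is one-way: you can't push back elements you have already read. But you can use the itertools module to create a new iterator and prepend the element: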

import itertools

gen = iter([1,2,3])
peek = gen.next()
print list(itertools.chain([peek], gen))
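The snippet above uses Python 2 syntax. As a minimal sketch of the same idea in Python 3 (the names first, restored, and result are illustrative, not part of the original answer):

```python
import itertools

gen = iter([1, 2, 3])
first = next(gen)                          # consumes the first element
restored = itertools.chain([first], gen)   # new iterator that yields it again
result = list(restored)
print(first, result)
```

Note that the original gen should not be used directly afterwards, because restored draws its remaining items from it.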

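Answer 2 (score: 24)

Okay, two years too late, but I came across this question and did not find any of the answers satisfying. I came up with this meta-generator: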

class Peekorator(object):

    def __init__(self, generator):
        self.empty = False
        self.peek = None
        self.generator = generator
        try:
            self.peek = self.generator.next()
        except StopIteration:
            self.empty = True

    def __iter__(self):
        return self

    def next(self):
        """
        Return the self.peek element, or raise StopIteration
        if empty
        """
        if self.empty:
            raise StopIteration()
        to_return = self.peek
        try:
            self.peek = self.generator.next()
        except StopIteration:
            self.peek = None
            self.empty = True
        return to_return

def simple_iterator():
    for x in range(10):
        yield x*3

pkr = Peekorator(simple_iterator())
for i in pkr:
    print i, pkr.peek, pkr.empty

Results in:

0 3 False
3 6 False
6 9 False
9 12 False    
...
24 27 False
27 None True

i.e. at any moment during the iteration you have access to the next item in the list.
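The class above relies on the Python 2 iterator protocol (a next method and generator.next()). A possible Python 3 port might look like the sketch below; the class name Peekorator3 and the helper _advance are assumptions made for illustration only.

```python
class Peekorator3:
    """Hypothetical Python 3 port of the Peekorator idea."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.peek = None
        self.empty = False
        self._advance()

    def _advance(self):
        # Pre-fetch the next item so that it can be inspected via self.peek.
        try:
            self.peek = next(self._it)
        except StopIteration:
            self.peek = None
            self.empty = True

    def __iter__(self):
        return self

    def __next__(self):
        if self.empty:
            raise StopIteration
        value = self.peek
        self._advance()
        return value


pkr = Peekorator3(x * 3 for x in range(4))
rows = []
for i in pkr:
    rows.append((i, pkr.peek, pkr.empty))
print(rows)
```

With this sketch, empty becomes True as soon as the last item has been handed out, so it can be used to detect the final iteration.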

Answer 3 (score: 15)

You can use itertools.tee to produce a lightweight copy of the generator. Peeking ahead on one copy then does not affect the second copy:

import itertools

def process(seq):
    peeker, items = itertools.tee(seq)

    # initial peek ahead
    # so that peeker is one ahead of items
    # (the None default avoids StopIteration on an empty sequence)
    if next(peeker, None) == 'STOP':
        return

    for item in items:

        # peek ahead; None means there is no next item
        if next(peeker, None) == "STOP":
            return

        # process items
        print(item)

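The 'items' generator is unaffected by whatever you do to 'peeker'. Note that you shouldn't use the original 'seq' after calling 'tee' on it; doing so will break things.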
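To illustrate that the two copies advance independently, here is a small sketch; the sample data and the names source, looked, and remaining are made up for the demonstration.

```python
import itertools

source = iter(['a', 'b', 'STOP', 'c'])
peeker, items = itertools.tee(source)

# Advance only the peeking copy by three items.
looked = [next(peeker), next(peeker), next(peeker)]

# The other copy still starts at the first item.
remaining = list(items)
print(looked, remaining)
```

tee has to buffer every item that one copy has consumed and the other has not, so memory use grows with the distance between the two copies.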

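FWIW, this is the wrong way to solve this problem. Any algorithm that requires you to look one item ahead in a generator can also be written to use the current generator item and the previous item. Then you don't have to mangle your use of generators, and your code will be much simpler. See my other answer to this question.

Answer 4 (score: 4)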

>>> gen = iter(range(10))
>>> peek = next(gen)
>>> peek
0
>>> gen = (value for g in ([peek], gen) for value in g)
>>> list(gen)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Answer 5 (score: 4)

Just for fun, I created an implementation of a lookahead class based on the suggestion by Aaron:

import itertools

class lookahead_chain(object):
    def __init__(self, it):
        self._it = iter(it)

    def __iter__(self):
        return self

    def next(self):
        return next(self._it)

    def peek(self, default=None, _chain=itertools.chain):
        it = self._it
        try:
            v = self._it.next()
            self._it = _chain((v,), it)
            return v
        except StopIteration:
            return default

lookahead = lookahead_chain

With this, the following will work:

>>> t = lookahead(xrange(8))
>>> list(itertools.islice(t, 3))
[0, 1, 2]
>>> t.peek()
3
>>> list(itertools.islice(t, 3))
[3, 4, 5]

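With this implementation it is a bad idea to call peek many times in a row, because every call wraps the iterator in another chain object.

While looking at the CPython source code, I found a better way that is both shorter and more efficient: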

class lookahead_tee(object):
    def __init__(self, it):
        self._it, = itertools.tee(it, 1)

    def __iter__(self):
        return self._it

    def peek(self, default=None):
        try:
            return self._it.__copy__().next()
        except StopIteration:
            return default

lookahead = lookahead_tee

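Usage is the same as above, but here you don't pay a price for calling peek many times in a row. With a few more lines you can also look ahead more than one item in the iterator (up to available RAM).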
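As a rough Python 3 sketch of the tee-based variant, including the multi-item lookahead mentioned above: the class name LookaheadTee and the method peek_many are hypothetical, and the code assumes that tee objects can be duplicated with copy.copy, which CPython supports through their __copy__ method.

```python
import copy
import itertools


class LookaheadTee:
    """Hypothetical Python 3 variant of lookahead_tee."""

    def __init__(self, iterable):
        # tee(iterable, 1) returns a one-element tuple holding a copyable iterator.
        self._it, = itertools.tee(iterable, 1)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def peek(self, default=None):
        # Read from a copy so that the real iterator does not advance.
        return next(copy.copy(self._it), default)

    def peek_many(self, n):
        # Look at up to n upcoming items without consuming them.
        return list(itertools.islice(copy.copy(self._it), n))


t = LookaheadTee(range(8))
first_three = list(itertools.islice(t, 3))
peeked = t.peek()
peeked_many = t.peek_many(3)
after_peek = next(t)
print(first_three, peeked, peeked_many, after_peek)
```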

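Answer 6 (score: 3)

Instead of using the items (i, i+1), where 'i' is the current item and 'i+1' is the 'peek ahead' version, you should use (i-1, i), where 'i-1' is the previous item from the generator.

Tweaking your algorithm this way will produce something identical to what you have now, minus the needless extra complexity of trying to 'peek ahead'.

Peeking ahead is a mistake, and you should not be doing it.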
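One way to read this advice is sketched below: rather than peeking, remember the previous item and work on (previous, current) pairs. The helper name pairs_with_previous and the sample data are assumptions for illustration; in Python 3.10 and later, itertools.pairwise produces the same pairs.

```python
def pairs_with_previous(iterable):
    """Yield (previous, current) pairs without ever looking ahead."""
    it = iter(iterable)
    try:
        previous = next(it)
    except StopIteration:
        return  # empty input: nothing to pair
    for current in it:
        yield previous, current
        previous = current


# Example: differences between neighbouring values.
diffs = [cur - prev for prev, cur in pairs_with_previous([1, 4, 9, 16])]
print(diffs)
```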

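Answer 7 (score: 2)

This will work: it buffers one item and calls a function with each item and the next item in the sequence.

Your requirements are murky about what happens at the end of the sequence. What does "look ahead" mean when you are at the last item?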

def process_with_lookahead( iterable, aFunction ):
    prev= iterable.next()
    for item in iterable:
        aFunction( prev, item )
        prev= item
    aFunction( prev, None )  # use prev: item is undefined for a one-element input

def someLookaheadFunction( item, next_item ):
    print item, next_item

Answer 8 (score: 2)

A simple solution is to use a function like this:

import itertools

def peek(it):
    first = next(it)
    return first, itertools.chain([first], it)

Then you can do:

>>> it = iter(range(10))
>>> x, it = peek(it)
>>> x
0
>>> next(it)
0
>>> next(it)
1
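Applied to the 'STOP' scenario from the question, this approach might look like the following sketch. The helper name peek_or, its default parameter, and the sample events are assumptions; unlike peek above, it returns a default instead of raising StopIteration on an empty iterator.

```python
import itertools


def peek_or(it, default=None):
    """Return (first_item, iterator_that_still_yields_it), or (default, it) if empty."""
    it = iter(it)
    try:
        first = next(it)
    except StopIteration:
        return default, it
    return first, itertools.chain([first], it)


events = iter(['start', 'work', 'STOP'])
head, events = peek_or(events)

processed = []
if head != 'STOP':
    # Nothing was lost by peeking: 'start' is still the first item of events.
    processed = list(itertools.takewhile(lambda e: e != 'STOP', events))
print(head, processed)
```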

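Answer 9 (score: 1)

In case anyone is interested, and please correct me if I am wrong, I believe it is pretty easy to add push-back functionality to any iterator.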

class Back_pushable_iterator:
    """Class whose constructor takes an iterator as its only parameter, and
    returns an iterator that behaves in the same way, with added push back
    functionality.

    The idea is to be able to push back elements that need to be retrieved once
    more with the iterator semantics. This is particularly useful to implement
    LL(k) parsers that need k tokens of lookahead. Lookahead or push back is
    really a matter of perspective. The pushing back strategy allows a clean
    parser implementation based on recursive parser functions.

    The invoker of this class takes care of storing the elements that should be
    pushed back. A consequence of this is that any elements can be "pushed
    back", even elements that have never been retrieved from the iterator.
    The elements that are pushed back are then retrieved through the iterator
    interface in a LIFO-manner (as should logically be expected).

    This class works for any iterator but is especially meaningful for a
    generator iterator, which offers no obvious push back ability.

    In the LL(k) case mentioned above, the tokenizer can be implemented by a
    standard generator function (clean and simple), that is completed by this
    class for the needs of the actual parser.
    """
    def __init__(self, iterator):
        self.iterator = iterator
        self.pushed_back = []

    def __iter__(self):
        return self

    def __next__(self):
        if self.pushed_back:
            return self.pushed_back.pop()
        else:
            return next(self.iterator)

    def push_back(self, element):
        self.pushed_back.append(element)

it = Back_pushable_iterator(x for x in range(10))

x = next(it) # 0
print(x)
it.push_back(x)
x = next(it) # 0
print(x)
x = next(it) # 1
print(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)
it.push_back(y)
it.push_back(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)

for x in it:
    print(x) # 4-9

Answer 10 (score: 1)

cytoolz has a peek function.

>>> from cytoolz import peek
>>> gen = iter([1,2,3])
>>> first, continuation = peek(gen)
>>> first
1
>>> list(continuation)
[1, 2, 3]

Answer 11 (score: 1)

An iterator that allows peeking at the next element and also further ahead. It reads ahead as needed and remembers the values in a deque.

from collections import deque

class PeekIterator:

    def __init__(self, iterable):
        self.iterator = iter(iterable)
        self.peeked = deque()

    def __iter__(self):
        return self

    def __next__(self):
        if self.peeked:
            return self.peeked.popleft()
        return next(self.iterator)

    def peek(self, ahead=0):
        while len(self.peeked) <= ahead:
            self.peeked.append(next(self.iterator))
        return self.peeked[ahead]

Demo:

>>> it = PeekIterator(range(10))
>>> it.peek()
0
>>> it.peek(5)
5
>>> it.peek(13)
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    it.peek(13)
  File "[...]", line 15, in peek
    self.peeked.append(next(self.iterator))
StopIteration
>>> it.peek(2)
2
>>> next(it)
0
>>> it.peek(2)
3
>>> list(it)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>
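As the demo shows, peeking past the end raises StopIteration. If that is inconvenient, one possible variation is to accept a default value, as in the sketch below. The default parameter and the _MISSING sentinel are additions for illustration and not part of the original answer; the class is repeated so that the sketch is self-contained.

```python
from collections import deque

_MISSING = object()  # sentinel meaning "no default supplied"


class PeekIterator:
    def __init__(self, iterable):
        self.iterator = iter(iterable)
        self.peeked = deque()

    def __iter__(self):
        return self

    def __next__(self):
        if self.peeked:
            return self.peeked.popleft()
        return next(self.iterator)

    def peek(self, ahead=0, default=_MISSING):
        try:
            # Read ahead until the requested position is buffered.
            while len(self.peeked) <= ahead:
                self.peeked.append(next(self.iterator))
        except StopIteration:
            if default is _MISSING:
                raise
            return default
        return self.peeked[ahead]


it = PeekIterator(range(3))
second = it.peek(1)
beyond = it.peek(5, default=None)
everything = list(it)
print(second, beyond, everything)
```

Items that were buffered while trying to peek too far are not lost; they are still returned by later calls to next.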

Answer 12 (score: 0)

Although itertools.chain() is the natural tool for the job here, beware of loops like this:

for elem in gen:
    ...
    peek = next(gen)
    gen = itertools.chain([peek], gen)

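...because this will consume a linearly growing amount of memory and eventually grind to a halt. (This code essentially seems to create a linked list, one node per chain() call.) I know this not because I inspected the libs but because it caused a major slowdown of my program; getting rid of the gen = itertools.chain([peek], gen) line sped it up again. (Python 3.3)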

Answer 13 (score: 0)

A Python 3 snippet for @jonathan-hartley's answer:

def peek(iterator, eoi=None):
    iterator = iter(iterator)

    try:
        prev = next(iterator)
    except StopIteration:
        return iterator

    for elm in iterator:
        yield prev, elm
        prev = elm

    yield prev, eoi

for curr, nxt in peek(range(10)):
    print((curr, nxt))

# (0, 1)
# (1, 2)
# (2, 3)
# (3, 4)
# (4, 5)
# (5, 6)
# (6, 7)
# (7, 8)
# (8, 9)
# (9, None)

It would be straightforward to create a class that does this in __iter__, yields only the prev item, and stores elm in some attribute.

Answer 14 (score: 0)

W.r.t. @David Z's post, the newer seekable tool can reset a wrapped iterator to a prior position.


Answer 15 (score: 0)

For those who embrace frugality and one-liners, I present a one-liner that lets you look ahead in an iterable (this only works in Python 3.8 and above):

>>> import itertools as it
>>> peek = lambda iterable, n=1: it.islice(zip(it.chain((t := it.tee(iterable))[0], [None] * n), it.chain([None] * n, t[1])), n, None)
>>> for lookahead, element in peek(range(10)):
...     print(lookahead, element)
1 0
2 1
3 2
4 3
5 4
6 5
7 6
8 7
9 8
None 9
>>> for lookahead, element in peek(range(10), 2):
...     print(lookahead, element)
2 0
3 1
4 2
5 3
6 4
7 5
8 6
9 7
None 8
None 9

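This approach is space-efficient because it avoids copying the iterator multiple times. It is also fast, since it generates elements lazily. Finally, as a cherry on top, you can look ahead an arbitrary number of elements.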
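For readers who find the one-liner hard to follow, the same pairing can be sketched as an ordinary function. The name lookahead_pairs and the fill parameter are assumptions; the sketch is intended to yield the same (lookahead, element) pairs as the lambda above and does not need the walrus operator.

```python
import itertools


def lookahead_pairs(iterable, n=1, fill=None):
    """Yield (item n positions ahead, current item); fill is used past the end."""
    ahead, current = itertools.tee(iterable)
    # Skip the first n items of one copy, then pad it so both copies have equal length.
    shifted = itertools.chain(itertools.islice(ahead, n, None),
                              itertools.repeat(fill, n))
    return zip(shifted, current)


one_ahead = list(lookahead_pairs(range(4)))
two_ahead = list(lookahead_pairs(range(4), 2))
print(one_ahead)
print(two_ahead)
```

Because tee keeps the items between the two copies in a buffer, memory use is proportional to n rather than to the length of the input.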