How to count the items in a generator consumed by other code

Time: 2011-06-10 16:22:43

Tags: python count generator

I'm creating a generator that gets consumed by another function, but I'd still like to know how many items were generated:

lines = (line.rstrip('\n') for line in sys.stdin)
process(lines)
print("Processed {} lines.".format( ? ))

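The best I can come up with is to wrap the generator in a class that keeps a count, or perhaps to turn it inside out and send() things in. Is there an elegant and efficient way, in Python 2, to see how many items a generator produced when you are not the one consuming it?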

Edit: here is what I ended up with:

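import itertools
import operator
from collections import Iterable  # Python 2; in Python 3 this lives in collections.abc

class Count(Iterable):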
    """Wrap an iterable (typically a generator) and provide a ``count``
    field counting the number of items.

    Accessing the ``count`` field before iteration is finished will
    invalidate the count.
    """
    def __init__(self, iterable):
        self._iterable = iterable
        self._counter = itertools.count()

    def __iter__(self):
        return itertools.imap(operator.itemgetter(0), itertools.izip(self._iterable, self._counter))

    @property
    def count(self):
        self._counter = itertools.repeat(self._counter.next())
        return self._counter.next()

7 answers:

Answer 0 (score: 13)

If you don't mind consuming the generator yourself, you can do this:

sum(1 for x in gen)
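
As a rough illustration of what "consuming the generator" means here, the sketch below (Python 3; the `numbers` generator is a made-up stand-in) counts the items and then shows that nothing is left for any other consumer:

```python
def numbers():
    # Hypothetical generator, used only for illustration.
    for i in range(4):
        yield i

gen = numbers()
total = sum(1 for _ in gen)  # walks the whole generator, keeping nothing in memory
leftover = list(gen)         # the generator is exhausted at this point

print(total, leftover)
```

So this one-liner only fits when the counting code is itself the consumer, which is not the situation described in the question.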

Answer 1 (score: 10)

Here is another way, using itertools.count():

import itertools

def generator():
    for i in range(10):
        yield i

def process(l):
    for i in l:
        if i == 5:
            break

def counter_value(counter):
    # Relies on repr(counter) looking like "count(6)".
    import re
    return int(re.search(r'\d+', repr(counter)).group(0))

counter = itertools.count()
process(i for i, v in itertools.izip(generator(), counter))

print "Element consumed by process is : %d " % counter_value(counter)
# output: Element consumed by process is : 6

Hope this helps.

Answer 2 (score: 8)

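Usually, I would just turn the generator into a list and take its length. If you have reason to believe this would use too much memory, your best bet does indeed seem to be the wrapper class you suggested yourself. It isn't too bad, though: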

class CountingIterator(object):
    def __init__(self, it):
        self.it = it
        self.count = 0
    def __iter__(self):
        return self
    def next(self):
        nxt = next(self.it)
        self.count += 1
        return nxt
    __next__ = next

(The last line is for forward compatibility with Python 3.x.)
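
To show how such a wrapper might be plugged into the question's `process(lines)` call, here is a minimal Python 3 sketch. The `process` function and the input list are invented for the example, and calling `iter()` on the argument is an addition so that plain lists work too:

```python
class CountingIterator:
    """Python 3 variant of the wrapper above: counts items as they are handed out."""

    def __init__(self, it):
        self.it = iter(it)  # assumption: accept any iterable, not only iterators
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self.it)  # StopIteration propagates and leaves count untouched
        self.count += 1
        return item


def process(items):
    # Hypothetical consumer that stops early.
    for item in items:
        if item == "stop":
            break


lines = CountingIterator(["a", "b", "stop", "c"])
process(lines)
print("Processed {} lines.".format(lines.count))  # Processed 3 lines.
```

Because the count is updated inside `__next__`, it reflects exactly the items handed to the consumer, even when the consumer stops early.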

Answer 3 (score: 2)

Here is another approach. Using a list to carry the count out is a bit ugly, but it is very compact:

def counter(seq, count_output_list):
    for x in seq:
        count_output_list[0] += 1
        yield x

Use it like this:

count = [0]
process(counter(lines, count))
print count[0]

You could also have counter() take a dict, to which it adds a "count" key, or an object on which it sets a count member.
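
The two variants mentioned above might look roughly like this in Python 3. The names `counter_obj`, `counter_dict`, and `stats` are illustrative, and `types.SimpleNamespace` is just one convenient object with settable attributes:

```python
from types import SimpleNamespace


def counter_obj(seq, stats):
    # stats can be any object with a writable ``count`` attribute.
    for x in seq:
        stats.count += 1
        yield x


def counter_dict(seq, stats):
    # stats is a dict; the "count" key is created on first use.
    for x in seq:
        stats["count"] = stats.get("count", 0) + 1
        yield x


obj_stats = SimpleNamespace(count=0)
consumed = list(counter_obj("abc", obj_stats))

dict_stats = {}
list(counter_dict("abcd", dict_stats))

print(obj_stats.count, dict_stats["count"])  # 3 4
```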

Answer 4 (score: 1)

Here is another solution, similar to @sven-marnach's:

class IterCounter(object):
  def __init__(self, it):
    self._iter = it
    self.count = 0

  def _counterWrapper(self, it):
    for i in it:
      yield i
      self.count += 1

  def __iter__(self):
    return self._counterWrapper(self._iter)

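I wrapped the iterator in a generator function and avoided redefining next. The result is an iterable (not an iterator, since it lacks a next method), but if that is enough for you, it is faster. In my tests it was about 10% faster.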
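
One detail that may matter with this variant: the increment comes after the `yield`, so it only runs once the consumer asks for the following item. The Python 3 sketch below (same class, with the helper renamed to `_counter_wrapper`) suggests that the count trails by one if the consumer breaks out early, while a full pass still gives the expected total:

```python
class IterCounter:
    def __init__(self, it):
        self._iter = it
        self.count = 0

    def _counter_wrapper(self, it):
        for i in it:
            yield i
            self.count += 1  # runs only when the consumer requests the next item

    def __iter__(self):
        return self._counter_wrapper(self._iter)


wrapped = IterCounter(range(10))
received = []
for value in wrapped:
    received.append(value)
    if value == 2:
        break  # three items were received, but the third increment never ran

full = IterCounter(range(10))
list(full)  # complete iteration

print(len(received), wrapped.count, full.count)  # 3 2 10
```

If early exits are likely, moving the increment before the `yield` would make the count match the number of items handed out.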

Answer 5 (score: 1)

If you don't need to return the count and only want to log it, you can use a finally block:

def generator():
    i = 0
    try:
        for x in range(10):
            i += 1
            yield x
    finally:
        print '{} iterations'.format(i)

[ n for n in generator() ]

This produces:

10 iterations
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
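
The same idea can be sketched as a reusable wrapper around any iterable. This is a Python 3 version under assumptions: the names `logged` and `report` are made up, and `report` defaults to `print` but can be any callable, such as a logger method:

```python
def logged(iterable, report=print):
    """Yield items from ``iterable`` and report the count when iteration ends."""
    n = 0
    try:
        for item in iterable:
            n += 1
            yield item
    finally:
        # Runs on normal exhaustion and also when the generator is closed early.
        report("{} iterations".format(n))


messages = []
result = list(logged(range(10), report=messages.append))

partial = logged(range(10), report=messages.append)
next(partial)
partial.close()  # triggers the finally block after a single item

print(messages)  # ['10 iterations', '1 iterations']
```

Closing the generator explicitly makes the report deterministic. Without it, the finally block runs whenever the abandoned generator is finalized.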

Answer 6 (score: 0)

This solution uses side_effect from the more_itertools package:

from typing import TypeVar, Tuple, Iterator, Callable, Iterable
from itertools import count
from more_itertools import side_effect, peekable

T = TypeVar("T")
def counter_wrap(iterable: Iterable[T]) -> \
        Tuple[Iterator[T], Callable[[], int]]:
    """
    Returns a new iterator based on ``iterable``
    and a getter that when called returns the number of times
    the returned iterator was called up until that time
    """
    counter = peekable(count())
    def get_count() -> int:
        return counter.peek()
    return (
        side_effect(lambda e: next(counter), iterable),
        get_count
    )

It can be used as:

>>> iterator, counter = counter_wrap((1, 2, 3, 4, 5, 6, "plast", "last"))
>>> counter()
0
>>> counter()  # Calling this has no side effect (counter not incremented)
0
>>> next(iterator)
1
>>> next(iterator)
2
>>> next(iterator)
3
>>> counter()  # Updates when the iterator returns an element
3
>>> next(iterator)
4
>>> next(iterator)
5
>>> next(iterator)
6
>>> next(iterator)
'plast'
>>> counter()
7
>>> next(iterator)
'last'
>>> counter()
8
>>> next(iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> counter()
8