I am creating a generator that gets consumed by another function, but I would still like to know how many items were generated:
lines = (line.rstrip('\n') for line in sys.stdin)
process(lines)
print("Processed {} lines.".format( ? ))
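The best I can think of is to wrap the generator in a class that keeps a count, or maybe to turn it inside out and send() items in. Is there an elegant and efficient way, in Python 2, to see how many items a generator produced when you are not the one consuming it?
Edit: Here is what I ended up with: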
import itertools
import operator
from collections import Iterable

class Count(Iterable):
    """Wrap an iterable (typically a generator) and provide a ``count``
    field counting the number of items.

    Accessing the ``count`` field before iteration is finished will
    invalidate the count.
    """
    def __init__(self, iterable):
        self._iterable = iterable
        self._counter = itertools.count()

    def __iter__(self):
        # Pair each item with a counter tick, then strip the tick off again.
        return itertools.imap(operator.itemgetter(0), itertools.izip(self._iterable, self._counter))

    @property
    def count(self):
        # Freeze the counter at its current value so repeated reads agree.
        self._counter = itertools.repeat(self._counter.next())
        return self._counter.next()
Answer 0 (score: 13)
If you don't mind consuming the generator, you can simply do:
sum(1 for x in gen)
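As a minimal sketch of the caveat (assuming Python 3 and a made-up generator that is not part of the question), counting this way exhausts the generator, so nothing is left for another consumer such as process():

```python
# Hypothetical generator used only for illustration.
gen = (n * n for n in range(5))

# Counting pulls every item out of the generator.
total = sum(1 for _ in gen)

# The generator is now exhausted, so a second pass sees nothing.
leftover = list(gen)

print(total)
print(leftover)
```

This makes it a fit only when the count is the sole thing you need from the generator.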
Answer 1 (score: 10)
Here is another way, using itertools.count():
import itertools

def generator():
    for i in range(10):
        yield i

def process(l):
    for i in l:
        if i == 5:
            break

def counter_value(counter):
    # Read the current value out of repr(counter), e.g. 'count(6)',
    # without advancing the counter.
    import re
    return int(re.search(r'\d+', repr(counter)).group(0))

counter = itertools.count()
process(i for i, v in itertools.izip(generator(), counter))

print "Element consumed by process is : %d " % counter_value(counter)
# output: Element consumed by process is : 6
Hope this helps.
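A possible Python 3 variant of the same idea, offered as a sketch rather than as this answer's own code: it swaps itertools.izip for the built-in zip and reads the counter with next() instead of parsing its repr. Note that reading with next() advances the counter, so it should be read only once.

```python
import itertools


def generator():
    for i in range(10):
        yield i


def process(items):
    # Stop early to show that only consumed items are counted.
    for i in items:
        if i == 5:
            break


counter = itertools.count()
# zip() pulls from generator() first and from counter second, so the
# counter advances once per item actually handed to process().
process(v for v, _ in zip(generator(), counter))

# The next value of the counter equals the number of items consumed so far.
consumed = next(counter)
print("Elements consumed by process: %d" % consumed)
```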
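Answer 2 (score: 8)
Usually, I would just turn the generator into a list and take its length. If you have reason to believe this would consume too much memory, your best bet does seem to be the wrapper class you suggested yourself. It isn't too bad, though: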
class CountingIterator(object):
    def __init__(self, it):
        # iter() lets the wrapper accept any iterable, not only iterators.
        self.it = iter(it)
        self.count = 0

    def __iter__(self):
        return self

    def next(self):
        nxt = next(self.it)
        self.count += 1
        return nxt

    __next__ = next
(The last line is for forward compatibility with Python 3.x.)
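To connect this back to the question, here is a usage sketch assuming Python 3. The process() function and the three sample lines are hypothetical stand-ins for the question's consumer and sys.stdin.

```python
class CountingIterator:
    """Python 3 variant of the wrapper above; counts items as they are pulled."""

    def __init__(self, it):
        self.it = iter(it)
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        nxt = next(self.it)  # StopIteration propagates before the increment
        self.count += 1
        return nxt


def process(lines):
    # Hypothetical consumer standing in for the question's process().
    for _ in lines:
        pass


lines = CountingIterator(line.rstrip("\n") for line in ["a\n", "b\n", "c\n"])
process(lines)
print("Processed {} lines.".format(lines.count))
```

Because the wrapper keeps the count on the object, the caller can read it after process() returns without process() knowing anything about counting.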
Answer 3 (score: 2)
Here is another approach. Using a list to carry the count out is a bit ugly, but it is quite compact:
def counter(seq, count_output_list):
    for x in seq:
        count_output_list[0] += 1
        yield x
Use it like this:
count = [0]
process(counter(lines, count))
print count[0]
You could also have counter() take a dict, to which it adds a "count" key, or an object on which it sets a count member.
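The dict variant might look like the sketch below, assuming Python 3. The parameter name stats is my own choice, and the function is split into an eager part and an inner generator so that the "count" key exists even if nothing is ever pulled from the result.

```python
def counter(seq, stats):
    stats["count"] = 0  # set eagerly, before any item is requested

    def gen():
        for x in seq:
            stats["count"] += 1
            yield x

    return gen()


stats = {}
consumed = list(counter(iter("abc"), stats))
print(stats["count"])
```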
Answer 4 (score: 1)
Here is another solution, similar to @sven-marnach's:
class IterCounter(object):
    def __init__(self, it):
        self._iter = it
        self.count = 0

    def _counterWrapper(self, it):
        for i in it:
            yield i
            self.count += 1

    def __iter__(self):
        return self._counterWrapper(self._iter)
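I wrapped the iterator in a generator function and avoided redefining next. The result is an iterable (not an iterator, because it lacks a next method), but if that is enough, it is faster. In my tests it was about 10% faster.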
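One behaviour worth checking before relying on this wrapper: the increment sits after the yield, so it only runs when the consumer asks for the next item. The sketch below (assuming Python 3; the method is renamed _counter_wrapper for illustration) suggests that a consumer which stops early leaves the count one short of the items it received.

```python
class IterCounter:
    def __init__(self, it):
        self._iter = it
        self.count = 0

    def _counter_wrapper(self, it):
        for i in it:
            yield i
            self.count += 1  # runs only when the consumer asks for the next item

    def __iter__(self):
        return self._counter_wrapper(self._iter)


wrapped = IterCounter(range(10))
received = []
for item in wrapped:
    received.append(item)
    if item == 5:
        break  # the wrapper is never resumed after yielding 5

print(len(received))
print(wrapped.count)
```

Moving the increment before the yield would count items as they are handed out instead, which matches the other answers.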
Answer 5 (score: 1)
If you don't need to return the count and only want to log it, you can use a finally block:
def generator():
    i = 0
    try:
        for x in range(10):
            i += 1
            yield x
    finally:
        print '{} iterations'.format(i)

[n for n in generator()]
This produces:
10 iterations
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
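The finally block also runs when the consumer stops early and the generator is closed. Here is a sketch assuming Python 3; the log list parameter is a hypothetical replacement for print so that the result can be inspected.

```python
def generator(log):
    i = 0
    try:
        for x in range(10):
            i += 1
            yield x
    finally:
        # Runs on exhaustion and also when the generator is closed early.
        log.append("{} iterations".format(i))


log = []
gen = generator(log)
for n in gen:
    if n == 3:
        break
gen.close()  # an explicit close triggers the finally block deterministically
print(log)
```

Without the explicit close(), the block still runs once the generator is garbage-collected, but the timing then depends on the interpreter.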
Answer 6 (score: 0)
This solution uses side_effect from the more_itertools package.
from typing import TypeVar, Tuple, Iterator, Callable, Iterable
from itertools import count
from more_itertools import side_effect, peekable

T = TypeVar("T")

def counter_wrap(iterable: Iterable[T]) -> \
        Tuple[Iterator[T], Callable[[], int]]:
    """
    Returns a new iterator based on ``iterable``
    and a getter that when called returns the number of times
    the returned iterator was called up until that time
    """
    counter = peekable(count())

    def get_count() -> int:
        return counter.peek()

    return (
        side_effect(lambda e: next(counter), iterable),
        get_count
    )
It can be used like this:
>>> iterator, counter = counter_wrap((1, 2, 3, 4, 5, 6, "plast", "last"))
>>> counter()
0
>>> counter() # Calling this has no side effect (counter not incremented)
0
>>> next(iterator)
1
>>> next(iterator)
2
>>> next(iterator)
3
>>> counter() # Updates when the iterator returns an element
3
>>> next(iterator)
4
>>> next(iterator)
5
>>> next(iterator)
6
>>> next(iterator)
'plast'
>>> counter()
7
>>> next(iterator)
'last'
>>> counter()
8
>>> next(iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> counter()
8