I have these two implementations for computing the length of a finite generator while keeping the data for further processing:
import itertools

def count_generator1(generator):
    '''- build a list with the generator data
       - get the length of the data
       - return both the length and the original data (in a list)
       WARNING: the memory use is unbounded, and infinite generators will block this'''
    l = list(generator)
    return len(l), l

def count_generator2(generator):
    '''- get two generators from the original generator
       - get the length of the data from one of them
       - return both the length and the original data, as returned by tee
       WARNING: tee can use up an unbounded amount of memory, and infinite generators will block this'''
    for_length, saved = itertools.tee(generator, 2)
    return sum(1 for _ in for_length), saved
Both have drawbacks, and both do the job. Could someone comment on them, or even offer a better alternative?
Answer 0 (score: 12)
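If you must do this, the first approach is much better: because you consume all the values, itertools.tee() has to store every one of them anyway, so a list will be more efficient.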
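To quote the docs:
This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().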
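To illustrate the trade-off quoted above, here is a minimal sketch (not a benchmark). The names `count_with_list` and `count_with_tee` are hypothetical and simply mirror the two functions from the question. When one tee branch is exhausted before the other is touched, tee has to buffer every item, so both approaches end up holding all the data in memory.

```python
import itertools


def count_with_list(iterable):
    """Materialize the iterable once; return (length, data as a list)."""
    data = list(iterable)
    return len(data), data


def count_with_tee(iterable):
    """Split the iterable with tee; count one branch, return the other."""
    for_length, saved = itertools.tee(iterable, 2)
    # Exhausting for_length first forces tee to buffer every item for saved.
    return sum(1 for _ in for_length), saved


n1, data1 = count_with_list(x * x for x in range(5))
n2, saved2 = count_with_tee(x * x for x in range(5))
items2 = list(saved2)  # the saved branch can only be consumed once

print(n1, data1)
print(n2, items2)
```

Note that the list version returns data that can be iterated repeatedly, while the tee version returns a one-shot iterator.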
Answer 1 (score: 2)
I ran timeit on a few approaches I could think of, using 64-bit Python 3.4.3 on Windows:
>>> from timeit import timeit
>>> from textwrap import dedent as d
>>> timeit(
...     d("""
...     count = -1
...     for _ in s:
...         count += 1
...     count += 1
...     """),
...     "s = range(1000)",
... )
50.70772041983173
>>> timeit(
...     d("""
...     count = -1
...     for count, _ in enumerate(s):
...         pass
...     count += 1
...     """),
...     "s = range(1000)",
... )
42.636973504498656
>>> timeit(
...     d("""
...     count, _ = reduce(f, enumerate(range(1000)), (-1, -1))
...     count += 1
...     """),
...     d("""
...     from functools import reduce
...     def f(_, count):
...         return count
...     s = range(1000)
...     """),
... )
121.15513102540672
>>> timeit("count = sum(1 for _ in s)", "s = range(1000)")
58.179126025925825
>>> timeit("count = len(tuple(s))", "s = range(1000)")
19.777029680237774
>>> timeit("count = len(list(s))", "s = range(1000)")
18.145157531932
>>> timeit("count = len(list(1 for _ in s))", "s = range(1000)")
57.41422175998332
Surprisingly, the fastest approach is to use list (not even tuple) to exhaust the iterator and take the length of the result:
>>> timeit("count = len(list(s))", "s = range(1000)")
18.145157531932
Of course, this runs into the memory problem. The best low-memory alternative is to use enumerate in a no-op for loop:
>>> timeit(
...     d("""
...     count = -1
...     for count, _ in enumerate(s):
...         pass
...     count += 1
...     """),
...     "s = range(1000)",
... )
42.636973504498656
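The enumerate loop above can be wrapped in a small reusable function. This is a sketch under one assumption: `count_items` is a hypothetical name, and using `start=1` replaces the `-1` initial value and the final `+= 1` from the timed snippet. Unlike the list-based approach, it consumes the iterable and does not keep the data.

```python
def count_items(iterable):
    """Count the items of an iterable without storing them.

    The iterable is consumed; the items themselves are discarded.
    """
    count = 0  # stays 0 if the iterable is empty
    # With start=1, the last index produced equals the number of items.
    for count, _ in enumerate(iterable, start=1):
        pass
    return count


print(count_items(range(1000)))
print(count_items(ch for ch in "abc"))
print(count_items(iter([])))
```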
Cheers!
Answer 2 (score: 0)
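If you don't need the length of the iterator before processing the data, you can use a helper method that adds counting to the processing of the iterator/stream, so that the count becomes available afterwards: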
REP
Usage:
var result = //some Entity framework IOrderedQueryable;
var fullList = await result.ToListAsync();
var currentIndex = fullList.FindIndex(x => x.Id== model.Id);
var nextRecord = fullList[currentIndex + 1];
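The idea described in this answer, counting items while they are processed rather than beforehand, could be sketched in Python as follows. The answer's original helper is not shown here, so `CountingIterator` is a hypothetical class written for illustration and not the author's code.

```python
class CountingIterator:
    """Wrap an iterable and count items as they are consumed."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.count = 0  # number of items handed out so far

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)  # StopIteration propagates at the end
        self.count += 1
        return item


counted = CountingIterator(x * 2 for x in range(4))
total = sum(counted)  # process the stream as usual
print(total, counted.count)  # the count is final once the stream is exhausted
```

This keeps memory use constant, at the cost that `count` is only the full length after the wrapped iterator has been completely consumed.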