I want to "fork" a large data stream so that only a few elements are ever looked at. I'd like to write something like this:
from itertools import tee

stream = ...  # a generator of a very large data stream

while True:
    try:
        element = next(stream)
        process_element(element)
        if some_condition(element):
            stream, fork = tee(stream)
            process_fork(fork)
    except StopIteration:
        break
Reading the documentation for tee, however, I suspect that the deque backing fork will keep growing even after fork goes out of scope. Is this the case? If so, is there a way to tell tee to "discard" a fork? Or is there another, more obvious way?
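The buffering described in that documentation is easy to observe in a minimal sketch (the toy range stream here is purely illustrative): advancing only one of the two iterators forces tee to buffer every value on behalf of the other.

from itertools import tee

a, b = tee(iter(range(5)))
print(list(a))  # advancing only `a` makes tee buffer every value for `b`
print(list(b))  # `b` replays the buffered values: [0, 1, 2, 3, 4]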
Answer 0 (score: 1)
You can avoid the implementation-dependent behavior @goncalopp observed by creating a Tee class and giving it a discard() method:
import collections

class Tee(object):
    def __init__(self, iterable, n=2):
        it = iter(iterable)
        self.deques = [collections.deque() for _ in range(n)]
        def gen(mydeque):
            while True:
                if not mydeque:              # when the local deque is empty
                    newval = next(it)        # fetch a new value and
                    for d in self.deques:    # load it to all the active deques
                        d.append(newval)
                yield mydeque.popleft()
        self.generators = [gen(d) for d in self.deques]

    def __call__(self):
        return self.generators

    def discard(self, gen):
        index = self.generators.index(gen)
        del self.deques[index]
        del self.generators[index]
Note that since it's now a class, using it is slightly different. However, when you're done with a fork, you can get rid of it by calling tee.discard(fork). Here's an example:
tee = None
while True:
    try:
        element = next(stream)
        process_element(element)
        if some_condition(element):
            if not tee:
                tee = Tee(stream)
                stream, fork = tee()
            process_fork(fork)
    except StopIteration:
        break

if tee:
    tee.discard(fork)
    fork = None
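As a minimal sketch of the discard mechanics (the toy stream and printed values are illustrative, not from the original answer): once a fork is discarded, its deque is deleted and no longer fills up as the main stream advances.

stream = iter(range(10))
t = Tee(stream)
stream, fork = t()
print(next(stream), next(stream))  # 0 1; fork's deque has buffered both values
t.discard(fork)                    # fork's deque is deleted, so buffering stops
print(next(stream))                # 2; only the surviving deque is filled now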
Answer 1 (score: 0)
Here's a simple test script:
from itertools import tee

def natural_numbers():
    i = 0
    while True:
        yield i
        i += 1

stream = natural_numbers()  # don't use xrange, CPython optimizes it away
stream, fork = tee(stream)
del fork
for e in stream:
    pass
It seems that, at least in CPython, the process' memory doesn't keep growing. There seems to be a mechanism that detects this situation.
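One rough way to check this yourself is to watch the peak resident set size while the loop runs. A sketch, with the caveat that the resource module is POSIX-only and ru_maxrss is reported in kilobytes on Linux but bytes on macOS:

import resource
from itertools import tee

def natural_numbers():
    i = 0
    while True:
        yield i
        i += 1

stream, fork = tee(natural_numbers())
del fork
for e in stream:
    if e % 10000000 == 0:
        # peak RSS so far; it should level off if the buffer is freed
        print(e, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    if e >= 50000000:
        break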
However, if you replace tee with the Python code that the documentation states is equivalent...
import collections

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:           # when the local deque is empty
                newval = next(it)     # fetch a new value and
                for d in deques:      # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)
...the memory does keep growing, as expected. So, my guess is that this is implementation-dependent behavior.
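One way to see where the values pile up with the pure-Python tee above is a sketch that relies on CPython exposing a generator's closure variables through gi_frame.f_locals (itself an implementation detail): deleting a fork does not free its deque, because the inner deques list still references it and keeps appending to it.

from itertools import islice

a, b = tee(iter(range(1000)))  # the pure-Python tee defined above
del b
list(islice(a, 500))           # advance only `a`
# the closure's deques list still holds (and fills) b's deque:
print(len(a.gi_frame.f_locals['deques'][1]))  # -> 500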