I need to merge two iterators. I wrote this function:
def merge_no_repeat(iter1, iter2, key=None):
    """
    a = iter([(2, 'a'), (4, 'a'), (6, 'a')])
    b = iter([(1, 'b'), (2, 'b'), (3, 'b'), (4, 'b'), (5, 'b'), (6, 'b'), (7, 'b'), (8, 'b')])
    key = lambda item: item[0]
    merge_no_repeat(a, b, key) ->
    iter([(1, 'b'), (2, 'a'), (3, 'b'), (4, 'a'), (5, 'b'), (6, 'a'), (7, 'b'), (8, 'b')])
    :param iter1: sorted iterator
    :param iter2: sorted iterator
    :param key: function returning the sort key, default: lambda x: x
    :return: merged iterator
    """
    if key is None:
        key = lambda x: x
    element1 = next(iter1, None)
    element2 = next(iter2, None)
    while element1 is not None or element2 is not None:
        if element1 is None:
            yield element2
            element2 = next(iter2, None)
        elif element2 is None:
            yield element1
            element1 = next(iter1, None)
        elif key(element1) > key(element2):
            yield element2
            element2 = next(iter2, None)
        elif key(element1) == key(element2):
            yield element1
            element1 = next(iter1, None)
            element2 = next(iter2, None)
        elif key(element1) < key(element2):
            yield element1
            element1 = next(iter1, None)
This function works, but I think it is too complicated. Is there a way to make it simpler using the Python standard library?
Answer 0: (score: 1)
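The pytoolz library has an implementation. It doesn't appear to use any non-standard-library functions, so if you really don't want to pull in an external library, you could probably just copy the code.
If you care about speed, there is also a Cython implementation of pytoolz.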
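If the goal is to stay entirely within the standard library, one possible route (this is not the pytoolz code, just a sketch) is to combine `heapq.merge` with `itertools.groupby`. The name `merge_no_repeat_stdlib` is made up for illustration. It assumes Python 3.5+ (for the `key` argument of `heapq.merge`) and relies on CPython's `heapq.merge` emitting items from the earlier iterable first when keys tie. Note one behavioral difference: it also collapses repeated keys that occur inside a single input.

```python
from heapq import merge
from itertools import groupby


def merge_no_repeat_stdlib(iter1, iter2, key=None):
    """Merge two sorted iterables, keeping one item per key (hypothetical helper)."""
    # heapq.merge lazily interleaves the already-sorted inputs by key.
    merged = merge(iter1, iter2, key=key)
    # groupby clusters consecutive items sharing a key; keep the first of each
    # cluster, which comes from iter1 when both inputs contain that key.
    for _, group in groupby(merged, key=key):
        yield next(group)
```

Called with the two iterators from the question and `key=lambda item: item[0]`, this should give the same sequence as the question's docstring.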
Answer 1: (score: 1)
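First, this fails if one of the iterators yields None as a value; you should catch the StopIteration exception instead. Second, once one of the iterators has no more values, you can simply return all the remaining values of the other one.
I think this becomes easier with a small wrapper class around the iterator that makes the next value visible: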
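To make both points concrete, here is a minimal sketch that swaps in a private sentinel object instead of catching StopIteration (a different technique with the same effect: None becomes an ordinary value) and that drains the surviving iterator once the other is exhausted. `_DONE` and `merge_no_repeat_sentinel` are hypothetical names, and the code assumes Python 3.3+ for `yield from`.

```python
_DONE = object()  # hypothetical sentinel; can never be equal to a real item


def merge_no_repeat_sentinel(iter1, iter2, key=None):
    if key is None:
        key = lambda x: x
    e1 = next(iter1, _DONE)
    e2 = next(iter2, _DONE)
    # Compare only while both sides still have a pending item.
    while e1 is not _DONE and e2 is not _DONE:
        k1, k2 = key(e1), key(e2)
        if k1 <= k2:
            yield e1
            e1 = next(iter1, _DONE)
            if k1 == k2:
                # Same key on both sides: drop the item from iter2.
                e2 = next(iter2, _DONE)
        else:
            yield e2
            e2 = next(iter2, _DONE)
    # At most one side has items left: emit its pending item, then the rest.
    if e1 is not _DONE:
        yield e1
        yield from iter1
    if e2 is not _DONE:
        yield e2
        yield from iter2
```

With a key such as `lambda x: 0 if x is None else x`, an input containing None is merged like any other value rather than being mistaken for the end of the iterator.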
class NextValueWrapper(object):
    def __init__(self, iterator):
        self.iterator = iterator
        self.next_value = None
        self.finished = False
        self.get()  # prime next_value with the first item, if any

    def get(self):
        if self.finished:
            return  # Shouldn't happen, maybe raise an exception
        value = self.next_value
        try:
            self.next_value = next(self.iterator)
        except StopIteration:
            self.finished = True
        return value
The code then becomes:
def merge(iter1, iter2, key=None):
    if key is None:
        key = lambda x: x
    wrap1 = NextValueWrapper(iter1)
    wrap2 = NextValueWrapper(iter2)
    while not (wrap1.finished and wrap2.finished):
        if (wrap2.finished or
                (not wrap1.finished and
                 key(wrap1.next_value) <= key(wrap2.next_value))):
            yield wrap1.get()
        else:
            yield wrap2.get()
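This is untested. It does yield duplicates. It's written in Python 2, out of habit. Making it skip duplicates is left as an exercise for the reader; I hadn't noticed that was also a requirement...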
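For what it's worth, one way the no-duplicates variant might look is sketched below. It repeats the wrapper class so the block runs on its own; the name `merge_no_repeat_wrapped` is made up here, and the choice to keep the item from iter1 on a tie is an assumption taken from the question's example, not something this answer specified.

```python
class NextValueWrapper(object):
    def __init__(self, iterator):
        self.iterator = iterator
        self.next_value = None
        self.finished = False
        self.get()  # prime next_value with the first item, if any

    def get(self):
        if self.finished:
            return
        value = self.next_value
        try:
            self.next_value = next(self.iterator)
        except StopIteration:
            self.finished = True
        return value


def merge_no_repeat_wrapped(iter1, iter2, key=None):
    if key is None:
        key = lambda x: x
    wrap1 = NextValueWrapper(iter1)
    wrap2 = NextValueWrapper(iter2)
    while not (wrap1.finished and wrap2.finished):
        if wrap2.finished:
            yield wrap1.get()
        elif wrap1.finished:
            yield wrap2.get()
        else:
            k1 = key(wrap1.next_value)
            k2 = key(wrap2.next_value)
            if k1 <= k2:
                if k1 == k2:
                    wrap2.get()  # same key on both sides: discard iter2's item
                yield wrap1.get()
            else:
                yield wrap2.get()
```

Like the original wrapper, this only compares keys when both sides still have a pending value, so None items are not confused with exhaustion.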
Answer 2: (score: 0)
You can use:
def merge_no_repeat(iter1, iter2, key=None):
    if key is None:
        key = lambda x: x
    ref = next(iter1, None)
    for elem in iter2:
        key_elem = key(elem)  # caching value so we won't compute it for each value in iter1 that is before this one
        while ref is not None and key_elem > key(ref):
            # Catch up with low values from iter1
            yield ref
            ref = next(iter1, None)
        if ref is None or key_elem < key(ref):
            # Catch up with low values from iter2, eliminate duplicates
            yield elem
    # Update: I forgot to consume iter1 in the first version of this code
    if ref is not None:
        # The item already pulled from iter1 is still pending; emit it first
        yield ref
    for elem in iter1:
        # Use remaining items of iter1 if needed
        yield elem
I am assuming the iterators never return None as a value unless they are fully exhausted, since your original code tests `if element1 is None:` and `elif element2 is None:`.
Examples:
>>> from operator import itemgetter
>>> list(merge_no_repeat(
... iter([(2, 'a'), (4, 'a'), (6, 'a')]),
... iter([(1, 'b')]),
... itemgetter(0)))
[(1, 'b'), (2, 'a'), (4, 'a'), (6, 'a')]
>>> list(merge_no_repeat(
... iter([(2, 'a'), (4, 'a'), (6, 'a')]),
... iter([(1, 'b'),(7, 'b'), (8, 'b')]),
... itemgetter(0)))
[(1, 'b'), (2, 'a'), (4, 'a'), (6, 'a'), (7, 'b'), (8, 'b')]
>>> list(merge_no_repeat(
... iter([(2, 'a'), (4, 'a'), (6, 'a')]),
... iter([(1, 'b'),(3, 'b'), (4,'b'),(5,'b'),(7, 'b'), (8, 'b')]),
... itemgetter(0)))
[(1, 'b'), (2, 'a'), (3, 'b'), (4, 'a'), (5, 'b'), (6, 'a'), (7, 'b'), (8, 'b')]