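Zipping iterators in Python with an equal-length check

Time: 2015-10-05 17:32:57

Tags: python itertools

I am looking for a nice way to zip several iterables that raises an exception if the iterables have different lengths.

If the iterables are lists or otherwise support len(), this solution is clean and simple: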

def zip_equal(it1, it2):
    if len(it1) != len(it2):
        raise ValueError("Lengths of iterables are different")
    return zip(it1, it2)

However, if it1 and it2 are generators, the previous function fails because their length is not defined: TypeError: object of type 'generator' has no len()
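
To see the failure concretely, here is a minimal sketch that calls len() on a generator expression. The names gen and message are only illustrative and do not come from the question.

```python
gen = (x for x in ['a', 'b', 'c'])  # a generator, not a list

try:
    len(gen)  # generators do not implement __len__
except TypeError as exc:
    message = str(exc)  # the TypeError text quoted above
else:
    message = None
```

The only general way to learn a generator's length is to consume it, which is why the check has to happen while zipping.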

I imagined that the itertools module would offer a simple way to implement this, but so far I have not found one. I came up with this home-made solution:

def zip_equal(it1, it2):
    exhausted = False
    while True:
        try:
            el1 = next(it1)
            if exhausted: # in a previous iteration it2 was exhausted but it1 still has elements
                raise ValueError("it1 and it2 have different lengths")
        except StopIteration:
            exhausted = True
            # it2 must be exhausted too.
        try:
            el2 = next(it2)
            # here it2 is not exhausted.
            if exhausted:  # it1 was exhausted => raise
                raise ValueError("it1 and it2 have different lengths")
        except StopIteration:
            # here it2 is exhausted
            if not exhausted:
                # but it1 was not exhausted => raise
                raise ValueError("it1 and it2 have different lengths")
            exhausted = True
        if not exhausted:
            yield (el1, el2)
        else:
            return

The solution can be tested with the following code:

it1 = (x for x in ['a', 'b', 'c'])  # it1 has length 3
it2 = (x for x in [0, 1, 2, 3])     # it2 has length 4
list(zip_equal(it1, it2))           # len(it1) < len(it2) => raise
it1 = (x for x in ['a', 'b', 'c'])  # it1 has length 3
it2 = (x for x in [0, 1, 2, 3])     # it2 has length 4
list(zip_equal(it2, it1))           # len(it2) > len(it1) => raise
it1 = (x for x in ['a', 'b', 'c', 'd'])  # it1 has length 4
it2 = (x for x in [0, 1, 2, 3])          # it2 has length 4
list(zip_equal(it1, it2))                # like zip (or izip in python2)

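Am I overlooking any alternative solution? Is there a simpler implementation of my zip_equal function?

PS: I wrote the question with Python 3 in mind, but a Python 2 solution is also welcome.

Update

While Martin Peters' answer is simpler (and that is what I was looking for), if you need performance you may want to check cjerdonek's answer, as it is faster.

5 answers:

Answer 0: (score: 17)

I can think of a simpler solution: use itertools.zip_longest() and raise an exception if the sentinel value used to pad the shorter iterables appears in the tuple produced: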

from itertools import zip_longest

def zip_equal(*iterables):
    sentinel = object()
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        if sentinel in combo:
            raise ValueError('Iterables have different lengths')
        yield combo
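
One caveat about the `sentinel in combo` test above: `in` falls back to `==`, so an element with an unusual `__eq__` can match the sentinel by accident. The sketch below is a variant that compares by identity instead. The names zip_equal_identity and AlwaysEqual are hypothetical and used only for illustration.

```python
from itertools import zip_longest


class AlwaysEqual:
    """Hypothetical element type whose == is true for everything."""
    def __eq__(self, other):
        return True


def zip_equal_identity(*iterables):
    sentinel = object()
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        # Compare by identity so that the elements' __eq__ is never consulted.
        if any(item is sentinel for item in combo):
            raise ValueError('Iterables have different lengths')
        yield combo


# Equal lengths: one pair is produced and no exception is raised,
# even though the AlwaysEqual element compares equal to the sentinel.
pairs = list(zip_equal_identity([AlwaysEqual()], [1]))
```

The identity test costs a generator expression per tuple, so this trades a little speed for robustness.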

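Unfortunately, we cannot use zip() with yield from to avoid a Python-level loop with a test on every iteration: once the shortest iterator runs out, zip() has already advanced all the iterators before it, which swallows the evidence if any of those had just one extra item.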
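
A small sketch of that behaviour, using illustrative variable names: when the longer iterator comes first, zip() pulls its extra item before discovering that the other iterator is empty, so nothing is left to inspect afterwards.

```python
longer = iter([1, 2, 3])
shorter = iter(['a', 'b'])

pairs = list(zip(longer, shorter))   # [(1, 'a'), (2, 'b')]
leftover = list(longer)              # []: zip() already consumed the 3

# With the shorter iterator first, zip() stops before touching the extra item.
longer = iter([1, 2, 3])
shorter = iter(['a', 'b'])
list(zip(shorter, longer))
leftover_other_order = list(longer)  # [3]
```

Because the result depends on argument order, checking for leftovers after a plain zip() cannot reliably detect a length mismatch.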

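Answer 1: (score: 6)

PEP 618 introduced an optional boolean keyword argument, strict, for the built-in zip function.

Quoting What's New In Python 3.10:

The zip() function now has an optional strict flag, used to require that all the iterables have an equal length.

When enabled, a ValueError is raised if one of the arguments is exhausted before the others.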

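Answer 2: (score: 4)

Here is an approach that does not require any extra check on each loop of the iteration. This can be desirable for long iterables in particular.

The idea is to append to each iterable a "value" that raises an exception when it is reached, and then do the required validation only once, at the very end. The approach uses zip() and itertools.chain().

The code below was written for Python 3.5.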

import itertools

class ExhaustedError(Exception):
    def __init__(self, index):
        """The index is the 0-based index of the exhausted iterable."""
        self.index = index

def raising_iter(i):
    """Return an iterator that raises an ExhaustedError."""
    raise ExhaustedError(i)
    yield

def terminate_iter(i, iterable):
    """Return an iterator that raises an ExhaustedError at the end."""
    return itertools.chain(iterable, raising_iter(i))

def zip_equal(*iterables):
    iterators = [terminate_iter(*args) for args in enumerate(iterables)]
    try:
        yield from zip(*iterators)
    except ExhaustedError as exc:
        index = exc.index
        if index > 0:
            raise RuntimeError('iterable {} exhausted first'.format(index)) from None
        # Check that all other iterators are also exhausted.
        for i, iterator in enumerate(iterators[1:], start=1):
            try:
                next(iterator)
            except ExhaustedError:
                pass
            else:
                raise RuntimeError('iterable {} is longer'.format(i)) from None

Here is what it looks like in use.

>>> list(zip_equal([1, 2], [3, 4], [5, 6]))
[(1, 3, 5), (2, 4, 6)]

>>> list(zip_equal([1, 2], [3], [4]))
RuntimeError: iterable 1 exhausted first

>>> list(zip_equal([1], [2, 3], [4]))
RuntimeError: iterable 1 is longer

>>> list(zip_equal([1], [2], [3, 4]))
RuntimeError: iterable 2 is longer

Answer 3: (score: 2)

Use more_itertools.zip_equal (v8.3.0+):

Code

import more_itertools as mit

Demo

list(mit.zip_equal(range(3), "abc"))
# [(0, 'a'), (1, 'b'), (2, 'c')]

list(mit.zip_equal(range(3), "abcd"))
# UnequalIterablesError

more_itertools is a third-party package, installed via pip install more_itertools

Answer 4: (score: 1)

FYI, I came up with a solution that uses a sentinel iterable:

import itertools


class _SentinelException(Exception):
    def __iter__(self):
        raise _SentinelException


def zip_equal(iterable1, iterable2):
    i1 = iter(itertools.chain(iterable1, _SentinelException()))
    i2 = iter(iterable2)
    try:
        while True:
            yield (next(i1), next(i2))
    except _SentinelException:  # i1 reaches end
        try:
            next(i2)  # check whether i2 reaches end
        except StopIteration:
            pass
        else:
            raise ValueError('the second iterable is longer than the first one')
    except StopIteration:  # i2 ran out after next(i1) had already succeeded, so i1 is longer than i2
        raise ValueError('the first iterable is longer than the second one')