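I'm looking for a nice way to zip several iterables that raises an exception if the iterables have different lengths.
When the iterables are lists or otherwise have a len method, this solution is clean and simple: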
def zip_equal(it1, it2):
    if len(it1) != len(it2):
        raise ValueError("Lengths of iterables are different")
    return zip(it1, it2)
However, if it1 and it2 are generators, the previous function fails because their length is not defined: TypeError: object of type 'generator' has no len().
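A minimal sketch reproducing that failure, assuming two throwaway generators (gen1 and gen2 are illustrative names only):

```python
def zip_equal(it1, it2):
    # Length-based check from the question; it only works for sized arguments.
    if len(it1) != len(it2):
        raise ValueError("Lengths of iterables are different")
    return zip(it1, it2)

gen1 = (x for x in ['a', 'b', 'c'])
gen2 = (x for x in [0, 1, 2])

try:
    zip_equal(gen1, gen2)
except TypeError as exc:
    # Generators do not implement __len__, so len() fails before any zipping.
    message = str(exc)
    print(message)
```

The generators are never advanced here: the call fails on len() itself, which is why any general solution has to detect a length mismatch while iterating.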
I imagine the itertools module offers a simple way to implement this, but so far I have not been able to find it. I came up with this home-made solution:
def zip_equal(it1, it2):
    exhausted = False
    while True:
        try:
            el1 = next(it1)
            if exhausted:  # in a previous iteration it2 was exhausted but it1 still has elements
                raise ValueError("it1 and it2 have different lengths")
        except StopIteration:
            exhausted = True
            # it2 must be exhausted too.
        try:
            el2 = next(it2)
            # here it2 is not exhausted.
            if exhausted:  # it1 was exhausted => raise
                raise ValueError("it1 and it2 have different lengths")
        except StopIteration:
            # here it2 is exhausted
            if not exhausted:
                # but it1 was not exhausted => raise
                raise ValueError("it1 and it2 have different lengths")
            exhausted = True
        if not exhausted:
            yield (el1, el2)
        else:
            return
The solution can be tested with the following code:
it1 = (x for x in ['a', 'b', 'c']) # it1 has length 3
it2 = (x for x in [0, 1, 2, 3]) # it2 has length 4
list(zip_equal(it1, it2)) # len(it1) < len(it2) => raise
it1 = (x for x in ['a', 'b', 'c']) # it1 has length 3
it2 = (x for x in [0, 1, 2, 3]) # it2 has length 4
list(zip_equal(it2, it1)) # len(it2) > len(it1) => raise
it1 = (x for x in ['a', 'b', 'c', 'd']) # it1 has length 4
it2 = (x for x in [0, 1, 2, 3]) # it2 has length 4
list(zip_equal(it1, it2)) # like zip (or izip in python2)
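Am I overlooking an alternative solution? Is there a simpler implementation of my zip_equal function?
PS: I wrote this question with Python 3 in mind, but a Python 2 solution is also welcome.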
Update
While Martin Peters' answer is simpler (and that is what I was looking for), if you need performance you may want to check cjerdonek's answer, as it is faster.
Answer 0 (score: 17)
I can think of a simpler solution: use itertools.zip_longest() and raise an exception if the sentinel value used to pad out the shorter iterables is present in a produced tuple:
from itertools import zip_longest

def zip_equal(*iterables):
    sentinel = object()
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        if sentinel in combo:
            raise ValueError('Iterables have different lengths')
        yield combo
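Unfortunately, we can't use zip() with yield from to avoid a Python-level loop with a test on every iteration. Once the shortest iterator runs out, zip() has already advanced all the preceding iterators, and so it swallows the evidence if any of them held just one extra item.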
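A small sketch of that "swallowed evidence" effect, using two throwaway list iterators (the variable names are illustrative only):

```python
# Longer iterator first: zip() pulls 3 from it1, then finds it2 empty.
it1 = iter([1, 2, 3])
it2 = iter([10, 20])
pairs = list(zip(it1, it2))
leftover_first = list(it1)   # the extra item was consumed and discarded

# Shorter iterator first: zip() stops before touching the longer one again.
short = iter([10, 20])
long_ = iter([1, 2, 3])
pairs_swapped = list(zip(short, long_))
leftover_second = list(long_)  # the extra item is still available

print(pairs, leftover_first)
print(pairs_swapped, leftover_second)
```

In the first case nothing is left to inspect after zip() finishes, so a check performed afterwards cannot tell that it1 was one item longer.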
Answer 1 (score: 6)
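PEP 618 introduced an optional boolean keyword parameter, strict, for the built-in zip function (available since Python 3.10).
The zip() function now has an optional strict flag, used to require that all the iterables have an equal length.
When enabled, a ValueError is raised if one of the arguments is exhausted before the others.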
Answer 2 (score: 4)
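Here is an approach that doesn't require any extra check on each loop of the iteration. This can be desirable especially for long iterables.
The idea is to pad each iterable with a "value" at the end that raises an exception when reached, and then do the needed verification only at the very end. The approach uses zip() and itertools.chain().
The code below was written for Python 3.5.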
import itertools

class ExhaustedError(Exception):
    def __init__(self, index):
        """The index is the 0-based index of the exhausted iterable."""
        self.index = index

def raising_iter(i):
    """Return an iterator that raises an ExhaustedError."""
    raise ExhaustedError(i)
    yield

def terminate_iter(i, iterable):
    """Return an iterator that raises an ExhaustedError at the end."""
    return itertools.chain(iterable, raising_iter(i))

def zip_equal(*iterables):
    iterators = [terminate_iter(*args) for args in enumerate(iterables)]
    try:
        yield from zip(*iterators)
    except ExhaustedError as exc:
        index = exc.index
        if index > 0:
            raise RuntimeError('iterable {} exhausted first'.format(index)) from None
        # Check that all other iterators are also exhausted.
        for i, iterator in enumerate(iterators[1:], start=1):
            try:
                next(iterator)
            except ExhaustedError:
                pass
            else:
                raise RuntimeError('iterable {} is longer'.format(i)) from None
Below is what it looks like in use.
>>> list(zip_equal([1, 2], [3, 4], [5, 6]))
[(1, 3, 5), (2, 4, 6)]
>>> list(zip_equal([1, 2], [3], [4]))
RuntimeError: iterable 1 exhausted first
>>> list(zip_equal([1], [2, 3], [4]))
RuntimeError: iterable 1 is longer
>>> list(zip_equal([1], [2], [3, 4]))
RuntimeError: iterable 2 is longer
Answer 3 (score: 2)
Use more_itertools.zip_equal (v8.3.0+):
Code
import more_itertools as mit
Demo
list(mit.zip_equal(range(3), "abc"))
# [(0, 'a'), (1, 'b'), (2, 'c')]
list(mit.zip_equal(range(3), "abcd"))
# UnequalIterablesError
more_itertools is a third-party package, installed via pip install more_itertools
Answer 4 (score: 1)
FYI, I came up with a solution that uses a sentinel iterable:
import itertools

class _SentinelException(Exception):
    def __iter__(self):
        # chain() calls iter() on this object only after iterable1 is exhausted.
        raise _SentinelException

def zip_equal(iterable1, iterable2):
    i1 = iter(itertools.chain(iterable1, _SentinelException()))
    i2 = iter(iterable2)
    try:
        while True:
            yield (next(i1), next(i2))
    except _SentinelException:  # i1 reaches end
        try:
            next(i2)  # check whether i2 reaches end
        except StopIteration:
            pass
        else:
            raise ValueError('the second iterable is longer than the first one')
    except StopIteration:  # i2 reaches end; next(i1) has already succeeded, so i1 is longer than i2
        raise ValueError('the first iterable is longer than the second one')