Question

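I am looking for a middle ground between Python's zip and zip_longest functions (from the itertools module): something that exhausts all of the given iterators but does not fill in anything. So, for example, it should transpose tuples like this: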

(11, 12, 13    ),        (11, 21, 31, 41),
(21, 22, 23, 24),  -->   (12, 22, 32, 42),
(31, 32        ),        (13, 23,     43),
(41, 42, 43, 44),        (    24,     44)

(Spaces added for better graphical alignment.)

I managed to put together a crude solution by stripping out the fillvalue after calling zip_longest.

from functools import partial
from itertools import zip_longest
from operator import is_not

def zip_discard(*iterables, sentinel=object()):
    return map(
            partial(filter, partial(is_not, sentinel)),
            zip_longest(*iterables, fillvalue=sentinel))

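Is there a way to do this without introducing a sentinel in the first place? Could it be improved by using yield? Which approach is the most efficient?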

Answer 1

Your approach is good. I think using a sentinel is elegant. I would argue that a nested generator expression is more Pythonic:

from itertools import zip_longest

def zip_discard_gen(*iterables, sentinel=object()):
    return ((entry for entry in iterable if entry is not sentinel)
            for iterable in zip_longest(*iterables, fillvalue=sentinel))

This needs fewer imports, because there is no need for partial() or is_not().

It is also a bit faster:

data = [(11, 12, 13    ),
        (21, 22, 23, 24),
        (31, 32        ),
        (41, 42, 43, 44)]

%timeit [list(x) for x in zip_discard(*data)]  
10000 loops, best of 3: 17.5 µs per loop

%timeit [list(x) for x in zip_discard_gen(*data)]
100000 loops, best of 3: 14.2 µs per loop
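
The %timeit lines above are IPython magics. As a rough sketch of how the same comparison could be reproduced in a plain Python script, the standard timeit module can be used. The helper run() and the repetition count are assumptions made for this illustration, and the absolute numbers will differ from machine to machine:

```python
from functools import partial
from itertools import zip_longest
from operator import is_not
from timeit import timeit

def zip_discard(*iterables, sentinel=object()):
    # Version from the question: filter the fill value out of each row.
    return map(partial(filter, partial(is_not, sentinel)),
               zip_longest(*iterables, fillvalue=sentinel))

def zip_discard_gen(*iterables, sentinel=object()):
    # Nested generator expression version from this answer.
    return ((entry for entry in row if entry is not sentinel)
            for row in zip_longest(*iterables, fillvalue=sentinel))

data = [(11, 12, 13), (21, 22, 23, 24), (31, 32), (41, 42, 43, 44)]

def run(func):
    # Hypothetical helper: materialise every row so the lazy work is done.
    return [list(row) for row in func(*data)]

REPEATS = 10_000  # arbitrary repetition count chosen for this sketch
for func in (zip_discard, zip_discard_gen):
    seconds = timeit(lambda: run(func), number=REPEATS)
    print(f"{func.__name__}: {seconds / REPEATS * 1e6:.2f} us per call")
```

Both functions are lazy, so the measurement has to consume the rows (here via list()); otherwise only the cost of building the iterator would be timed.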

Edit

A list comprehension version is a bit faster still:

def zip_discard_compr(*iterables, sentinel=object()):
    return [[entry for entry in iterable if entry is not sentinel]
            for iterable in zip_longest(*iterables, fillvalue=sentinel)]

Timing:

%timeit zip_discard_compr(*data)
100000 loops, best of 3: 6.73 µs per loop

Python 2 version:

from itertools import izip_longest

SENTINEL = object()

def zip_discard_compr(*iterables):
    sentinel = SENTINEL
    return [[entry for entry in iterable if entry is not sentinel]
            for iterable in izip_longest(*iterables, fillvalue=sentinel)]

Timing

This version returns the same data structure as zip_varlen from Tadhg McDonald-Jensen:

def zip_discard_gen(*iterables, sentinel=object()):
    return (tuple([entry for entry in iterable if entry is not sentinel])
            for iterable in zip_longest(*iterables, fillvalue=sentinel))

It is about twice as fast:

%timeit list(zip_discard_gen(*data))
100000 loops, best of 3: 9.37 µs per loop

%timeit list(zip_varlen(*data))
10000 loops, best of 3: 18 µs per loop

Answer 2

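Both zip and zip_longest are designed to always produce tuples of equal length. You can define your own generator that does not care about the length, along these lines: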

def _one_pass(iters):
    for it in iters:
        try:
            yield next(it)
        except StopIteration:
            pass  # if some of them are already exhausted, ignore them

def zip_varlen(*iterables):
    iters = [iter(it) for it in iterables]
    while True: #broken when an empty tuple is given by _one_pass
        val = tuple(_one_pass(iters))
        if val:
            yield val
        else:
            break
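
As a quick illustration of how this generator behaves, here is a minimal self-contained sketch that applies it to the data from the question and to a mix of iterable types. The variable names data, result and mixed are chosen only for this example:

```python
def _one_pass(iters):
    # Take one item from every iterator that still has items left.
    for it in iters:
        try:
            yield next(it)
        except StopIteration:
            pass  # this iterator is exhausted, skip it

def zip_varlen(*iterables):
    iters = [iter(it) for it in iterables]
    while True:
        val = tuple(_one_pass(iters))
        if not val:
            break  # every iterator is exhausted
        yield val

data = [(11, 12, 13),
        (21, 22, 23, 24),
        (31, 32),
        (41, 42, 43, 44)]

result = list(zip_varlen(*data))
print(result)
# [(11, 21, 31, 41), (12, 22, 32, 42), (13, 23, 43), (24, 44)]

# Any iterable works, including generators and strings.
mixed = list(zip_varlen((n for n in [1]), "ab"))
print(mixed)
# [(1, 'a'), ('b',)]
```

Note that the result carries no positional information: once an input runs out, later values from the remaining inputs shift left in the output tuples.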

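If the data being zipped is quite large, skipping the exhausted iterators on every pass gets expensive. It may be more efficient to delete the finished iterators from iters inside the _one_pass function: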

def _one_pass(iters):
    i = 0
    while i < len(iters):
        try:
            yield next(iters[i])
        except StopIteration:
            del iters[i]
        else:
            i += 1

Neither version needs to create intermediate results or use a temporary fill value.
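
One practical consequence of not using a fill value can be sketched as follows. If the inputs may legitimately contain None, filtering on zip_longest's default fill value (None) silently drops real data, whereas the generator approach keeps it. The function zip_discard_none below is a hypothetical, deliberately naive variant written only for this comparison:

```python
from itertools import zip_longest

def _one_pass(iters):
    # Pruning variant: remove iterators from the list once they finish.
    i = 0
    while i < len(iters):
        try:
            yield next(iters[i])
        except StopIteration:
            del iters[i]
        else:
            i += 1

def zip_varlen(*iterables):
    iters = [iter(it) for it in iterables]
    while True:
        val = tuple(_one_pass(iters))
        if not val:
            break
        yield val

def zip_discard_none(*iterables):
    # Hypothetical naive variant: relies on the default fillvalue (None).
    return [tuple(x for x in row if x is not None)
            for row in zip_longest(*iterables)]

data = [(1, None, 3), (4, 5)]

print(list(zip_varlen(*data)))
# [(1, 4), (None, 5), (3,)]
print(zip_discard_none(*data))
# [(1, 4), (5,), (3,)]  <- the genuine None was lost
```

A private object() sentinel, as used in the other answer, avoids this problem as well, since no caller-supplied value can be identical to it.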

zip_longest without fillvalue
