Question

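I've run into a problem where I execute n concurrent events that each return an iterator over the results they fetched. However, there is an optional limit parameter that basically says: consolidate all the iterators and return at most limit results.

So, for example: I execute 2,000 URL requests on 8 threads but only want the first 100 results, and not all 100 from the same thread.

Thus, unravel: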

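import itertools

def unravel(*iterables, with_limit = None):
    make_iter = {a: iter(i) for a, i in enumerate(iterables)}

    # Any non-int limit (including None) means "no limit"; a negative
    # counter never reaches 0.
    if not isinstance(with_limit, int):
        with_limit = -1

    while make_iter:
        exhausted = []

        for iid, take_from in make_iter.items():
            if with_limit == 0:
                # Inside a generator, use return: since PEP 479 a raised
                # StopIteration is turned into a RuntimeError.
                return

            try:
                yield next(take_from)
            except StopIteration:
                # Remember every exhausted iterator; index 0 is falsy, so a
                # single truthiness-tested flag would miss it.
                exhausted.append(iid)
            else:
                with_limit -= 1

        # Drop exhausted iterators only after the pass, so the dict is not
        # modified while it is being iterated.
        for iid in exhausted:
            make_iter.pop(iid)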

Usage:

>>> a = [1,2,3,4,5]
>>> b = [6,7,8,9,10]
>>> c = [1,3,5,7]
>>> d = [2,4,6,8]
>>> 
>>> print([e for e in unravel(c, d)])
[1, 2, 3, 4, 5, 6, 7, 8]
>>> print([e for e in unravel(c, d, with_limit = 3)])
[1, 2, 3]
>>> print([e for e in unravel(a, b, with_limit = 6)])
[1, 6, 2, 7, 3, 8]
>>> print([e for e in unravel(a, b, with_limit = 100)])
[1, 6, 2, 7, 3, 8, 4, 9, 5, 10]

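Does something like this already exist, or is this a decent implementation?

Thanks

Edit, working fix

Inspired by @abernert's suggestion, this is what I went with. Thanks everybody!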

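def unravel(*iterables, limit = None):
    # Interleave with zip_longest, flatten with chain, trim with islice.
    # Caveat: filter(None, ...) discards every falsy value, not only the None
    # padding, which is why the 0s are missing from the output below.
    yield from itertools.islice(
        filter(None,
            itertools.chain.from_iterable(
                itertools.zip_longest(*iterables)
            )
        ), limit)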



>>> a = [x for x in range(10)]
>>> b = [x for x in range(5)]
>>> c = [x for x in range(0, 20, 2)]
>>> d = [x for x in range(1, 30, 2)]
>>> 
>>> print(list(unravel(a, b)))
[1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
>>> print(list(unravel(a, b, limit = 3)))
[1, 1, 2]
>>> print(list(unravel(a, b, c, d, limit = 20)))
[1, 1, 1, 2, 3, 2, 2, 4, 5, 3, 3, 6, 7, 4, 4, 8, 9, 5, 10, 11]

Answer 1

From the itertools recipes:

from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))

Use itertools.islice to enforce with_limit, e.g.:

print([e for e in itertools.islice(roundrobin(c, d), 3)])

>>> list(roundrobin(a, b, c, d))
[1, 6, 1, 2, 2, 7, 3, 4, 3, 8, 5, 6, 4, 9, 7, 8, 5, 10]

Answer 2

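What you're doing here is almost just zip.

You want a flat iterable rather than an iterable of sub-iterables, but chain fixes that.

And you only want to take the first N values, but islice fixes that.

So, if the lengths are all equal: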

>>> list(chain.from_iterable(zip(a, b)))
[1, 6, 2, 7, 3, 8, 4, 9, 5, 10]
>>> list(islice(chain.from_iterable(zip(a, b)), 7))
[1, 6, 2, 7, 3, 8, 4]

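But if the lengths aren't equal, zip stops as soon as the shortest iterable is exhausted, which you don't want. The only alternative in the stdlib is zip_longest, which fills in the missing values with None.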
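To make the difference concrete, here is a small sketch comparing the two on inputs of unequal length. The lists a and c are made up for illustration and are not the ones from the question.

```python
from itertools import chain, zip_longest

# Illustrative inputs of unequal length (not the lists from the question).
a = [1, 2, 3, 4, 5]
c = [1, 3]

# zip stops at the shortest input, so the tail of `a` is lost.
truncated = list(chain.from_iterable(zip(a, c)))

# zip_longest keeps going and pads the exhausted input with None.
padded = list(chain.from_iterable(zip_longest(a, c)))

print(truncated)  # [1, 1, 2, 3]
print(padded)     # [1, 1, 2, 3, 3, None, 4, None, 5, None]
```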

You can pretty easily write a zip_longest_skipping (which is effectively the roundrobin in Peter's answer), but you can also just use zip_longest and filter the results:

>>> list(filter(None, chain.from_iterable(zip_longest(a, b, c, d))))
[1, 6, 1, 2, 2, 7, 3, 4, 3, 8, 5, 6, 4, 9, 7, 8, 5, 10]

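(Obviously this doesn't work as well if your values can be falsy, such as 0, empty strings, or None, but when they're all positive integers it works fine. To handle the "or None" case, do sentinel=object(), pass that to zip_longest as fillvalue, and then filter on x is not sentinel.)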
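The sentinel variant described in the parenthetical might look roughly like this. The function name interleave and its limit parameter are assumptions made for the sketch; fillvalue is the actual zip_longest keyword.

```python
from itertools import chain, islice, zip_longest

def interleave(*iterables, limit=None):
    """Round-robin over the inputs, keeping falsy values such as 0 or None."""
    # A fresh object() cannot be equal to any real value, so it is a safe pad.
    sentinel = object()
    padded = chain.from_iterable(zip_longest(*iterables, fillvalue=sentinel))
    # Compare by identity so only the padding is dropped.
    kept = (x for x in padded if x is not sentinel)
    # islice(..., None) means "no limit".
    return islice(kept, limit)

print(list(interleave([0, 1, 2], [None, "", "x", "y"])))
# [0, None, 1, '', 2, 'x', 'y']
print(list(interleave([0, 1, 2], [None, "", "x", "y"], limit=3)))
# [0, None, 1]
```

Because the filter tests identity rather than truthiness, zeros, empty strings and None all survive.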

Answer 3

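There may be a better solution for what you're actually trying to do.

I execute 2,000 URL requests on 8 threads but only want the first 100 results, and not all 100 from the same thread.

OK, so why are the results in 8 separate iterables in the first place? There's no good reason for that. Instead of giving each thread its own queue (or a global list and lock, or whatever you're using) and then trying to zip them together, why not have them all share one queue from the start?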
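A minimal sketch of the shared-queue idea, using hand-rolled threads. The names worker and run_shared_queue are hypothetical, and squaring a number stands in for a real URL fetch.

```python
import queue
import threading

def worker(tasks, results):
    # Pull work until the task queue is empty; every worker writes to the
    # same results queue, so no per-thread iterators need merging later.
    while True:
        try:
            item = tasks.get_nowait()
        except queue.Empty:
            return
        results.put(item * item)  # stand-in for a real URL fetch

def run_shared_queue(items, num_threads=8, limit=None):
    items = list(items)
    tasks = queue.Queue()
    for item in items:
        tasks.put(item)
    results = queue.Queue()

    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_threads)]
    for thread in threads:
        thread.start()

    # Every task yields exactly one result, so blocking on get() this many
    # times cannot deadlock.
    wanted = len(items) if limit is None else min(limit, len(items))
    first = [results.get() for _ in range(wanted)]

    for thread in threads:
        thread.join()
    return first

print(len(run_shared_queue(range(2000), num_threads=8, limit=100)))  # 100
```

The results arrive in completion order, so which 100 values come back can vary between runs.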

In fact, that's the default way almost every thread pool is designed (including multiprocessing.Pool and concurrent.futures.Executor in the stdlib). Look at the main example for concurrent.futures.ThreadPoolExecutor:

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

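That's almost exactly your use case: spamming a bunch of URL downloads out over 5 different threads and gathering the results as they come in, without your problem ever arising.

Of course it's missing with_limit, but you can wrap that as_completed iterable in islice to handle that, and you're done.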
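As a rough sketch of that last step: fake_fetch is a hypothetical stand-in for load_url so the example runs without network access, and first_results is an illustrative name rather than anything in the library.

```python
import concurrent.futures
import itertools
import time

def fake_fetch(n):
    # Hypothetical stand-in for a URL download.
    time.sleep(0.001)
    return n * 2

def first_results(inputs, limit, max_workers=8):
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fake_fetch, n) for n in inputs]
        # as_completed yields futures in the order they finish; islice stops
        # after `limit` of them.
        done = itertools.islice(concurrent.futures.as_completed(futures), limit)
        results = [future.result() for future in done]
        # Cancel whatever has not started yet; calls already running finish
        # before the with-block exits.
        for future in futures:
            future.cancel()
    return results

print(len(first_results(range(200), limit=10)))  # 10
```

The results come from whichever workers finish first, so they are naturally spread across the threads.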

Answer 4

This uses a generator and izip_longest to pull one item at a time from each of several iterators:

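from itertools import izip_longest  # Python 2 name; in Python 3 it is zip_longest


def unravel(cap, *iters):
    counter = 0
    for group in izip_longest(*iters):
        for entry in group:
            # izip_longest pads exhausted iterators with None; skip the padding
            if entry is None:
                continue
            yield entry
            counter += 1
            if counter >= cap:
                # return ends the whole generator; a break here would only
                # leave the inner loop and let later groups yield more items
                return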

Is there a better way to perform an "unravel" function in Python?
