Question

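I'm looking for a solid implementation that lets me work through a list of items progressively using a Queue.

The idea is that I want a fixed number of workers to go through a list of 20+ database-intensive tasks and return the results. I want Python to start with the first five items and, as soon as one task finishes, start on the next task in the queue.

This is how I'm currently doing it without threading: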

for key, v in self.sources.iteritems():
    # Do Stuff

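I'd like something similar, but preferably without having to split the list into subgroups of five, so that it automatically picks up the next item in the list. The goal is to make sure that if one database is slowing the process down, it doesn't negatively affect the whole application.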

Answer 1

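You can implement this yourself, but Python 3 already ships with an Executor-based solution for thread management, and you can use it in Python 2.x by installing the backported version.

Your code could then look like: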

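import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Map each future back to the key of the source it was submitted for.
    future_to_key = {}
    for key, value in sources.items():
        future_to_key[executor.submit(do_stuff, value)] = key
    # as_completed yields each future as soon as it finishes, in any order.
    for future in concurrent.futures.as_completed(future_to_key):
        key = future_to_key[future]
        result = future.result()
        # process result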

Answer 2

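If you're using Python 3, I recommend the concurrent.futures module. If you're not using Python 3 and aren't attached to threads (as opposed to processes), you could try multiprocessing.Pool, though it comes with some caveats: I've had trouble with pools not closing properly in my applications. If you must use threads in Python 2, you'll probably end up writing the code yourself: spawn 5 threads that run a consumer function, then simply push the calls (function + args) onto a queue one by one for the consumers to pick up and process.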
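For what it's worth, here is a minimal sketch of that hand-rolled approach, written for Python 3 (in Python 2 the `queue` module is named `Queue`). The names `consumer`, `run_all` and `do_stuff` are made up for illustration, and `do_stuff` is only a stand-in for the real database task, so treat this as one possible shape rather than a finished implementation.

```python
import queue
import threading

NUM_WORKERS = 5  # assumption: the five workers mentioned in the question


def consumer(tasks, results):
    """Pull (key, func, args) calls off the queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this thread
            break
        key, func, args = item
        try:
            results.put((key, func(*args), None))
        except Exception as exc:  # report the failure instead of killing the thread
            results.put((key, None, exc))


def run_all(calls, num_workers=NUM_WORKERS):
    """Run every (key, func, args) call on a fixed number of threads."""
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [threading.Thread(target=consumer, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for call in calls:
        tasks.put(call)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # All workers have finished, so the results queue can be drained safely.
    collected = {}
    while not results.empty():
        key, value, error = results.get()
        collected[key] = (value, error)
    return collected


def do_stuff(value):
    # hypothetical stand-in for a database-intensive task
    return value * 2
```

Something like `run_all([(key, do_stuff, (value,)) for key, value in sources.items()])` would then return a dict mapping each key to a `(result, error)` pair. A slow task only ties up one of the five threads; the others keep pulling the next item off the queue.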

Answer 3

You can do this using only the stdlib:

#!/usr/bin/env python
from multiprocessing.dummy import Pool # use threads

def db_task(key_value):
    try:
        key, value = key_value
        # compute result..
        return result, None
    except Exception as e:
        return None, e

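def main():
    pool = Pool(5)  # five worker threads
    # imap_unordered yields each result as soon as its task finishes.
    for result, error in pool.imap_unordered(db_task, sources.items()):
        if error is None:
            print(result)
    pool.close()
    pool.join()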

if __name__=="__main__":
    main()

Picking up items progressively as soon as the queue is available
