How can I use threads on a generator while preserving order (multiple threads per item)?

Asked: 2018-02-15 21:59:44

Tags: python multithreading python-2.7 python-multithreading

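I have some code that mimics a REST API call (see below).

For each key in each item from the generator, it needs to run a REST call. So in my example, a record might be: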

{"a": 2, "b": 36, "c": 77}

I need to run the REST call separately for each key (a, b, c) and then output the result (the call just negates the number):

{"a": 2, "a_neg": -2, "b": 36, "b_neg": -36, "c": 77, "c_neg": -77}

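My current code works for one key, but with multiple keys it repeats the items (so with 3 keys I get three times as many results).

There is also some funky race condition. I suppose I could keep only the last copy of each record, but I'm not good with threads and I'm worried about thread safety or other advanced stuff.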

Here is a sample of the output:

{'a': 89, 'a_neg': -89, 'b': 69, 'c': 38}
{'a': 89, 'a_neg': -89, 'b': 69, 'c': 38, 'c_neg': -38}
{'a': 89, 'a_neg': -89, 'b': 69, 'b_neg': -69, 'c': 38, 'c_neg': -38}
{'a': 90, 'a_neg': -90, 'b': 43, 'c': 16}
{'a': 90, 'a_neg': -90, 'b': 43, 'c': 16, 'c_neg': -16}
{'a': 90, 'a_neg': -90, 'b': 43, 'b_neg': -43, 'c': 16, 'c_neg': -16}
{'a': 91, 'a_neg': -91, 'b': 49, 'b_neg': -49, 'c': 77, 'c_neg': -77}
{'a': 91, 'a_neg': -91, 'b': 49, 'b_neg': -49, 'c': 77, 'c_neg': -77}
{'a': 91, 'a_neg': -91, 'b': 49, 'b_neg': -49, 'c': 77, 'c_neg': -77}

And finally, here is my source code (you can run it yourself):

#!/usr/bin/env python

from concurrent.futures import ThreadPoolExecutor
from time import sleep
from pprint import pprint
import random

def records():
    # simulates records generator
    for i in range(100):
        yield {"a": i, "b": random.randint(0,100), "c": random.randint(0,100)}

def stream(records):
    threads = 8
    pool = ThreadPoolExecutor(threads)

    def rest_api_lookup(record_dict):
        # simulates REST call :)
        sleep(0.1)
        key = record_dict["key"]
        record = record_dict["record"]

        record[key + "_neg"] = -record[key]

        return record

    def thread(records):
        chunk = []
        for record in records:
            for key in record:
                chunk.append(pool.submit(rest_api_lookup, {"record": record, "key": key}))

            if len(chunk) == threads:
                yield chunk
                chunk = []

        if chunk:
            yield chunk

    def unchunk(chunk_gen):
        """Flattens a generator of Future chunks into a generator of Future results."""
        for chunk in chunk_gen:
            for f in chunk:
                yield f.result() # get result from Future

    # Now iterate over all results in same order as records
    for result in unchunk(thread(records)):
        #yield result
        pprint(result)

stream(records())

1 answer:

Answer 0 (score: 1)

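The first problem here is that you are looping over the keys of a record that is growing while you iterate (rest_api_lookup adds the "_neg" keys to the same dict), so take a copy of the keys first: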

for key in list(record):  # make a copy of the keys!
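To see why the copy matters, here is a minimal single-threaded sketch. The helper names `unsafe_keys` and `snapshot_keys` are made up for illustration and are not part of the question's code. In CPython, adding keys to a dict while iterating over it raises a RuntimeError on the next step of the loop. In the threaded version, whether this actually happens depends on timing, because the workers add the `_neg` keys while the submitting loop may still be iterating.

```python
def unsafe_keys(record):
    """Iterate the dict directly while adding keys to it."""
    seen = []
    try:
        for key in record:
            seen.append(key)
            record[key + "_neg"] = -record[key]  # grows the dict mid-iteration
    except RuntimeError as exc:
        # CPython detects the size change and aborts the loop
        return seen, str(exc)
    return seen, None


def snapshot_keys(record):
    """Iterate over a copy of the keys, so adding keys is harmless."""
    seen = []
    for key in list(record):  # snapshot taken before any key is added
        seen.append(key)
        record[key + "_neg"] = -record[key]
    return seen
```

With `{"a": 2, "b": 36, "c": 77}`, `unsafe_keys` stops after the first key and reports an error, while `snapshot_keys` visits exactly the three original keys.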

I think the second problem is that you have 3 keys and 8 threads: len(chunk) will be 3, 6, 9, ... while threads is 8, so the following condition is never met:

        if len(chunk) == threads:  # try len(chunk) >= threads
            yield chunk
            chunk = []
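The arithmetic can be checked with a short sketch. `chunk_sizes` is a hypothetical helper that mimics only the counting done in `thread()`, with no futures involved, and the `at_least` flag is an assumption added here to switch between `==` and `>=`.

```python
def chunk_sizes(n_records, keys_per_record, threads, at_least=False):
    """Return the sizes of the chunks the chunking loop would yield."""
    sizes = []
    pending = 0
    for _ in range(n_records):
        pending += keys_per_record  # one future per key of the record
        if at_least:
            full = pending >= threads
        else:
            full = pending == threads
        if full:
            sizes.append(pending)
            pending = 0
    if pending:
        sizes.append(pending)  # leftover chunk after the loop
    return sizes


print(chunk_sizes(100, 3, 8))                 # [300]
print(chunk_sizes(100, 3, 8, at_least=True))  # 33 chunks of 9, then one of 3
```

With `==`, the count steps over 8 and every future ends up in one chunk that is only yielded once the generator is exhausted. With `>=`, each chunk holds 9 futures, which here is exactly three records.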

The last problem is that you yield unfinished records before all of their threads have completed. Here is a possible solution:

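def unchunk(chunk_gen):
    """Flattens a generator of Future chunks into a generator of completed records."""
    old_res = None  # set once, so the last record of a chunk is not dropped
    for chunk in chunk_gen:
        for f in chunk:
            res = f.result()  # get result from Future
            # All futures of one record return the same dict object, so a
            # different object means the previous record is complete.
            if old_res is not None and res is not old_res:
                yield old_res
            old_res = res
    if old_res is not None:
        yield old_res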
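As an alternative to comparing results by identity, the futures can be grouped per record so that each record is yielded exactly once, after all of its calls have finished. The following is only a sketch under assumptions, not the asker's code: `negate_key`, `stream_in_order` and `max_pending` are hypothetical names, the worker returns a (key, value) pair instead of mutating the shared dict, and the sleep stands in for the REST call. On Python 2.7, `concurrent.futures` comes from the `futures` backport.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from time import sleep


def negate_key(record, key):
    """Stand-in for the REST call: returns a pair, never mutates the record."""
    sleep(0.01)
    return key + "_neg", -record[key]


def stream_in_order(records, threads=8, max_pending=8):
    """Yield each enriched record exactly once, in input order."""
    pool = ThreadPoolExecutor(threads)
    pending = deque()  # (record, futures) pairs, oldest first

    def finish(entry):
        record, futures = entry
        # Wait for every call of this record before touching the dict,
        # so workers never read a dict that is being modified.
        results = [f.result() for f in futures]
        record.update(results)
        return record

    try:
        for record in records:
            futures = [pool.submit(negate_key, record, key)
                       for key in list(record)]
            pending.append((record, futures))
            # Bound the number of records in flight so the input
            # generator is consumed lazily.
            if len(pending) >= max_pending:
                yield finish(pending.popleft())
        while pending:
            yield finish(pending.popleft())
    finally:
        pool.shutdown()
```

Calling `list(stream_in_order(records()))` with the generator from the question should give one dict per input record, each with all three `_neg` keys. Merging the results in the consuming thread means no lock is needed around the record.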