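Iterator with multithreading

Time: 2017-01-15 10:00:23

Tags: python multithreading

I need to implement an iterator that returns two values (so far, nothing unusual), but these values need to be computed/generated continuously and in parallel, even when the iterator is not being asked for them.

Here is an example that explains what I need.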

def GenerateValues():
    # I do the math for value1 in the first thread
    # I do the math for value2 in the second thread
    return value1, value2

def myIterator():
    while 1:
        yield GenerateValues()

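In this case, value1 and value2 are computed/generated in parallel only when myIterator is called. In my problem, however, computing/generating value1 and value2 takes a long time, and processing value1 and value2 also takes a long time. So while my software is processing value1 and value2, I would like it to compute the new value1 and the new value2 in parallel.

So it would look something like this: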

def GenerateValues():
    # If value1 and value2 are not computed yet, then wait.

    # I do the math for the new value1 in the first thread without blocking
    # I do the math for the new value2 in the second thread without blocking
    return value1, value2

def myIterator():
    while 1:
        yield GenerateValues()

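With this setup, the new value1 and the new value2 are computed/generated while the current value1 and value2 are returned for processing.

  • Is this clear enough?
  • If so, how do I do the synchronization?

Thanks in advance for your help!

PS: I need the while 1; no need to comment on that.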

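3 answers:

Answer 0: (score: 2)

I'm not sure I fully understand what you are trying to do. If you want to compute value1 and value2 in parallel, you can use multiprocessing or threading. If the task is CPU-bound, I recommend multiprocessing: it makes full use of your CPUs by using subprocesses instead of threads to sidestep the Global Interpreter Lock (GIL).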

Here is a fairly straightforward example using multiprocessing:

from multiprocessing import Queue, Process

def cal_value1(queue):
    value1 = 1  # placeholder: do the real job here
    queue.put({'value1': value1})

def cal_value2(queue):
    value2 = 2  # placeholder: do the real job here
    queue.put({'value2': value2})

def GenerateValues():
    # Compute value1 and value2 in two processes and wait for both.
    queue = Queue()  # shared queue that collects both results
    process_1 = Process(target=cal_value1, args=(queue,))
    process_2 = Process(target=cal_value2, args=(queue,))
    process_1.start()
    process_2.start()  # start both processes

    # Fetch the results before joining: a child that has put data on a
    # queue may not exit until that data has been consumed.
    result = queue.get()
    result.update(queue.get())  # blocks until both results have arrived

    process_1.join()
    process_2.join()  # wait for both to finish

    return result['value1'], result['value2']

P.S. If you prefer, you can easily replace multiprocessing.Process and multiprocessing.Queue with threading.Thread and Queue.Queue (named queue.Queue in Python 3).
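A rough sketch of that swap, assuming Python 3 (where the module is named queue) and two placeholder computations that stand in for the real work. The name generate_values and the parameter x are illustrative, not part of the original answer:

```python
import queue
import threading


def cal_value1(results, x):
    # Placeholder for the real computation (an assumption for this sketch).
    results.put(('value1', x * x))


def cal_value2(results, x):
    # Placeholder for the real computation (an assumption for this sketch).
    results.put(('value2', x * x * x))


def generate_values(x):
    results = queue.Queue()
    threads = [
        threading.Thread(target=cal_value1, args=(results, x)),
        threading.Thread(target=cal_value2, args=(results, x)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for both threads to finish
    # Both results are in the queue now; their order is not guaranteed,
    # so collect them by key.
    values = dict(results.get() for _ in threads)
    return values['value1'], values['value2']
```

With threads, the GIL still limits CPU-bound pure-Python work, so this variant mostly pays off when the work is I/O-bound or runs in code that releases the GIL.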

Edit: Now let's make cal_value1 and cal_value2 long-running processes; you will probably want to start both processes at the beginning of your script.

from multiprocessing import Queue, Process

def cal_value1(tasks, results):
    while True:
        task = tasks.get()  # blocks until a new task comes in
        value1 = 1  # placeholder: calculate value1 here
        results.put({'value1': value1})

def cal_value2(tasks, results):
    while True:
        task = tasks.get()  # blocks until a new task comes in
        value2 = 2  # placeholder: calculate value2 here
        results.put({'value2': value2})

def main():
    cal_value1_tasks, cal_value2_tasks, results = Queue(), Queue(), Queue()
    process_1 = Process(target=cal_value1, args=(cal_value1_tasks, results, ))
    process_2 = Process(target=cal_value2, args=(cal_value2_tasks, results, ))
    process_1.start()
    process_2.start()
    cal_value1_tasks.put('cal_value1')
    cal_value2_tasks.put('cal_value2') # Start to calculate the first pair
    values = GenerateValues(cal_value1_tasks, cal_value2_tasks, results)

def GenerateValues(cal_value1_tasks, cal_value2_tasks, results):
    values = results.get()  # get the first result
    values.update(results.get())  # blocks until both value1 and value2 have been calculated
    cal_value1_tasks.put('cal_value1')
    cal_value2_tasks.put('cal_value2')  # before returning, start calculating the next round of value1 and value2
    return values['value1'], values['value2']
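To connect this to the myIterator generator from the question, the same "start the next round before returning" idea can be sketched with concurrent.futures. This is only a sketch under assumptions: compute_value1 and compute_value2 are hypothetical stand-ins for the real computations, and a thread pool is used to keep the example short. For CPU-bound work, a ProcessPoolExecutor could take its place.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def compute_value1(n):
    # Hypothetical stand-in for the real, slow computation.
    return n * 2


def compute_value2(n):
    # Hypothetical stand-in for the real, slow computation.
    return n * 3


def my_iterator():
    with ThreadPoolExecutor(max_workers=2) as pool:
        n = 0
        fut1 = pool.submit(compute_value1, n)
        fut2 = pool.submit(compute_value2, n)
        while 1:
            # Wait here only if the current pair is not ready yet.
            value1, value2 = fut1.result(), fut2.result()
            # Start the next round before handing the current pair out,
            # so it is computed while the caller processes this one.
            n += 1
            fut1 = pool.submit(compute_value1, n)
            fut2 = pool.submit(compute_value2, n)
            yield value1, value2


if __name__ == "__main__":
    for value1, value2 in islice(my_iterator(), 3):
        print(value1, value2)
```

The generator stays exactly one pair ahead of the consumer, so memory use is bounded even though the loop never ends.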

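Answer 1: (score: 0)

I'm not sure I have understood your question correctly. Also, I don't see why you want an extra iterator instead of yielding directly in ComputeValues. But I'll keep my example close to your code.

If so, you may want to try multiprocessing.Pool, like this: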

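from multiprocessing import Pool

def ComputeValues(v1, v2):
    ...
    return value1, value2  # you could just yield here!

def myIterator(x):  # x is a tuple in this case
    v1, v2 = x
    while 1:
        yield ComputeValues(v1, v2)

p = Pool(5)  # will spawn 5 processes, each of them will run myIterator
# Caveat: imap_unordered returns a lazy iterator, and a worker cannot send a
# generator object back (generators cannot be pickled), so myIterator would
# have to return plain values for this to produce results.
print(p.imap_unordered(myIterator, [(v1a,v2a), (v1b, v2b), ...]))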
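For reference, a minimal sketch of how imap_unordered is usually driven: the mapped function returns a plain, picklable value (here a tuple) and the caller iterates over the results. The names compute_values and run, and the sum/product computation, are placeholders assumed for illustration.

```python
from multiprocessing import Pool


def compute_values(pair):
    # Placeholder computation (an assumption): returns (sum, product).
    v1, v2 = pair
    return v1 + v2, v1 * v2


def run(pairs, processes=2):
    with Pool(processes) as pool:
        # Results arrive in completion order, not input order, so they
        # are sorted here to get a stable result.
        return sorted(pool.imap_unordered(compute_values, pairs))
```

Call run([(1, 2), (3, 4)]) from under an if __name__ == "__main__": guard; multiprocessing needs that guard on platforms that spawn worker processes instead of forking them.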

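Answer 2: (score: 0)

In CPython (the most common implementation, the one you get from python.org), threads don't really help to parallelize computations done in Python.

That is because, to make memory management easier (among other things), only one thread at a time can execute Python bytecode. This is enforced by the Global Interpreter Lock ("GIL").

(If you do all your calculations in an extension like numpy, which releases the GIL while it works, this restriction mostly doesn't apply.)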

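You can use multiprocessing (or ProcessPoolExecutor from concurrent.futures, available from Python 3.2 onward) to spread the computation over multiple processes. The documentation has examples of both.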
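Applied to the two values from the question, the ProcessPoolExecutor route could look roughly like this. It is a sketch, not a definitive implementation: compute_value1, compute_value2 and generate_values are assumed names, and the sums are placeholders for the real work.

```python
from concurrent.futures import ProcessPoolExecutor


def compute_value1(x):
    # Placeholder for an expensive computation (an assumption).
    return sum(i * i for i in range(x))


def compute_value2(x):
    # Placeholder for an expensive computation (an assumption).
    return sum(i * i * i for i in range(x))


def generate_values(executor, x):
    # Both submissions are queued immediately, so with two or more
    # workers they can run at the same time in separate processes.
    fut1 = executor.submit(compute_value1, x)
    fut2 = executor.submit(compute_value2, x)
    return fut1.result(), fut2.result()
```

A typical call, placed under an if __name__ == "__main__": guard, would create the executor with ProcessPoolExecutor(max_workers=2) in a with statement and pass it to generate_values. Because generate_values only relies on submit, a ThreadPoolExecutor can be passed in instead when trying it out.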

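Below is an example in which I use ProcessPoolExecutor to convert DICOM images to JPEG. It uses the "wand" Python bindings for ImageMagick. Note how it builds a list of jobs (futures) and starts them. The as_completed function yields each future as it finishes, in order of completion; calling result() on it gives that job's return value.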

import os
import sys
import concurrent.futures as cf

from wand.image import Image


def convert(filename):
    """Convert a DICOM file to a JPEG file, removing the blank areas from the
    Philips x-ray detector.

    Arguments:
        filename: name of the file to convert.

    Returns:
        Tuple of (input filename, output filename)
    """
    outname = filename.strip() + '.jpg'
    with Image(filename=filename) as img:
        with img.convert('jpg') as converted:
            converted.units = 'pixelsperinch'
            converted.resolution = (300, 300)
            converted.crop(left=232, top=0, width=1574, height=2048)
            converted.save(filename=outname)
    return filename, outname


def main(argv):
    """Main entry point for dicom2jpg.py.

    Arguments:
        argv: command line arguments
    """
    if len(argv) == 1:
        binary = os.path.basename(argv[0])
        print("{} ver. {}".format(binary, __version__), file=sys.stderr)
        print("Usage: {} [file ...]\n".format(binary), file=sys.stderr)
        print(__doc__)
        sys.exit(0)
    del argv[0]  # Remove the name of the script from the arguments.
    es = 'Finished conversion of {} to {}'
    with cf.ProcessPoolExecutor(max_workers=os.cpu_count()) as tp:
        fl = [tp.submit(convert, fn) for fn in argv]
        for fut in cf.as_completed(fl):
            infn, outfn = fut.result()
            print(es.format(infn, outfn))
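The submit/as_completed pattern used in main can be tried without ImageMagick. The sketch below swaps in a thread pool and a sleeping placeholder job (work and completion_order are names assumed for illustration) to show that results come back in completion order rather than submission order:

```python
import concurrent.futures as cf
import time


def work(delay):
    # Placeholder job: sleep for `delay` seconds, then return it.
    time.sleep(delay)
    return delay


def completion_order(delays):
    with cf.ThreadPoolExecutor(max_workers=len(delays)) as pool:
        # submit() schedules each job right away and returns a future.
        futures = [pool.submit(work, d) for d in delays]
        # as_completed() yields the futures as they finish.
        return [fut.result() for fut in cf.as_completed(futures)]
```

Calling completion_order([0.5, 0.1, 0.3]) would normally return the shortest delay first, since that job finishes first.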

You can find this and other examples in the scripts repository on GitHub.