Question

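I have a problem with Python's multiprocessing queue. I am doing some heavy computation on some data. I created several processes to reduce the computation time, and the data is split evenly before it is sent to the processes. This reduces the computation time nicely, but when I want to return the data from the processes through multiprocessing.Queue it takes a very long time, and the whole thing ends up slower than computing in the main thread.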

    processes = []
    proc = 8
    for i in range(proc):
        processes.append(multiprocessing.Process(target=self.calculateTriangles, args=(inData[i],outData,timer)))
    for p in processes:
        p.start()
    results = []
    for i in range(proc):
        results.append(outData.get())
    print("killing threads")
    print(datetime.datetime.now() - timer)
    for p in processes:
        p.join()
    print("Finish Threads")
    print(datetime.datetime.now() - timer)

Every worker prints its finish time when it is done. Here is sample output from this code:

0:00:00.017873 CalcDone    
0:00:01.692940 CalcDone
0:00:01.777674 CalcDone
0:00:01.780019 CalcDone
0:00:01.796739 CalcDone
0:00:01.831723 CalcDone
0:00:01.842356 CalcDone
0:00:01.868633 CalcDone
0:00:05.497160 killing threads
60968 calculated triangles

As you can see, everything is fast up to this code:

    for i in range(proc):
        results.append(outData.get())
    print("killing threads")
    print(datetime.datetime.now() - timer)

Here are some measurements I made on my computer and on a slower one: https://docs.google.com/spreadsheets/d/1_8LovX0eSgvNW63-xh8L9-uylAVlzY4VSPUQ1yP2F9A/edit?usp=sharing. As you can see, there is no improvement at all on the slower one.

Why does getting items from the queue take so much time when the processes have already finished? Is there a way to speed this up?

Answer 1

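So I solved it myself. The calculations are fast, but moving objects from one process to another takes ages. I simply created a method that clears all unnecessary fields from the objects, and using pipes is also faster than multiprocessing queues. This cut the time on my slower computer to 15 seconds (down from 29 seconds).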
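
The answer does not show its code, so the following is only a minimal sketch of the idea, assuming a result object that carries bulky scratch data the parent never reads. The class TriangleResult, its fields, and the to_payload/from_payload helpers are all hypothetical names introduced for illustration. The pipe round trip runs inside a single process here just to show the calls; in real use the sending end would live in the worker.

```python
import pickle
from multiprocessing import Pipe


class TriangleResult:
    """Hypothetical worker result; every field name here is illustrative."""

    def __init__(self, vertices):
        self.vertices = vertices
        # Scratch data used only during the calculation, not needed by the parent.
        self.scratch = [float(i) for i in range(10_000)]

    def to_payload(self):
        """Return only the fields the parent process needs, as plain data."""
        return {"vertices": self.vertices}

    @classmethod
    def from_payload(cls, payload):
        """Rebuild a lightweight result in the parent, without scratch data."""
        obj = cls.__new__(cls)  # bypass __init__ so no scratch list is created
        obj.vertices = payload["vertices"]
        return obj


result = TriangleResult([(0, 0), (1, 0), (0, 1)])

# Compare how many bytes would have to be pickled in each case.
full_size = len(pickle.dumps(vars(result)))
slim_size = len(pickle.dumps(result.to_payload()))
print(f"all fields: {full_size} bytes, payload only: {slim_size} bytes")

# One-way pipe: the first connection only receives, the second only sends.
receiver, sender = Pipe(duplex=False)
sender.send(result.to_payload())
restored = TriangleResult.from_payload(receiver.recv())
```

How much this helps depends on how large the dropped fields are compared with the data the parent actually needs.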

Answer 2

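Most of this time is spent putting one object after another into the queue and incrementing the semaphore count each time. If you can bulk-insert all the data into the queue at once, you can cut the time to about 1/10 of what it was.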

I dynamically added a new method to Queue, based on the existing one. Go to the multiprocessing module for your Python version:

    /usr/lib/pythonx.x/multiprocessing/queues.py

Copy the class's "put" method into your project, e.g. for Python 3.7:

    def put(self, obj, block=True, timeout=None):
        assert not self._closed, "Queue {0!r} has been closed".format(self)
        if not self._sem.acquire(block, timeout):
            raise Full

        with self._notempty:
            if self._thread is None:
                self._start_thread()
            self._buffer.append(obj)
            self._notempty.notify()

Modify it:

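    from queue import Full  # raised when the queue has no free slot

    def put_bla(self, obj, block=True, timeout=None):
        assert not self._closed, "Queue {0!r} has been closed".format(self)

        # Acquire the semaphore once per element, so that its count matches
        # the number of items later released by get().
        for el in obj:
            if not self._sem.acquire(block, timeout):  # spike the semaphore count
                raise Full

        # Append the whole batch once, under a single lock acquisition.
        with self._notempty:
            if self._thread is None:
                self._start_thread()
            self._buffer += obj  # adding a collections.deque object
            self._notempty.notify()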

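The last step is to add the new method to the class. multiprocessing.Queue is a DefaultContext method that returns a Queue object, so it is easier to inject the method directly into the class of the created object. So: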

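    from collections import deque
    from multiprocessing import Queue

    count = 1000  # illustrative repeat count; not specified in the original

    queue = Queue()
    queue.__class__.put_bulk = put_bla  # injecting new method
    items = (500, 400, 450, 350) * count
    queue.put_bulk(deque(items))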
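
The injected method relies on private attributes of Queue (_sem, _notempty, _buffer), which may differ between Python versions. A different technique that gets a similar batching effect through the public API only is to send the whole batch as one list, so there is a single put() and a single get(). This is a sketch, not the method from this answer, and it assumes the consumer is happy to receive a list. The helper names put_batch and get_batch and the 5-second timeout are illustrative choices.

```python
import multiprocessing


def put_batch(queue, items):
    """Send all items as one list, so only one put() call is made."""
    queue.put(list(items))


def get_batch(queue, timeout=5):
    """Receive one batch; the timeout is arbitrary and only prevents hanging."""
    return queue.get(timeout=timeout)


# Same-process round trip, only to show the calls. In real use put_batch()
# would run in a worker process and get_batch() in the parent.
count = 1000  # illustrative repeat count
items = (500, 400, 450, 350) * count
queue = multiprocessing.Queue()
put_batch(queue, items)
received = get_batch(queue)
print(len(received), "items received in a single get()")
```

The trade-off is that the consumer sees nothing until the whole batch arrives, which is fine when results are only collected at the end, as in the question.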

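Unfortunately, multiprocessing.Pool is always about 10% faster, so stick with it if you do not need persistent workers to process tasks. It is based on multiprocessing.SimpleQueue, which in turn is based on multiprocessing.Pipe. I do not know why it is faster, because my SimpleQueue solution is not, and it does not support bulk injection :) Beat that and you will have the fastest worker ever :)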
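
For reference, a minimal sketch of what the Pool variant of the question's setup could look like, assuming the work can be expressed as a top-level function applied to each pre-split chunk. The function calculate_chunk is a hypothetical stand-in for the real triangle calculation, and split_evenly is an illustrative helper; neither comes from the original posts.

```python
import multiprocessing


def calculate_chunk(chunk):
    """Hypothetical stand-in for the heavy per-chunk calculation."""
    return [x * x for x in chunk]


def split_evenly(data, parts):
    """Split data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def main():
    data = list(range(100_000))
    chunks = split_evenly(data, 8)
    # Each worker returns its whole result list in one transfer per chunk.
    with multiprocessing.Pool(processes=8) as pool:
        results = pool.map(calculate_chunk, chunks)
    print(sum(len(r) for r in results), "items calculated")


if __name__ == "__main__":
    main()
```

The worker function is defined at module level because Pool has to pickle a reference to it, which is why a bound method such as self.calculateTriangles may need restructuring before it can be used this way.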

Python multiprocessing queue is slow
