What is the correct way to use a shared list with multiprocessing?

Asked: 2019-11-19 06:19:13

Tags: python multiprocessing python-3.7

I have implemented a SharedList in Python (version 3.7) using the Manager and Lock from multiprocessing. It is used as a shared object among processes created with multiprocessing's Process call, and it stores the values/objects generated by each process.

Implementation of SharedList using multiprocessing's Manager and Lock:

class SharedList(object):
    def __init__(self, limit):
        self.manager = Manager()
        self.results = self.manager.list([])
        self.lock = Lock()
        self.limit = limit

    def append(self, new_value):
        with self.lock:
            if len(self.results) == self.limit:
                return False
            self.results.append(new_value)
            return True

    def list(self):
        with self.lock:
            return list(self.results).copy()

The created SharedList is used to store the values generated by the multiple processes created with multiprocessing:

results = SharedList(limit)
num_processes = min(process_count, limit)
processes = []
for i in range(num_processes):
    # args must be a tuple: the original args=(results) is not one
    new_process = Process(target=child_function, args=(results,))
    processes.append(new_process)
    new_process.start()

for _process in processes:
    _process.join()

for _process in processes:
    _process.close()

Implementation of child_function:

while True:
    result = func()
    if not results.append(result):
        break
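For background on why the Lock is needed: len() and append() on a Manager list are two separate round-trips to the manager process, so the capacity check by itself is not atomic. A minimal sketch of the race that can occur without the lock (hypothetical worker, not from the original code):

from multiprocessing import Manager, Process

def unlocked_append(shared, limit):
    # Without a lock, another process can append between the len()
    # check and the append() call, so the list can grow past limit.
    for _ in range(100):
        if len(shared) < limit:
            shared.append(0)

if __name__ == '__main__':
    manager = Manager()
    shared = manager.list()
    workers = [Process(target=unlocked_append, args=(shared, 50))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(len(shared))  # can exceed 50, demonstrating the race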

In some scenarios the implementation works, but it hangs when I increase the limit. I used a process count smaller than the number of CPUs, and the same experiment still hangs at the same place.

Is there a better way to solve the above problem? I have looked into different approaches, such as using a Queue, but that did not work as expected and also hangs.
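For comparison, one pattern that avoids a shared list entirely is to let a Pool return results to the parent process, which then owns the collected list. A minimal sketch, assuming func, limit and num_processes are the same names used in the question:

from multiprocessing import Pool

def call_func(_):
    # func must be importable at module level so it can be pickled
    return func()

if __name__ == '__main__':
    with Pool(processes=num_processes) as pool:
        # Submit exactly `limit` calls; only the parent touches the
        # result list, so no shared state or locking is required.
        results_out = pool.map(call_func, range(limit))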

My previous implementation using Queue is added below.

Implementation using Queue:

results_out = []
manager = multiprocessing.Manager()
results = manager.Queue()
tasks = manager.Queue()
num_processes = min(process_count, limit)
processes = []
for i in range(num_processes):
    # the original line was missing the closing parenthesis here
    new_process = multiprocessing.Process(target=child_function, args=(tasks, results))
    processes.append(new_process)
    new_process.start()

sleep(5)
for i in range(limit):
    tasks.put(0)
sleep(1)

for i in range(num_processes):
    tasks.put(-1)

num_finished_processes = 0
while True:
    new_result = results.get()
    if new_result == -1:
        num_finished_processes += 1
        if num_finished_processes == num_processes:
            break
    else:
        results_out.append(new_result)

for process in processes:
    process.join()

for process in processes:
    process.close()

child_function:

while True:
    task_val = tasks.get()
    if task_val < 0:
        results.put(-1)
        break
    else:
        result = func()
        results.put(result)
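As an aside, one way this protocol can hang: if func() raises, the dying worker never puts its -1 sentinel, and the parent then blocks forever in results.get(). A sketch of a child_function variant that guarantees the sentinel is always sent:

def child_function(tasks, results):
    # try/finally ensures the -1 sentinel reaches the parent even if
    # func() raises, so the parent's collection loop can terminate.
    try:
        while True:
            task_val = tasks.get()
            if task_val < 0:
                break
            results.put(func())
    finally:
        results.put(-1)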

Update

I had gone through the following references before posting this question, but I could not get the desired output. I agree that this code leads to a deadlock state, but I have not been able to find a deadlock-free implementation using multiprocessing in Python.

References

  1. Multiprocessing of shared list

  2. https://pymotw.com/2/multiprocessing/basics.html

  3. Shared variable in python's multiprocessing

  4. https://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing

  5. https://medium.com/@urban_institute/using-multiprocessing-to-make-python-code-faster-23ea5ef996ba

  6. http://kmdouglass.github.io/posts/learning-pythons-multiprocessing-module/

  7. python multiprocessing/threading cleanup

Based on the suggestions, I modified SharedList to use a Queue. This implementation works well, but the following changes were made to the implementation:

class SharedList(object):
    def __init__(self, limit):
        self.manager = Manager()
        self.tasks = self.manager.Queue()
        self.results = self.manager.Queue()
        self.limit = limit
        self.no_of_process = min(process_count, limit)

    def setup(self):
        sleep(1)
        for i in range(self.limit):
            self.tasks.put(0)
        sleep(1)
        for i in range(self.no_of_process):
            self.tasks.put(-1)

    def append(self, new_value):
        task_val = self.tasks.get()
        if task_val < 0:
            self.results.put(-1)
            return False
        else:
            self.results.put(new_value)
            return True

    def list(self):
        results_out = []
        num_finished_processes = 0
        while True:
            new_result = self.results.get()
            if new_result == -1:
                num_finished_processes += 1
                if num_finished_processes == self.no_of_process:
                    break
            else:
                results_out.append(new_result)
        return results_out

The driver now calls results.setup() after starting the processes:

results = SharedList(limit)
num_processes = min(process_count, limit)
processes = []
for i in range(num_processes):
    new_process = Process(target=child_function, args=(results,))
    processes.append(new_process)
    new_process.start()

results.setup()

for _process in processes:
    _process.join()

for _process in processes:
    _process.close()

child_function is the same loop as in the first implementation: it calls func() and appends the result until append returns False.

However, after a few iterations it again runs into a deadlock and hangs there.

1 Answer:

Answer 0 (score: 1)

I found the following article based on Ray, which sounds interesting and is easy to implement for parallel computation, both efficient and time-saving:

https://towardsdatascience.com/modern-parallel-and-distributed-python-a-quick-tutorial-on-ray-99f8d70369b8
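For reference, the core pattern from that tutorial looks roughly like this (a minimal sketch, assuming ray is installed and func is the same worker as in the question):

import ray

ray.init()

@ray.remote
def remote_func():
    return func()

# Launch `limit` tasks; Ray schedules them across its worker
# processes, and ray.get blocks until every result is available.
futures = [remote_func.remote() for _ in range(limit)]
results_out = ray.get(futures)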