Why is Queue not multiprocessing-safe?

Asked: 2015-02-23 23:50:38

Tags: python python-multiprocessing

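I wrote a small piece of multiprocessing code to test things, and I ran into a problem: the queue does not seem to be updated synchronously across processes.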

# coding=utf-8
import multiprocessing

def do_work(input_queue, output_queue):
  print multiprocessing.current_process().name
  input_queue.put(1)
  while not input_queue.empty():
    output_queue.put(input_queue.get() + 1)

def main():
  input_queue = multiprocessing.Queue()
  output_queue = multiprocessing.Queue()
  for i in range(8):
    input_queue.put(i)

  processes = []
  for i in range(2):
    process = multiprocessing.Process(name = str(i),
                                      target = do_work,
                                      args = (input_queue,
                                              output_queue), )
    processes.append(process)
    process.start()
  for process in processes:
    process.join()
  results = []
  while not output_queue.empty():
    results.append(output_queue.get())
  print len(results), results

if __name__ == '__main__':
  main()

Sometimes the result looks fine:

process 0
process 1
10 [2, 1, 3, 4, 6, 5, 8, 7, 2, 2]

But sometimes the result is different, as if the value 1 had not been put on the queue when one of the processes started:

process 0
process 1
9 [1, 2, 3, 4, 5, 6, 7, 8, 2]

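The print seems fine because it is done in the main thread, but it looks as if the queue does not support inter-process locking. Can you suggest something?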

1 answer:

Answer 0 (score: 2)

Here is your code with a slight change:

# coding=utf-8
import multiprocessing

def do_work(input_queue, output_queue, lock):
  with lock:
    input_queue.put(1)
    print input_queue.empty(), input_queue.qsize()
    while not input_queue.empty():
      output_queue.put(input_queue.get() + 1)

def main():
  input_queue = multiprocessing.Queue()
  output_queue = multiprocessing.Queue()
  lock = multiprocessing.Lock()
  for i in range(8):
    input_queue.put(i)

  processes = []
  for i in range(2):
    process = multiprocessing.Process(name = str(i),
                                      target = do_work,
                                      args = (input_queue,
                                              output_queue, lock), )
    processes.append(process)
    process.start()
  for process in processes:
    process.join()
  results = []
  while not output_queue.empty():
    results.append(output_queue.get())
  print len(results), results

if __name__ == '__main__':
  main()

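Note that the whole body of each worker now runs under the lock, so there is no race between the two processes, and each worker also prints whether the input queue is empty and what its size is. Here is the output of one run: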

False 9
True 1
9 [1, 2, 3, 4, 5, 6, 7, 8, 2]

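Note how the second process reports that the queue is empty while, at the same time, it contains one element. The reason is in the documentation: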

  

empty(): Return True if the queue is empty, False otherwise. Because of multithreading/multiprocessing semantics, this is not reliable.
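To illustrate the quoted caveat: a multiprocessing.Queue hands each put() to a background feeder thread, so empty() can briefly still return True right after a put(). The sketch below is written for Python 3 (the code in this thread is Python 2). It uses a hypothetical helper named drain, which is not part of the library, and reads with a blocking get and a timeout instead of trusting empty(). The 0.5 second timeout is an arbitrary assumption, and "no item arrived in time" is only a heuristic for "the queue is finished".

```python
import multiprocessing
import queue  # only needed for the queue.Empty exception type


def drain(q, timeout=0.5):
    """Hypothetical helper: collect items until none arrives within `timeout` seconds."""
    items = []
    while True:
        try:
            # A blocking get with a timeout waits for in-flight items
            # instead of trusting empty() or qsize().
            items.append(q.get(timeout=timeout))
        except queue.Empty:
            return items


if __name__ == "__main__":
    q = multiprocessing.Queue()
    q.put(1)
    # May print True or False: the feeder thread might not have
    # written the item into the underlying pipe yet.
    print(q.empty())
    print(drain(q))
```

The timeout trades latency for robustness: a producer that pauses for longer than the timeout would still be cut off, so this is a sketch of the idea rather than a general solution.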

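To work around the unreliable empty(), you can replace the condition while not input_queue.empty() with while input_queue.qsize() > 0. When you do that, you will see your code hang. That makes sense, because you first check the size of the queue and only then try to pop from it. Consider the following scenario: there is one element in the queue, both processes see it, and both try to pop. One succeeds; the other now tries to pop from an empty queue and blocks. To fix this, do a non-blocking pop and retry on failure: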

# coding=utf-8
import multiprocessing
import Queue

def do_work(input_queue, output_queue):
  input_queue.put(1)
  while input_queue.qsize() > 0:
    try:
      output_queue.put(input_queue.get(False) + 1)
    except Queue.Empty:
      pass

def main():
  input_queue = multiprocessing.Queue()
  output_queue = multiprocessing.Queue()
  for i in range(8):
    input_queue.put(i)

  processes = []
  for i in range(2):
    process = multiprocessing.Process(name = str(i),
                                      target = do_work,
                                      args = (input_queue,
                                              output_queue) )
    processes.append(process)
    process.start()
  for process in processes:
    process.join()
  results = []
  while True:
    try:
      results.append(output_queue.get(False))
    except Queue.Empty:
      break
  print len(results), results

if __name__ == '__main__':
  main()
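An alternative that avoids polling empty() or qsize() altogether is to mark the end of the work with sentinel values. The following is a minimal sketch under stated assumptions, not a drop-in replacement for the code above: it is written for Python 3, SENTINEL and run are hypothetical names, None is assumed never to be a real work item, and the extra input_queue.put(1) that each worker performs in the question is left out because workers must not add work after the sentinels have been queued.

```python
import multiprocessing

SENTINEL = None  # hypothetical end-of-work marker; must never be a real work item


def do_work(input_queue, output_queue):
    # iter(callable, sentinel) keeps calling input_queue.get() (blocking)
    # until it returns SENTINEL, so no empty()/qsize() check is needed.
    for item in iter(input_queue.get, SENTINEL):
        output_queue.put(item + 1)


def run(values, n_workers=2):
    values = list(values)
    input_queue = multiprocessing.Queue()
    output_queue = multiprocessing.Queue()
    for value in values:
        input_queue.put(value)
    # One sentinel per worker, queued after all real work.
    for _ in range(n_workers):
        input_queue.put(SENTINEL)

    processes = [
        multiprocessing.Process(target=do_work, args=(input_queue, output_queue))
        for _ in range(n_workers)
    ]
    for process in processes:
        process.start()

    # Exactly one result per input is expected, so plain blocking gets suffice.
    # Results are collected before join() so that no worker is left waiting
    # to flush its buffered output.
    results = [output_queue.get() for _ in values]
    for process in processes:
        process.join()
    return results


if __name__ == "__main__":
    results = run(range(8))
    print(len(results), sorted(results))  # prints: 8 [1, 2, 3, 4, 5, 6, 7, 8]
```

Because each worker stops only when it reads its own sentinel, and the main process knows how many results to expect, neither side has to guess whether a queue is "really" empty. The order of the unsorted results still depends on scheduling.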