Defining a variable outside vs. inside __main__ with multiprocessing

Time: 2015-09-03 00:31:19

Tags: python multithreading multiprocessing python-multiprocessing

I have the following code:

import multiprocessing
import time
import os

# WHEN SEMAPHORE IS DEFINED HERE THEN IT WORKS
semaphore = multiprocessing.Semaphore(1)

def producer(num, output):
  semaphore.acquire()
  time.sleep(1)
  element = "PROCESS: %d PID: %d PPID: %d" % (num, os.getpid(), os.getppid())
  print "WRITE -> " + element
  output.put(element)
  time.sleep(1)
  semaphore.release()

if __name__ == '__main__':
    """
    Reads elements as soon as they are put inside the queue
    """

    output    = multiprocessing.Manager().Queue()
    pool      = multiprocessing.Pool(4)
    lst       = range(40)

    # WHEN SEMAPHORE IS DEFINED HERE THEN IT DOES NOT WORK
    # semaphore = multiprocessing.Semaphore(1)

    for i in lst:
        pool.apply_async(producer, (i, output))
        # print "%d Do not wait!" % i
        # res.get()

    counter = 0
    while True:
      try:
        print "READ  <- " + output.get_nowait()
        counter += 1
        if (counter == len(lst)):
          print "Break"
          break
      except:
        print "READ  <- NOTHING IN BUFFER"  
        pass
      time.sleep(1)

This code works as expected and prints:

READ  <- NOTHING IN BUFFER
WRITE -> PROCESS: 0 PID: 15803 PPID: 15798
READ  <- NOTHING IN BUFFER
READ  <- PROCESS: 0 PID: 15803 PPID: 15798
READ  <- NOTHING IN BUFFER
WRITE -> PROCESS: 1 PID: 15806 PPID: 15798
READ  <- PROCESS: 1 PID: 15806 PPID: 15798
...

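Then I have this version, which does not work (it is basically the same as the first one, except that the semaphore is defined in a different place):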

import multiprocessing
import time
import os

# WHEN SEMAPHORE IS DEFINED HERE THEN IT WORKS
# semaphore = multiprocessing.Semaphore(1)

def producer(num, output):
  print hex(id(semaphore))
  semaphore.acquire()
  time.sleep(1)
  element = "PROCESS: %d PID: %d PPID: %d" % (num, os.getpid(), os.getppid())
  print "WRITE -> " + element
  output.put(element)
  time.sleep(1)
  semaphore.release()

if __name__ == '__main__':
    """
    Reads elements as soon as they are put inside the queue
    """

    output    = multiprocessing.Manager().Queue()
    pool      = multiprocessing.Pool(4)
    lst       = range(40)

    # WHEN SEMAPHORE IS DEFINED HERE THEN IT DOES NOT WORK
    semaphore = multiprocessing.Semaphore(1)

    for i in lst:
        pool.apply_async(producer, (i, output))
        # print "%d Do not wait!" % i
        # res.get()

    counter = 0
    while True:
      try:
        print "READ  <- " + output.get_nowait()
        counter += 1
        if (counter == len(lst)):
          print "Break"
          break
      except:
        print "READ  <- NOTHING IN BUFFER"  
        pass
      time.sleep(1)

This version prints:

READ  <- NOTHING IN BUFFER
READ  <- NOTHING IN BUFFER
READ  <- NOTHING IN BUFFER
READ  <- NOTHING IN BUFFER
READ  <- NOTHING IN BUFFER
READ  <- NOTHING IN BUFFER
READ  <- NOTHING IN BUFFER
...

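It seems that producer never writes anything to the Queue. I read somewhere that apply_async does not print error messages, so in the second piece of code I changed pool.apply_async(producer, (i, output)) to pool.apply(producer, (i, output)) to see what happens. It turns out that semaphore is not defined; this is the output: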

Traceback (most recent call last):
  File "glob_var_wrong.py", line 31, in <module>
    pool.apply(producer, (i, output))
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 244, in apply
    return self.apply_async(func, args, kwds).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
NameError: global name 'semaphore' is not defined

However, the following code runs correctly and prints 10 (the value defined inside __main__):

global_var = 20

def print_global_var():
    print global_var

if __name__ == '__main__':
    global_var = 10
    print_global_var()

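It seems that in this code a global variable can be defined inside __main__, whereas in the earlier code it cannot. At first I assumed that variables defined inside __main__ are not shared between processes, but that seems to affect only semaphore and not output, pool, or lst. Why is that?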

1 answer:

Answer 0 (score: 2)

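When you create a new process with multiprocessing.Process (which Pool uses under the hood), it copies the local scope, pickles it, and sends it to the new process for evaluation.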

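Because the variable semaphore was not yet defined when Pool(4) was called, it is undefined in those OTHER processes that evaluate the code, and the function producer throws an exception.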

To see this, change the definition to

def producer(num, output):
    try:
        # The lookup of the global name happens here, so it must be inside
        # the try block for the NameError to be caught and printed.
        print hex(id(semaphore))
        semaphore.acquire()
    except Exception as e:
        print e
        return
    time.sleep(1)
    element = "PROCESS: %d PID: %d PPID: %d" % (num, os.getpid(), os.getppid())
    print "WRITE -> " + element
    output.put(element)
    time.sleep(1)
    semaphore.release()

Now your failing code will print a bunch (40) of errors that look like

global name 'semaphore' is not defined

That is why the semaphore must be defined before Pool is called.