Figure out whether a function is being called from a script without a main guard

Asked: 2016-09-13 11:15:01

Tags: python windows multithreading

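If a module is imported from a script that has no main guard (if __name__ == '__main__':), doing any kind of parallel work inside a function of that module results in an infinite loop on Windows. Each new process loads all the sources, now with __name__ not equal to '__main__', and then continues executing in parallel. Without a main guard, the same function is called again in every new process, which spawns more processes, until it crashes. This is only a problem on Windows, but the scripts are also run on OS X and Linux.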
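The platform difference described above comes down to the multiprocessing start method: with "spawn" (the only option on Windows) a child starts a fresh interpreter and imports the main script again, while "fork" copies the parent. A minimal sketch for checking which one is active; the helper name `uses_spawn_start_method` is hypothetical and not part of the question:

```python
import multiprocessing


def uses_spawn_start_method():
    """Return True when children are started with the "spawn" method."""
    # Note: get_start_method() without allow_none=True fixes the default
    # start method for this process as a side effect.
    return multiprocessing.get_start_method() == "spawn"


if __name__ == "__main__":
    print("spawn in use:", uses_spawn_start_method())
```

This only reports whether the re-import behaviour can occur on the current platform. It does not tell you whether the calling script has a main guard.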

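I could check for this by writing to a special file on disk and reading it back to see whether we have already started, but that limits us to a single Python script running at a time. The simple solution of modifying all the calling code to add main guards is not feasible, because the callers are spread across many repositories that I do not have access to. So I would like to parallelize when a main guard is used, and fall back to single-threaded execution when it is not.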

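How can I figure out whether I am being called in an import loop caused by a missing main guard, so that I can fall back to single-threaded execution?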

Here is some demo code:

lib, the parallel code:

from multiprocessing import Pool


def _noop(x):
    return x


def foo():
    p = Pool(2)
    print(p.map(_noop, [1, 2, 3]))

Good importer (with a guard):

from lib import foo

if __name__ == "__main__":
    foo()

Bad importer (without a guard):

from lib import foo

foo()

The bad importer fails over and over with this RuntimeError:

    p = Pool(2)
  File "C:\Users\filip.haglund\AppData\Local\Programs\Python\Python35\lib\multiprocessing\context.py", line 118, in Pool
    context=self.get_context())
  File "C:\Users\filip.haglund\AppData\Local\Programs\Python\Python35\lib\multiprocessing\pool.py", line 168, in __init__
    self._repopulate_pool()
  File "C:\Users\filip.haglund\AppData\Local\Programs\Python\Python35\lib\multiprocessing\pool.py", line 233, in _repopulate_pool
    w.start()
  File "C:\Users\filip.haglund\AppData\Local\Programs\Python\Python35\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\filip.haglund\AppData\Local\Programs\Python\Python35\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)
  File "C:\Users\filip.haglund\AppData\Local\Programs\Python\Python35\lib\multiprocessing\popen_spawn_win32.py", line 34, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\filip.haglund\AppData\Local\Programs\Python\Python35\lib\multiprocessing\spawn.py", line 144, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\filip.haglund\AppData\Local\Programs\Python\Python35\lib\multiprocessing\spawn.py", line 137, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

1 Answer:

Answer 0: (score: 1)

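Since you are using multiprocessing, you can also use it to detect whether you are in the main process or in a child process. However, these features are not documented, so they are implementation details that may change without warning between Python versions.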

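Every process has a name, an _identity and a _parent_pid. You can check any of them to see whether you are in the main process. In the main process, name is 'MainProcess', _identity is () and _parent_pid is None.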
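The check described above can be wrapped in a small helper. This is a sketch only: the function names are hypothetical, and the second variant assumes Python 3.8 or later, where `multiprocessing.parent_process()` is available as a public alternative to the private `_parent_pid` attribute.

```python
import multiprocessing


def in_main_process():
    """Return True when running in the process that started the program."""
    # Relies on the main process keeping its default name "MainProcess";
    # code that renames the process would defeat this check.
    return multiprocessing.current_process().name == "MainProcess"


def in_main_process_py38():
    """Same check through parent_process(), available on Python 3.8+ only."""
    # parent_process() returns None in the main process.
    return multiprocessing.parent_process() is None


if __name__ == "__main__":
    print(in_main_process(), in_main_process_py38())
```

Both helpers should agree in any process whose name has not been changed by hand.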

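My solution lets you keep using multiprocessing, and only modifies the child processes so that they do not keep creating children forever. It uses a decorator that turns foo into a no-op in child processes but returns foo unchanged in the main process. This means that when a spawned child process tries to execute foo, nothing happens (as if it had been executed inside a __main__ guard).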

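from multiprocessing import Pool, current_process

def run_in_main_only(func):
    # Evaluated once, when the module is imported and foo is decorated.
    if current_process().name == "MainProcess":
        # Main process: hand back the function unchanged.
        return func
    else:
        # Child process: replace the function with a do-nothing stub.
        def noop(*args, **kwargs):
            pass
        return noop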

def _noop(_ignored):
    p = current_process()
    return p.name, p._identity, p._parent_pid

@run_in_main_only
def foo():
    with Pool(2) as p:
        for result in p.map(_noop, [1, 2, 3]):
            print(result) # prints something like ('SpawnPoolWorker-2', (2,), 10720)

if __name__ == "__main__":
    print(_noop(1)) # prints ('MainProcess', (), None)
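If the unguarded caller uses the return value of the library function, a no-op in the child may not be enough. The following sketch is a variation that is not part of the answer: it does the work serially outside the main process. The name `map_maybe_parallel` and its `processes` parameter are hypothetical.

```python
from multiprocessing import Pool, current_process


def map_maybe_parallel(func, items, processes=2):
    """Map func over items, using a Pool only in the main process."""
    if current_process().name == "MainProcess":
        # Main process: start worker processes and distribute the items.
        with Pool(processes) as pool:
            return pool.map(func, items)
    # Any other process (for example a spawned worker re-importing an
    # unguarded script): do the work serially instead of spawning again.
    return [func(item) for item in items]


if __name__ == "__main__":
    print(map_maybe_parallel(abs, [-1, 2, -3]))
```

The trade-off is that every worker re-importing an unguarded script repeats the work once serially, so the no-op decorator is probably cheaper when the result is not needed at import time.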