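Python's multiprocessing module and modifying a shared global variable

Time: 2017-12-23 11:39:44

Tags: python multiprocessing global-variables

I wrote a small Python program to check whether I understand how global variables are passed to "child" processes.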

import multiprocessing
import time
import random

shared_var = range(12)

def f(x):
    global shared_var
    time.sleep(1+random.random())
    shared_var[x] = 100
    print x, multiprocessing.current_process(), shared_var
    return x*x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(f, range(8))
    print results
    print shared_var

When I run it, I get:

3 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 4, 5, 6, 7, 8, 9, 10, 11]
0 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
2 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 7, 8, 9, 10, 11]
1 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
4 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 100, 5, 6, 7, 8, 9, 10, 11]
5 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 100, 6, 7, 8, 9, 10, 11]
6 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 100, 7, 8, 9, 10, 11]
7 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 100, 8, 9, 10, 11]
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

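This is logical: the child processes modify the global variable, and the copy-on-write mechanism copies it at the moment a child writes to it, so any change is visible only in the process that made it.

What surprised me is what happened when I modified the code to print the variable's identifier: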
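The behaviour described above can be reproduced with a smaller sketch. It is written for Python 3 (the code in the question is Python 2) and assumes a POSIX system where the `fork` start method is available. The names `worker` and `run_demo` are made up for this illustration.

```python
import multiprocessing

shared_var = list(range(12))

def worker(x, queue):
    # Runs in the child process: this mutates the child's own copy of the list.
    shared_var[x] = 100
    queue.put(list(shared_var))

def run_demo():
    # Assumption: "fork" is available (Linux, macOS); it is not on Windows.
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue()
    child = ctx.Process(target=worker, args=(3, queue))
    child.start()
    child_view = queue.get()   # read before join() so the child can flush its data
    child.join()
    # The second value is the parent's list, which the child never touched.
    return child_view, list(shared_var)

if __name__ == "__main__":
    child_view, parent_view = run_demo()
    print("child :", child_view)
    print("parent:", parent_view)
```

The child reports a list with 100 at index 3, while the parent's list still holds the original values.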

import multiprocessing
import time
import random

shared_var = range(12)

def f(x):
    global shared_var
    time.sleep(1+random.random())
    shared_var[x] = 100
    print x, multiprocessing.current_process(), shared_var, id(shared_var)
    return x*x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(f, range(8))
    print results
    print shared_var, id(shared_var)

I get:

3 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
0 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
1 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
2 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
6 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 100, 7, 8, 9, 10, 11] 4504973968
7 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 100, 8, 9, 10, 11] 4504973968
4 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 100, 5, 6, 7, 8, 9, 10, 11] 4504973968
5 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 100, 6, 7, 8, 9, 10, 11] 4504973968
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968

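The identifier of the variable is the same everywhere (in the main process and in the spawned processes), whereas I expected each process to have its own copy...

Does anyone know why I get these results? Also, some references on how multiprocessing handles global variables that are read or written by the Process objects it creates would be nice. Thanks!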

2 Answers:

Answer 0: (score: 1)

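I think there is some confusion about memory. You are not using multithreading but multiprocessing, so each worker runs in a separate process with its own virtual memory space. Each process therefore has its own copy of shared_var from the start. That copy is what each call to f(x) modifies, leaving the actual variable in __main__ untouched.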

You can have a look at the section of the docs on sharing memory between processes, for example using multiprocessing.Array.
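As a rough sketch of that suggestion, the list can be replaced by a `multiprocessing.Array`, which lives in shared memory so that writes from children reach the parent. This is Python 3 and assumes the `fork` start method on a POSIX system. `mark` and `run_shared` are hypothetical names, and plain `Process` objects are used here instead of the `Pool` from the question.

```python
import multiprocessing

def mark(shared_arr, x):
    # shared_arr is backed by shared memory, so the parent sees this write.
    # The default Array wrapper takes its internal lock on each item access.
    shared_arr[x] = 100

def run_shared():
    # Assumption: "fork" is available (Linux, macOS); it is not on Windows.
    ctx = multiprocessing.get_context("fork")
    shared_arr = ctx.Array("i", list(range(12)))   # "i" = C signed int
    workers = [ctx.Process(target=mark, args=(shared_arr, x)) for x in range(8)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return shared_arr[:]   # copy the shared contents into a plain list

if __name__ == "__main__":
    print(run_shared())
```

Unlike the plain list in the question, the first eight entries are 100 in the parent after the workers finish.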

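I am not 100% sure why the address stays the same, but I think that since each new child process is created by forking the main process and copying its memory layout, the addresses in virtual memory stay the same in each child. The physical memory addresses are of course different. That is why you see the same id but different values.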
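The fork explanation can be checked with a short Python 3 sketch. It assumes CPython (where `id()` is the object's address) and a POSIX system with the `fork` start method. `report` and `compare_ids` are names invented for this example.

```python
import multiprocessing

shared_var = list(range(12))

def report(x, conn):
    # Runs in the forked child: modify the child's copy, then report back.
    shared_var[x] = 100
    conn.send((id(shared_var), list(shared_var)))
    conn.close()

def compare_ids():
    # Assumption: "fork" is available; the child inherits the parent's address space layout.
    ctx = multiprocessing.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    child = ctx.Process(target=report, args=(5, child_conn))
    child.start()
    child_id, child_view = parent_conn.recv()
    child.join()
    return id(shared_var), child_id, child_view

if __name__ == "__main__":
    parent_id, child_id, child_view = compare_ids()
    print("same id:", parent_id == child_id)
    print("child  :", child_view)
    print("parent :", shared_var)
```

The two ids match because both processes see the list at the same virtual address, yet the contents differ because each process has its own memory behind that address. With the `spawn` start method the child re-imports the module instead of inheriting the parent's memory, so the ids would not be expected to match.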

Answer 1: (score: 0)

As you may already know, id() in CPython actually returns the memory address of the object.
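A small sketch of that point, valid only on CPython: the integer returned by `id()` can be turned back into the object it refers to with `ctypes`. The helper `object_at` is a hypothetical name, and this is an illustration rather than something to rely on in real code.

```python
import ctypes

def object_at(address):
    # CPython-only assumption: id() is the address of the object in this
    # process's virtual memory, so casting it back yields the same object.
    return ctypes.cast(address, ctypes.py_object).value

shared_var = list(range(12))
recovered = object_at(id(shared_var))
print(recovered is shared_var)
```

The address is only meaningful inside the process that produced it. The same number in another process refers to that process's own virtual memory.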

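Check https://superuser.com/questions/347765/is-virtual-memory-related-to-virtual-address-space-of-a-process and "Why Virtual Memory Address is the same in different process?". Basically, the operating system gives each process its own virtual address space, and the process knows nothing about the actual (physical) memory address of an object.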