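I observed a very strange behavior when calling a method through pool.map.
Even with a single process, it behaves differently from a plain for loop: the if not self.seeded: block is entered several times, although it should only be entered once.
Here are the code and the output: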
import os
from multiprocessing import Pool


class MyClass(object):
    def __init__(self):
        self.seeded = False
        print("Constructor of MyClass called")

    def f(self, i):
        print("f called with", i)
        if not self.seeded:
            print("PID : {}, id(self.seeded) : {}, self.seeded : {}".format(os.getpid(), id(self.seeded), self.seeded))
            self.seeded = True

    def multi_call_pool_map(self):
        with Pool(processes=1) as pool:
            print("multi_call_pool_map with {} processes...".format(pool._processes))
            pool.map(self.f, range(10))

    def multi_call_for_loop(self):
        print("multi_call_for_loop ...")
        list_res = []
        for i in range(10):
            list_res.append(self.f(i))


if __name__ == "__main__":
    MyClass().multi_call_pool_map()
Output:
Constructor of MyClass called
multi_call_pool_map with 1 processes...
f called with 0
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 1
f called with 2
f called with 3
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 4
f called with 5
f called with 6
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 7
f called with 8
f called with 9
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
And with the for loop:
if __name__ == "__main__":
    MyClass().multi_call_for_loop()
Output:
Constructor of MyClass called
multi_call_for_loop ...
f called with 0
PID : 15840, id(self.seeded) : 1864747472, self.seeded : False
f called with 1
f called with 2
f called with 3
f called with 4
f called with 5
f called with 6
f called with 7
f called with 8
f called with 9
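How can the behavior of pool.map (the first case) be explained? I don't understand why the if block is entered multiple times, since self.seeded is set to False only in the constructor, and the constructor is called only once...
(I am using Python 3.6.8.)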
Answer 0 (score: 3)
If you run the code and also print self inside f, you can see that the instance actually changes each time just before the if clause is entered:
def f(self, i):
    print("f called with", i, "self is", self)
    if not self.seeded:
        print("PID : {}, id(self.seeded) : {}, self.seeded : {}".format(os.getpid(), id(self.seeded), self.seeded))
        self.seeded = True
This outputs:
Constructor of MyClass called
multi_call_pool_map with 1 processes...
f called with 0 self is <__main__.MyClass object at 0x7f30cd592b38>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False
f called with 1 self is <__main__.MyClass object at 0x7f30cd592b38>
f called with 2 self is <__main__.MyClass object at 0x7f30cd592b38>
f called with 3 self is <__main__.MyClass object at 0x7f30cd592b00>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False
f called with 4 self is <__main__.MyClass object at 0x7f30cd592b00>
f called with 5 self is <__main__.MyClass object at 0x7f30cd592b00>
f called with 6 self is <__main__.MyClass object at 0x7f30cd592ac8>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False
f called with 7 self is <__main__.MyClass object at 0x7f30cd592ac8>
f called with 8 self is <__main__.MyClass object at 0x7f30cd592ac8>
f called with 9 self is <__main__.MyClass object at 0x7f30cd592a90>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False
If you add chunksize=10 to .map(), it behaves just like the for loop:
def multi_call_pool_map(self):
    with Pool(processes=1) as pool:
        print("multi_call_pool_map with {} processes...".format(pool._processes))
        pool.map(self.f, range(10), chunksize=10)
This outputs:
Constructor of MyClass called
multi_call_pool_map with 1 processes...
f called with 0 self is <__main__.MyClass object at 0x7fd175093b00>
PID : 22972, id(self.seeded) : 10744096, self.seeded : False
f called with 1 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 2 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 3 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 4 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 5 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 6 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 7 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 8 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 9 self is <__main__.MyClass object at 0x7fd175093b00>
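The exact reason why this happens is a fairly deep implementation detail, and it has to do with how multiprocessing passes data to the processes in a pool.
I'm afraid I'm not qualified enough to explain exactly how and why the internals work this way.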
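The instance changes at items 0, 3, 6 and 9, which is consistent with the default chunk size that CPython's Pool.map appears to pick when chunksize is not given. The following is only a minimal sketch of that heuristic as I understand it from the CPython source. The helper names default_chunksize and split_into_chunks are hypothetical, and the exact formula is an implementation detail that may differ between versions.

```python
def default_chunksize(n_items, n_workers):
    """Approximate the chunk size Pool.map picks when chunksize is None.

    Assumption: mirrors the heuristic in CPython's multiprocessing.pool
    (roughly four chunks per worker, rounded up).
    """
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def split_into_chunks(items, chunksize):
    """Split a sequence into consecutive chunks of at most `chunksize` items."""
    items = list(items)
    return [items[i:i + chunksize] for i in range(0, len(items), chunksize)]


size = default_chunksize(10, 1)
chunks = split_into_chunks(range(10), size)
print(size)    # 3
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

With 10 items and 1 worker this gives four chunks, matching the four distinct instances in the output. With chunksize=10 there is a single chunk, so only one instance is seen.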
Answer 1 (score: 1)
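When you use an instance method with Pool.map, a copy of the object instance is sent to the worker process with the help of the pickle module. Your results show that map works in chunks, and that the object instance is reloaded from its pickled form at the start of each chunk. Loading a pickle does not call __init__.
For more explanation, see https://thelaziestprogrammer.com/python/a-multiprocessing-pool-pickle.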
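The effect can be reproduced without a pool at all. The sketch below is not what multiprocessing does internally. It only imitates it by pickling a bound method once and unpickling it once per simulated chunk. The class Tracker and its init_calls counter are hypothetical stand-ins for MyClass from the question.

```python
import pickle


class Tracker(object):
    """Hypothetical stand-in for MyClass from the question."""
    init_calls = 0  # class-level counter of __init__ invocations

    def __init__(self):
        Tracker.init_calls += 1
        self.seeded = False

    def f(self, i):
        # Report whether this call was the first one seen by this instance.
        first_call = not self.seeded
        self.seeded = True
        return first_call


original = Tracker()
# Pickling a bound method also pickles the instance it is bound to.
payload = pickle.dumps(original.f)

# Simulate two chunks: each one unpickles its own copy of the bound method.
results = []
for chunk in ([0, 1, 2], [3, 4, 5]):
    method = pickle.loads(payload)
    results.append([method(i) for i in chunk])

print(results)             # [[True, False, False], [True, False, False]]
print(Tracker.init_calls)  # 1
print(original.seeded)     # False
```

Each unpickled copy starts with seeded set to False, __init__ runs only once, and the original object is never modified, which is the pattern seen in the question's output.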