I have been trying to parallelize a process inside a class method. When I try to use Pool() from multiprocessing, I get pickling errors. When I use Pool() from multiprocessing.dummy, execution is slower than running serially.
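For reference, here is a stripped-down sketch of the kind of call that triggers the pickling error for me (the class and names here are made up just for illustration; whether it actually fails seems to depend on the Python version and on what the instance holds):

import multiprocessing

class Minimal:
    def __init__(self, factor):
        self.factor = factor

    def work(self, x):
        return x * self.factor

if __name__ == '__main__':
    obj = Minimal(2)
    pool = multiprocessing.Pool()
    # On Python 2 this raises PicklingError because bound methods can't be
    # pickled; on Python 3 bound methods pickle by reference to the class,
    # but the map can still fail if an attribute of the instance is unpicklable.
    print(pool.map(obj.work, [1, 2, 3]))
    pool.close()
    pool.join()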
I have tried several variations of the code below, using Stack Overflow posts as guidance, but none of them solved the problems above.
One example: moving process_function above the class definition (making it global) does not work, because then I can no longer access the object's attributes.
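To illustrate what I mean (placeholder math, but the structure is the point): a globalized worker pickles fine, but it only sees what I explicitly pass to it, not the object's state:

import multiprocessing

def process_function(dict_from_list):
    # No `self` here, so things like self.compute_foo or self.number_iterations
    # are out of reach; the only data available is whatever gets passed in.
    return dict_from_list['key'] * 2  # placeholder for the real math

if __name__ == '__main__':
    data = [{'key': i} for i in range(5)]
    pool = multiprocessing.Pool()
    print(pool.map(process_function, data))  # no pickling error, but no object state either
    pool.close()
    pool.join()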
Anyway, my code looks something like this:
import numpy as np
from multiprocessing.dummy import Pool as ThreadPool
from my_other_module import other_module_class

class myClass:
    def __init__(self, some_list, number_iterations):
        self.my_interface = other_module_class
        self.relevant_list = []
        self.some_list = some_list
        self.number_iterations = number_iterations
        # self.other_attributes = stuff from import statements

    def load_relevant_data(self):
        self.relevant_list = self.my_interface.other_function()

    def compute_foo(self, relevant_list_member_value):
        # math involving class attributes
        return foo_scalar

    def higher_function(self):
        self.load_relevant_data()
        np.random.seed(0)
        pool = ThreadPool()  # I've tried different args here, no help
        pool.map(self.process_function, self.relevant_list)

    def process_function(self, dict_from_relevant_list):
        foo_bar = self.compute_foo(dict_from_relevant_list['key'])
        a = 0
        for i in some_other_list:
            # do other stuff involving class attributes and foo_bar
            # a = some of that
            pass
        dict_from_relevant_list['other_key'] = a

if __name__ == '__main__':
    import time
    import pprint as pp

    some_list = blah
    number_of_iterations = 10**4
    my_obj = myClass(some_list, number_of_iterations)
    my_obj.load_third_parties()

    start = time.time()
    my_obj.higher_function()
    execution_time = time.time() - start

    print()
    print("Execution time for %s simulation runs: %s" % (number_of_iterations, execution_time))
    print()
    pp.pprint(my_obj.relevant_list[0:5])
I have several hundred dictionaries in relevant_list. All I want is to fill each dictionary's 'other_key' field with the result of a computationally expensive simulation in my innermost loop, which produces a scalar value like a above. It seems like there should be a simple way to do this, since in Matlab I could just parfor it correctly and it would be handled automatically. Maybe that instinct is wrong for Python.