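I want to parallelize the following code. For certain reasons, I have to split the data into subsets and then apply a function to each one. Note that the subsets will not all be the same size.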
Answer 0 (score: 1)
Try the following code, which uses multiprocessing:
import multiprocessing

def f(x):
    return x * x

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

if __name__ == '__main__':
    n_core = multiprocessing.cpu_count()
    p = multiprocessing.Pool(processes=n_core)
    data = range(0, 8)
    # Each chunk holds up to n_core items; the last one may be shorter.
    subsets = chunks(data, n_core)
    subset_results = []
    for subset in subsets:
        # The items of one subset are processed in parallel.
        subset_results.append(p.map(f, subset))
    print(subset_results)
For your case, a chunk function like the following may help:
def chunks_series(s):
    # One subset per integer 0..s.max(); assumes s holds non-negative integers.
    subsets = []
    for i in range(s.max() + 1):
        subset = s[s == i]
        subsets.append(subset.values)
    return subsets

subsets = chunks_series(df['col1'])
Or you can do everything in the same loop:
n_core = multiprocessing.cpu_count()
p = multiprocessing.Pool(processes=n_core)
s = df['col1']
subset_results = []
for i in range(s.max() + 1):
    subset = s[s == i]
    subset_results.append(p.map(f, subset))
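I would prefer to introduce a chunk function: even if it brings no advantage in your particular case, it makes the code clearer and more general.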
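As a rough illustration of that separation, here is a minimal sketch that does not depend on pandas. The names chunks_by_key and apply_to_subsets are hypothetical helpers, and the col1 list is made-up data standing in for df['col1']. Unlike chunks_series, this version produces a subset only for values that actually occur, so it yields no empty subsets.

```python
import multiprocessing
from collections import defaultdict


def f(x):
    # Same toy function as in the answer.
    return x * x


def chunks_by_key(values):
    """Group values into uneven subsets, one per distinct value (hypothetical helper)."""
    groups = defaultdict(list)
    for v in values:
        groups[v].append(v)
    # Order subsets by key so the result is reproducible.
    return [groups[k] for k in sorted(groups)]


def apply_to_subsets(subsets, func, map_fn=map):
    """Apply func to every element of every subset using map_fn (hypothetical helper)."""
    return [list(map_fn(func, subset)) for subset in subsets]


if __name__ == '__main__':
    col1 = [0, 2, 1, 2, 0, 2]  # made-up stand-in for df['col1']
    subsets = chunks_by_key(col1)
    with multiprocessing.Pool() as p:
        # Pass the pool's map so each subset is processed in parallel.
        print(apply_to_subsets(subsets, f, map_fn=p.map))
```

Because the chunking step and the mapping step are separate functions, either can be swapped out (for example, a different grouping rule, or the built-in map for debugging without processes) without touching the other.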