How can I parallelize this loop in Python?
import pandas as pd

def my_func(tup):
    return {tup[0][1]: tup[1]['col3'].sum()}

arr = [['a','c',3],
       ['b','d',5],
       ['b','d',6],
       ['a','b',1],
       ['a','c',2],
       ['a','b',4]]
df = pd.DataFrame(arr, columns=['col1', 'col2', 'col3'])

return_dict = {}
for i in df.col1.unique():
    return_dict[i] = []

## Need to parallelize this loop
for group in df.groupby(['col1', 'col2']):
    return_dict[group[0][0]].append(my_func(group))  # group[0][0] == unique values in col1

print(return_dict)
Expected output:
{'a': [{'b': 5}, {'c': 5}], 'b': [{'d': 11}]}
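I tried this, but it does not cover the group[0][0] issue: the dictionary key is not part of what the parallelized function returns.
I also tried the following, where I go through the col1 values sequentially.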
import pandas as pd
from joblib import Parallel, delayed

def my_func(tup):
    return {tup[0]: tup[1]['col3'].sum()}

arr = [['a','c',3],
       ['b','d',5],
       ['b','d',6],
       ['a','b',1],
       ['a','c',2],
       ['a','b',4]]
df = pd.DataFrame(arr, columns=['col1', 'col2', 'col3'])

return_dict = {}
for i in df.col1.unique():
    return_dict[i] = Parallel(n_jobs=-1, backend="threading")(
        map(delayed(my_func), df[df['col1']==i].groupby('col2'))
    )

print(return_dict)
Is there any way to avoid going through col1 sequentially? If not, why not?
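One direction that might work, shown only as a minimal sketch: let the worker return the col1 key together with its partial result, submit every (col1, col2) group as its own task, and build the dictionary in a cheap merge step afterwards. The sketch assumes the standard library's `ThreadPoolExecutor` in place of joblib (a substitution, not part of the code above), and the tuple return shape of `my_func` is likewise an assumption made for illustration. The same structure should carry over to `Parallel(n_jobs=-1, backend="threading")(delayed(my_func)(g) for g in df.groupby(['col1', 'col2']))`.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def my_func(tup):
    # tup is ((col1_value, col2_value), sub_frame), as yielded by
    # iterating over df.groupby(['col1', 'col2']).
    (col1_value, col2_value), sub_frame = tup
    # Assumption: return the col1 key alongside the result so the caller
    # can route it without a per-col1 outer loop.
    return col1_value, {col2_value: int(sub_frame['col3'].sum())}


arr = [['a', 'c', 3],
       ['b', 'd', 5],
       ['b', 'd', 6],
       ['a', 'b', 1],
       ['a', 'c', 2],
       ['a', 'b', 4]]
df = pd.DataFrame(arr, columns=['col1', 'col2', 'col3'])

# One task per (col1, col2) group; pool.map keeps the input order.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(my_func, df.groupby(['col1', 'col2'])))

# Sequential merge: only appends to lists, so it is cheap compared
# with the per-group work.
merged = defaultdict(list)
for col1_value, partial in results:
    merged[col1_value].append(partial)
return_dict = dict(merged)

print(return_dict)
# {'a': [{'b': 5}, {'c': 5}], 'b': [{'d': 11}]}
```

Whether threads give a real speedup here depends on how much of the per-group work releases the GIL; for a small `sum()` the scheduling overhead may well dominate.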