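I don't get the expected output format (a dictionary) when I use Joblib. Maybe that is because I don't know how to write the code so that it benefits from Joblib.
Without Joblib I get the desired result, but with Joblib I can't get the result in the right format.
Note: because the DataFrame is large, I am using Elizabeth Santorella's Groupby class, which is faster than pandas groupby (http://esantorella.com/2016/06/16/groupby/).
Thanks in advance for your help.
Best regards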
# Point 1 (without joblib) works
grouped = Groupby(df.index)
d = {col: grouped.apply(sum, df[col], broadcast=False) for col in col1}
#Point 1: OK (and it is what is expected)
{'gp_0': array([0., 0., 0., ..., 0., 0., 0.]),
'gp_1': array([0., 0., 0., ..., 0., 0., 0.]),
'gp_2': array([0., 0., 0., ..., 0., 0., 0.])}
# Point 2 doesn't work (output format list of arrays)
from joblib import Parallel, delayed
d = {}
grouped = Groupby(new.index)
d = Parallel(n_jobs=1, verbose=10)(delayed(grouped.apply)(sum, new[col], broadcast=False) for col in col1)
# Point2: Not OK
[array([0., 0., 0., ..., 0., 0., 0.]),
array([0., 0., 0., ..., 0., 0., 0.]),
array([0., 0., 0., ..., 0., 0., 0.])]
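For what it's worth, Parallel returns a list with one result per task, in the same order as the tasks were submitted, so the column names are simply not part of what comes back. One possible way to rebuild the dictionary is to zip the column names with that list. The sketch below is only an illustration: `column_total` and the small `data` dict are hypothetical stand-ins for `grouped.apply(sum, new[col], broadcast=False)` and the `new` DataFrame, which are not reproduced here.

```python
from joblib import Parallel, delayed


def column_total(values):
    # Hypothetical stand-in for grouped.apply(sum, new[col], broadcast=False)
    return sum(values)


# Hypothetical small dataset standing in for the `new` DataFrame
data = {"gp_0": [1.0, 2.0], "gp_1": [3.0, 4.0], "gp_2": [5.0, 6.0]}
col1 = list(data)

# Parallel yields a plain list, ordered like the submitted tasks
results = Parallel(n_jobs=1)(delayed(column_total)(data[col]) for col in col1)

# Pair each column name with its result to get the dictionary back
d = dict(zip(col1, results))
print(d)
```

Assuming the real call behaves the same way, `dict(zip(col1, results))` should give the same `{column: array}` shape as Point 1.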
# Point 3 doesn't work (output format list of dictionaries)
d = {}
grouped = Groupby(new.index)
def my_func(col):
    return {col: grouped.apply(sum, new[col], broadcast=False)}
d = Parallel(n_jobs=1, verbose=10)(delayed(my_func)(col) for col in col1)
#Point 3: not OK
[{'gp_0': array([0., 0., 0., ..., 0., 0., 0.])},
{'gp_1': array([0., 0., 0., ..., 0., 0., 0.])},
{'gp_2': array([0., 0., 0., ..., 0., 0., 0.])}]
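The Point 3 output is a list of one-key dictionaries, so it may be enough to merge them into a single dictionary after the parallel call. A minimal sketch, using a hypothetical `list_of_dicts` with short plain lists in place of the real arrays:

```python
# Hypothetical stand-in for the list returned by Parallel in Point 3
list_of_dicts = [
    {"gp_0": [0.0, 0.0]},
    {"gp_1": [0.0, 1.0]},
    {"gp_2": [2.0, 0.0]},
]

# Flatten the list of one-key dicts into a single dict
d = {key: value for partial in list_of_dicts for key, value in partial.items()}
print(d)
```

If two partial dictionaries shared a key, the later one would overwrite the earlier one; with one dictionary per column that should not happen.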