Unable to get the correct output format with Joblib

Time: 2019-05-23 19:05:28

Tags: python format joblib

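I am not getting the expected output format (a dictionary) when I use Joblib. Perhaps I do not know how to write the code so that it benefits from Joblib.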

Without Joblib I get the expected result. With Joblib, the result does not come back in the right format.

Note: because the DataFrame is large, I use Elizabeth Santorella's Groupby class, which is faster than pandas groupby (http://esantorella.com/2016/06/16/groupby/).

Thanks in advance for your help.

Best regards

    # Groupby is the class from the blog post linked above.
    from joblib import Parallel, delayed

    # Point 1 (without joblib): works and gives the expected dictionary
    grouped = Groupby(df.index)
    d = {col: grouped.apply(sum, df[col], broadcast=False) for col in col1}
    # Output (OK, this is what is expected):
    # {'gp_0': array([0., 0., 0., ..., 0., 0., 0.]),
    #  'gp_1': array([0., 0., 0., ..., 0., 0., 0.]),
    #  'gp_2': array([0., 0., 0., ..., 0., 0., 0.])}

    # Point 2 (with joblib): does not work, the output is a list of arrays
    grouped = Groupby(new.index)
    d = Parallel(n_jobs=1, verbose=10)(
        delayed(grouped.apply)(sum, new[col], broadcast=False) for col in col1
    )
    # Output (not OK):
    # [array([0., 0., 0., ..., 0., 0., 0.]),
    #  array([0., 0., 0., ..., 0., 0., 0.]),
    #  array([0., 0., 0., ..., 0., 0., 0.])]

    # Point 3 (with joblib): does not work, the output is a list of dictionaries
    grouped = Groupby(new.index)

    def my_func(col):
        return {col: grouped.apply(sum, new[col], broadcast=False)}

    d = Parallel(n_jobs=1, verbose=10)(delayed(my_func)(col) for col in col1)
    # Output (not OK):
    # [{'gp_0': array([0., 0., 0., ..., 0., 0., 0.])},
    #  {'gp_1': array([0., 0., 0., ..., 0., 0., 0.])},
    #  {'gp_2': array([0., 0., 0., ..., 0., 0., 0.])}]
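
One possible way to get back to the dictionary from Point 1 is sketched below. With its default settings, `Parallel` returns a list whose order matches the order of the submitted tasks, so the results can be paired with the column names again. The sketch does not use the Groupby class from the blog post. `group_sums`, `keys`, `data`, `sum_column` and `merged` are hypothetical stand-ins chosen so the snippet runs on its own, and `group_sums` only approximates what `grouped.apply(sum, ..., broadcast=False)` is assumed to do.

```python
import numpy as np
from joblib import Parallel, delayed


def group_sums(keys, values):
    """Hypothetical stand-in for grouped.apply(sum, values, broadcast=False):
    returns one sum per distinct key, ordered by sorted key."""
    _, inverse = np.unique(keys, return_inverse=True)
    return np.bincount(inverse, weights=values)


# Toy data (assumed): a group key per row and two value columns.
keys = np.array([0, 0, 1, 1, 2])
data = {
    "gp_0": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
    "gp_1": np.array([0.0, 0.0, 1.0, 1.0, 2.0]),
}
col1 = list(data)

# As in Point 2: Parallel gives back a list, one entry per column, in order.
results = Parallel(n_jobs=1)(
    delayed(group_sums)(keys, data[col]) for col in col1
)
# Pair each result with its column name to rebuild the dictionary.
d = dict(zip(col1, results))


# As in Point 3: each task returns a one-entry dictionary.
def sum_column(col):
    return {col: group_sums(keys, data[col])}


parts = Parallel(n_jobs=1)(delayed(sum_column)(col) for col in col1)
# Merge the list of small dictionaries into a single one.
merged = {name: arr for part in parts for name, arr in part.items()}
```

Both `d` and `merged` end up keyed by column name, which is the format shown under Point 1. Whether this is actually faster than the plain dictionary comprehension depends on `n_jobs` and on the cost of sending the data to the workers.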

0 answers:

No answers yet