Question

Suppose you have a function of this form, operating on some data, df:

function_(df, parameter_a, parameter_b, parameter_c)

The parameters themselves are lists of elements (strings, datetime objects). result_ is a small DataFrame, at most 1 column x 25 rows.

result_ = function_(df, [a, b, c], [d, f], [x, y, c])
%timeit result_
10000000 loops, best of 3: 37.5 ns per loop
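Note that `%timeit result_` only times the lookup of an already computed variable, which is why it reports nanoseconds; it does not say how long `function_` itself takes. Below is a minimal sketch of timing the call instead, using the standard `timeit` module. The body of `function_` and the sample data are hypothetical stand-ins, since the real ones are not shown in the question.

```python
import timeit

import pandas as pd


# Hypothetical stand-in for function_; the real implementation is not shown.
def function_(df, parameter_a, parameter_b, parameter_c):
    mask = df["key"].isin(parameter_a)
    return df.loc[mask, ["value"]].head(25)


# Made-up data, only so that the sketch runs on its own.
df = pd.DataFrame({"key": list("abcd") * 250, "value": range(1000)})

result_ = function_(df, ["a", "b", "c"], ["d", "f"], ["x", "y", "c"])

# Time the call itself rather than the stored result.
n_calls = 20
seconds = timeit.timeit(
    lambda: function_(df, ["a", "b", "c"], ["d", "f"], ["x", "y", "c"]),
    number=n_calls,
)
per_call = seconds / n_calls
print(f"{per_call * 1e3:.3f} ms per call")
```

In IPython the equivalent would be `%timeit function_(df, [a, b, c], [d, f], [x, y, c])`.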

Suppose you have at least 1000 sets of parameters to run the function with, and they are currently stored like this:

parameter_sets = [('set_1', function_(df, [a, b, c], [d, f], [x, y, c])),
                  ('set_2', function_(df, [b, c, d], [a, b], [x, x, x])),
                   ...
                   ...
                  ('set_1000', function_(df, [b, b, d], [b, b], [x, y, y]))]

If I want to run all of the parameter sets and combine the results of each output, I do this:

def run1000_function():
    parameter_sets = [('set_1', function_(df, [a, b, c], [d, f], [x, y, c])),
                      ('set_2', function_(df, [b, c, d], [a, b], [x, x, x])),
                       ...
                       ...
                      ('set_1000', function_(df, [b, b, d], [b, b], [x, y, y]))]

    output_d = dict(parameter_sets)  # probably unnecessary, but used for convenience
    for key, out in output_d.items():  # not named df: that would make df local and break the calls above
        out['Name'] = key  # to associate the set name with the output

    final_result = pd.concat(output_d.values(), axis=0)

    return final_result

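All of this works exactly as expected, but the problem is that it takes too much time. When I run run1000_function(), it takes more than 20 minutes. Going through the list 'parameter_sets' seems to be the problem. The question is: how should I store and read these parameter sets so as to speed up the whole process?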
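One thing worth noting: a list literal that contains `function_(...)` calls executes every call while the list is being built, so `parameter_sets` holds 1000 results rather than 1000 parameter sets. At 20 minutes for 1000 sets, that is roughly 1.2 seconds per call, so the time is most likely spent inside `function_`, not in storing or reading the list. A minimal sketch of keeping the parameters as plain data and calling the function in a loop follows. The body of `function_`, the sample data, and the helper name `run_all` are hypothetical, used only to make the sketch runnable.

```python
import pandas as pd


# Hypothetical stand-in for function_; the real implementation is not shown.
def function_(df, parameter_a, parameter_b, parameter_c):
    mask = df["key"].isin(parameter_a)
    return df.loc[mask, ["value"]].head(25)


# Made-up data, only so that the sketch runs on its own.
df = pd.DataFrame({"key": list("abcd") * 250, "value": range(1000)})

# Store only the arguments; nothing is computed at this point.
parameter_sets = {
    "set_1": (["a", "b", "c"], ["d", "f"], ["x", "y", "c"]),
    "set_2": (["b", "c", "d"], ["a", "b"], ["x", "x", "x"]),
}


def run_all(df, parameter_sets):
    # Call function_ once per parameter set, keyed by the set name.
    outputs = {
        name: function_(df, *params) for name, params in parameter_sets.items()
    }
    # Passing a dict to pd.concat uses its keys as an outer index level,
    # which is then moved into a regular "Name" column.
    combined = pd.concat(outputs, names=["Name", "row"])
    return combined.reset_index(level="Name")


final_result = run_all(df, parameter_sets)
print(final_result.head())
```

This layout does not make each call faster by itself, but it separates the parameters from the computation, which makes it easier to time individual sets, profile `function_`, or later hand the sets to a worker pool.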

How can I speed up running a function over many sets of parameters?

0 answers: