Dask delayed (lazy) initialization is very slow in a list comprehension

Asked: 2018-12-04 22:21:42

Tags: python loops parallel-processing dask dask-delayed

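I am trying to find out whether Dask is a good fit for my project, so I wrote a few very simple test cases to look at its performance. However, Dask takes a relatively long time just to perform the delayed initialization, that is, to build the list of delayed objects before anything is computed.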

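import dask
import dask.multiprocessing
from dask import compute, delayed

# Delayed versions: each call only builds a lazy task, nothing runs yet.
@delayed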
def normd(st):
    return st.lower().replace(',', '')

@delayed
def add_vald(v):
    return v+5

def norm(st):
    return st.lower().replace(',', '')

def add_val(v):
    return v+5

test_list = [i for i in range(1000)]
test_list1 = ["AeBe,oF,221e"]*1000

%timeit rlist = [add_val(y) for y in test_list]
#124 µs ± 7.25 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit rlist = [norm(y) for y in test_list1]
#392 µs ± 18.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit rlist = [add_vald(y) for y in test_list]
#19.1 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

rlist = [add_vald(y) for y in test_list]
%timeit rlist1 = compute(*rlist, get=dask.multiprocessing.get)
#892 ms ± 36.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit rlist = [normd(y) for y in test_list1]
#18.7 ms ± 408 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

rlist = [normd(y) for y in test_list1]
%timeit rlist1 = compute(*rlist, get=dask.multiprocessing.get)
#912 ms ± 54.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

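I looked at "Dask For Loop In Parallel" and "parallel dask for loop slower than regular loop?", and tried increasing the size to 1 million items. The regular loop takes about a second, but the Dask loop never finishes. After waiting half an hour just for the delayed initialization of add_vald to complete, I killed it.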
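To put the measurements above in per-item terms, here is a rough back-of-the-envelope sketch. Only the three timings come from the %timeit output; the variable names are illustrative, and the extrapolation assumes the cost grows linearly with the number of items, which may not hold in practice.

```python
# Timings copied from the %timeit results above, in seconds per 1000-item run.
n_items = 1000
plain_loop = 124e-6        # [add_val(y) for y in test_list]
delayed_build = 19.1e-3    # [add_vald(y) for y in test_list]
mp_compute = 892e-3        # compute(*rlist, get=dask.multiprocessing.get)

# Per-item cost of each step.
per_call_plain = plain_loop / n_items      # about 0.124 microseconds
per_call_delayed = delayed_build / n_items  # about 19.1 microseconds
per_task_compute = mp_compute / n_items     # about 892 microseconds

# How much slower building the delayed objects is than just doing the work.
slowdown_build = delayed_build / plain_loop

# Naive linear extrapolation to 1 million items (an assumption, not a measurement).
build_1m_seconds = per_call_delayed * 1_000_000

print(f"build slowdown vs plain loop: {slowdown_build:.0f}x")
print(f"linear estimate for building 1M delayed objects: {build_1m_seconds:.1f} s")
```

If the cost really were linear, building 1 million delayed objects should take on the order of 19 seconds, so the half hour I observed suggests something other than a constant per-call overhead is involved.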

I am not sure what is going wrong here and would appreciate any insight you can offer. Thanks!
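For comparison, one variation I could try is to create one delayed task per chunk of items rather than one per item, so the per-task overhead is paid far fewer times. The following is only a minimal sketch under assumptions: `add_val_batch` and `run_batched` are hypothetical helpers (not part of Dask), the batch size is arbitrary, and it uses the `scheduler=` keyword that newer Dask releases accept in place of `get=`. I have not benchmarked it against the cases above.

```python
import dask
from dask import delayed


def add_val(v):
    return v + 5


def add_val_batch(batch):
    # Plain Python loop inside one task: a single graph node covers many items.
    return [add_val(v) for v in batch]


def run_batched(values, batch_size=10_000, scheduler="threads"):
    # Split the input into contiguous chunks (hypothetical helper, not a Dask API).
    batches = [values[i:i + batch_size] for i in range(0, len(values), batch_size)]
    # One delayed call per chunk instead of one per item.
    tasks = [delayed(add_val_batch)(b) for b in batches]
    results = dask.compute(*tasks, scheduler=scheduler)
    # Flatten the per-chunk lists back into a single list.
    return [x for chunk in results for x in chunk]


result = run_batched(list(range(100_000)))
print(len(result), result[:3])
```

With 100,000 items and a batch size of 10,000, this builds 10 tasks instead of 100,000. Passing `scheduler="processes"` would select the multiprocessing scheduler used in the timings above.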

0 answers:

No answers yet