Question

I am trying to restructure my code to use Dask instead of NumPy for large array computations. However, I am struggling with Dask's runtime performance:

In[15]: import numpy as np
In[16]: import dask.array as da
In[17]: np_arr = np.random.rand(10, 10000, 10000)
In[18]: da_arr = da.from_array(np_arr, chunks=(-1, 'auto', 'auto'))
In[19]: %timeit np.mean(np_arr, axis=0)
1 loop, best of 3: 2.59 s per loop
In[20]: %timeit da_arr.mean(axis=0).compute()
1 loop, best of 3: 4.23 s per loop

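I have looked at a similar question (why is dot product in dask slower than in numpy), but experimenting with chunk sizes did not help. I will mainly be working with arrays of roughly the size shown above. Is it advisable to use NumPy rather than Dask for arrays like this, or is there something I can tune? I also tried using Client from dask.distributed, started with 16 processes and 4 threads per process (16-core CPU), but that made things worse. Thanks in advance!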
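
For reference, the chunk-size experiments mentioned above can be sketched roughly as follows. This is a minimal illustration rather than the exact code I ran: the array is deliberately much smaller than the (10, 10000, 10000) one in the question, and the names `layouts` and `results` as well as the specific chunk shapes are assumptions chosen for the example.

```python
import numpy as np
import dask.array as da

# A deliberately small array (hypothetical size) so the sketch runs quickly;
# the question itself uses shape (10, 10000, 10000).
np_arr = np.random.rand(10, 400, 400)

# Candidate chunk layouts (illustrative choices). The reduced axis 0 is kept
# whole (-1) in every case, as in the question.
layouts = {
    "auto": (-1, "auto", "auto"),
    "single chunk": (-1, -1, -1),
    "100 x 100 tiles": (-1, 100, 100),
}

expected = np_arr.mean(axis=0)
results = {}
for name, chunks in layouts.items():
    da_arr = da.from_array(np_arr, chunks=chunks)
    # numblocks reports how many chunks there are along each axis.
    print(name, "->", da_arr.numblocks, "blocks")
    results[name] = da_arr.mean(axis=0).compute()
```

Each layout gives the same result as `np.mean(np_arr, axis=0)`; only the number of tasks changes. Wrapping the `compute()` call in `%timeit` for each layout is how the timings can be compared.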

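Edit: I experimented with Dask and distributed processing. Data transfer (scattering the array and retrieving the result) seems to be the main limitation, whereas the computation itself is very fast (436 ms CPU time versus 9.51 s). However, even for client.compute(), the wall time (12.1 s) is larger than for do_stuff(data). Overall, can the data transfer be improved?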

In[3]: import numpy as np
In[4]: from dask.distributed import Client, wait
In[5]: from dask import delayed
In[6]: import dask.array as da
In[7]: client = Client('address:port')
In[8]: client
Out[8]: <Client: scheduler='tcp://address:port' processes=4 cores=16>
In[9]: data = np.random.rand(400, 100, 10000)
In[10]: %time [future] = client.scatter([data])
CPU times: user 8.36 s, sys: 5.08 s, total: 13.4 s
Wall time: 24.5 s
In[11]: x = da.from_delayed(delayed(future), shape=data.shape, dtype=data.dtype)
In[12]: x = x.rechunk(chunks=('auto', 'auto', 'auto'))
In[13]: x = client.persist(x)
In[14]: {w: len(keys) for w, keys in client.has_what().items()}
Out[14]: 
{'tcp://address:port': 65,
 'tcp://address:port': 0,
 'tcp://address:port': 0,
 'tcp://address:port': 0}
In[15]: client.rebalance(x)
In[16]: {w: len(keys) for w, keys in client.has_what().items()}
Out[16]: 
{'tcp://address:port': 17,
 'tcp://address:port': 16,
 'tcp://address:port': 16,
 'tcp://address:port': 16}
In[17]: def do_stuff(arr):
...         arr = arr/3. + arr**2 - arr**(1/2)
...         arr[arr >= 0.5] = 1
...         return arr
...   
In[18]: %time future_compute = client.compute(do_stuff(x)); wait(future_compute)
Matplotlib support failed
CPU times: user 387 ms, sys: 49.5 ms, total: 436 ms
Wall time: 12.1 s
In[19]: future_compute
Out[19]: <Future: status: finished, type: ndarray, key: finalize-54eb04bbe03eee8af686fd43b41eb161>
In[20]: %timeit future_compute.result()
1 loop, best of 3: 19.4 s per loop
In[21]: %time do_stuff(data)
CPU times: user 4.49 s, sys: 5.02 s, total: 9.51 s
Wall time: 9.5 s
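
To put the transfer times in perspective, here is a back-of-the-envelope calculation based only on the shape and wall times reported in the session above. It assumes float64 data (8 bytes per element, which is what np.random.rand returns) and that the result of do_stuff has the same size as its input; the variable names are made up for this sketch, and the rates are effective end-to-end figures that include serialization, not raw network bandwidth.

```python
# Array shape from the session above; float64 is 8 bytes per element.
shape = (400, 100, 10000)
bytes_per_element = 8

n_elements = 1
for dim in shape:
    n_elements *= dim

size_gb = n_elements * bytes_per_element / 1e9  # decimal gigabytes

# Wall times reported in the session, in seconds.
scatter_wall = 24.5  # client.scatter([data])
result_wall = 19.4   # future_compute.result()

# Effective throughput in GB/s for sending the input and fetching the result.
scatter_rate = size_gb / scatter_wall
result_rate = size_gb / result_wall

print(f"array size: {size_gb:.1f} GB")
print(f"scatter throughput: {scatter_rate * 1000:.0f} MB/s")
print(f"result throughput: {result_rate * 1000:.0f} MB/s")
```

So the array is about 3.2 GB, and moving it to or from the workers runs at roughly 130 to 165 MB/s, which is why the transfer dominates the total wall time.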

Runtime comparison of Dask and NumPy

0 answers: