Dask Distributed with MS VS2017

Time: 2019-06-03 04:35:05

Tags: python dask dask-distributed

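I have been using Microsoft Visual Studio as my Python IDE and recently started using Dask to process large CSV files. When I try to use Dask Distributed, I get many errors as soon as I try to start the dashboard.

I have tried simple code in both MS VS2017 and Jupyter Notebook on several machines. In Jupyter I get no errors and the dashboard loads correctly. Under Visual Studio, however, the code crashes and no dashboard loads.

Both IDEs run in the same environment. I am using the latest version of Dask and Python 3.6.

An example of some simple code: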

from dask import dataframe as ddf
from dask import multiprocessing 
from dask.distributed import Client
client = Client()

When run under Jupyter, the code above starts the Dask dashboard on localhost. VS2017, however, produces a large number of errors. Here are some of them:

distributed.nanny - WARNING - Worker process 13692 exited with status 1
The thread 0x8 has exited with code 0 (0x0).

The thread 0x4 has exited with code 0 (0x0).
The thread 0x9 has exited with code 0 (0x0).
The thread 0xb has exited with code 0 (0x0).
The thread 0xa has exited with code 0 (0x0).
distributed.nanny - WARNING - Worker process 15368 exited with status 1

The thread 0x5 has exited with code 0 (0x0).
distributed.nanny - WARNING - Worker process 16616 exited with status 1

The thread 0x6 has exited with code 0 (0x0).
distributed.nanny - WARNING - Worker process 22288 exited with status 1

The thread 0x7 has exited with code 0 (0x0).
distributed.nanny - WARNING - Restarting worker

Traceback (most recent call last):
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\multiprocessing\queues.py", line 236, in _feed
    send_bytes(obj)
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\multiprocessing\connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\multiprocessing\connection.py", line 280, in _send_bytes
    ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
BrokenPipeError: [WinError 232] The pipe is being closed
The thread 0x10 has exited with code 0 (0x0).
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 883, in callback
    result_list.append(f.result())
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\deploy\local.py", line 316, in _start_worker
    raise gen.TimeoutError("Worker failed to start")
tornado.util.TimeoutError: Worker failed to start

tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 883, in callback
    result_list.append(f.result())
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\deploy\local.py", line 316, in _start_worker
    raise gen.TimeoutError("Worker failed to start")
tornado.util.TimeoutError: Worker failed to start

tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 883, in callback
    result_list.append(f.result())
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\deploy\local.py", line 316, in _start_worker
    raise gen.TimeoutError("Worker failed to start")
tornado.util.TimeoutError: Worker failed to start

distributed.nanny - ERROR - Failed to restart worker after its process exited
Traceback (most recent call last):
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\nanny.py", line 343, in _on_exit
    yield self.instantiate()
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1133, in run
    value = future.result()
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\nanny.py", line 276, in instantiate
    timedelta(seconds=self.death_timeout), self.process.start()
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1133, in run
    value = future.result()
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  Fil...

Worker failed to start
Stack trace:
 >  File "C:\Users\C\Anaconda3\envs\envTensorflow\Lib\site-packages\distributed\deploy\local.py", line 316, in _start_worker
 >    raise gen.TimeoutError("Worker failed to start")

1 Answer:

Answer 0 (score: 0)

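Judging by the errors, Visual Studio does not cope well with interactively run code that uses multiprocessing the way Dask does.

The simplest solution is to start the client without processes: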

client = Client(processes=False)

This does have some performance impact, though, especially when processing non-numeric data.
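As a rough sketch of how the threads-only client might be used end to end, the snippet below starts the client in-process, runs a trivial task on it, and shuts it down. The helper name `run_threaded_sum` and the sample values are made up for illustration, and the snippet assumes a reasonably recent `dask.distributed`. It has not been tried under VS2017.

```python
from dask.distributed import Client


def run_threaded_sum(values):
    """Sum `values` on a threads-only Dask client (hypothetical helper)."""
    # processes=False keeps the scheduler and workers inside this Python
    # process, so no worker subprocesses are spawned.
    client = Client(processes=False)
    try:
        # Submit a single task to the in-process workers and wait for it.
        future = client.submit(sum, values)
        return future.result()
    finally:
        # Always release the client and the local cluster it created.
        client.close()


if __name__ == "__main__":
    print(run_threaded_sum([1, 2, 3]))
```

Because everything runs in one process, pure-Python work such as string handling is likely to be limited by the interpreter lock. That is probably the performance impact mentioned above.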