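I have been using Microsoft Visual Studio as my Python IDE and recently started using Dask to process large CSV files. When I use Dask Distributed and try to start the dashboard, I get a large number of errors.
I have tried the same simple code in MS VS2017 and in a Jupyter notebook on several machines. In Jupyter I get no errors and the dashboard loads correctly. Under Visual Studio, however, the code crashes and the dashboard never loads.
Both IDEs run against the same environment. I am using the latest version of Dask and Python 3.6.
A simple code example: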
from dask import dataframe as ddf
from dask import multiprocessing
from dask.distributed import Client
client = Client()
When run under Jupyter, the code above starts the Dask dashboard on localhost. VS2017, however, produces a large number of errors. Here are some of them:
distributed.nanny - WARNING - Worker process 13692 exited with status 1
The thread 0x8 has exited with code 0 (0x0).
The thread 0x4 has exited with code 0 (0x0).
The thread 0x9 has exited with code 0 (0x0).
The thread 0xb has exited with code 0 (0x0).
The thread 0xa has exited with code 0 (0x0).
distributed.nanny - WARNING - Worker process 15368 exited with status 1
The thread 0x5 has exited with code 0 (0x0).
distributed.nanny - WARNING - Worker process 16616 exited with status 1
The thread 0x6 has exited with code 0 (0x0).
distributed.nanny - WARNING - Worker process 22288 exited with status 1
The thread 0x7 has exited with code 0 (0x0).
distributed.nanny - WARNING - Restarting worker
Traceback (most recent call last):
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\multiprocessing\queues.py", line 236, in _feed
send_bytes(obj)
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\multiprocessing\connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\multiprocessing\connection.py", line 280, in _send_bytes
ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
BrokenPipeError: [WinError 232] The pipe is being closed
The thread 0x10 has exited with code 0 (0x0).
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 883, in callback
result_list.append(f.result())
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1147, in run
yielded = self.gen.send(value)
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\deploy\local.py", line 316, in _start_worker
raise gen.TimeoutError("Worker failed to start")
tornado.util.TimeoutError: Worker failed to start
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 883, in callback
result_list.append(f.result())
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1147, in run
yielded = self.gen.send(value)
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\deploy\local.py", line 316, in _start_worker
raise gen.TimeoutError("Worker failed to start")
tornado.util.TimeoutError: Worker failed to start
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 883, in callback
result_list.append(f.result())
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1147, in run
yielded = self.gen.send(value)
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\deploy\local.py", line 316, in _start_worker
raise gen.TimeoutError("Worker failed to start")
tornado.util.TimeoutError: Worker failed to start
distributed.nanny - ERROR - Failed to restart worker after its process exited
Traceback (most recent call last):
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\nanny.py", line 343, in _on_exit
yield self.instantiate()
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1133, in run
value = future.result()
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\nanny.py", line 276, in instantiate
timedelta(seconds=self.death_timeout), self.process.start()
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1133, in run
value = future.result()
File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
Fil...
Worker failed to start
Stack trace:
> File "C:\Users\C\Anaconda3\envs\envTensorflow\Lib\site-packages\distributed\deploy\local.py", line 316, in _start_worker
> raise gen.TimeoutError("Worker failed to start")
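Answer 0 (score: 0)
Judging from the errors, Visual Studio does not seem to cope with running interactive code that uses multiprocessing the way Dask does.
The simplest solution is to start the client without processes: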
client = Client(processes=False)
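Note that this comes with some performance cost, especially when processing non-numeric data: all workers then run as threads inside a single process, so pure-Python work contends for the GIL.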
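For illustration, here is a minimal sketch of the thread-only approach as a complete script. The function name run_threaded_sum and the toy sum task are hypothetical, chosen only to show that the client starts and runs work without spawning worker processes. The `if __name__ == "__main__":` guard is not strictly needed in thread-only mode, but it is the usual convention on Windows and becomes necessary if you later switch back to process-based workers.

```python
from dask.distributed import Client


def run_threaded_sum(values):
    """Start a thread-only local cluster, sum `values` on it, then shut it down.

    Hypothetical helper for illustration; not part of Dask itself.
    """
    # processes=False keeps the scheduler and workers inside this process,
    # so no worker subprocesses have to be spawned.
    client = Client(processes=False)
    try:
        # submit() schedules the call on a worker thread;
        # result() blocks until the task has finished.
        future = client.submit(sum, values)
        return future.result()
    finally:
        # Always release the local cluster, even if the task raised.
        client.close()


if __name__ == "__main__":
    print(run_threaded_sum([1, 2, 3]))
```

If this script runs cleanly under Visual Studio while the plain Client() call does not, that would support the idea that the failure lies in how the worker processes are started rather than in Dask's scheduling itself.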