What is the default directory where a dask worker stores results or files?

Time: 2018-02-07 06:32:04

Tags: dask dask-distributed dask-delayed

[mapr@impetus-i0057 latest_code_deepak]$ dask-worker 172.26.32.37:8786
distributed.nanny - INFO -         Start Nanny at: 'tcp://172.26.32.36:50930'
distributed.diskutils - WARNING - Found stale lock file and directory '/home/mapr/latest_code_deepak/dask-worker-space/worker-PwEseH', purging
distributed.worker - INFO -       Start worker at:   tcp://172.26.32.36:41694
distributed.worker - INFO -          Listening to:   tcp://172.26.32.36:41694
distributed.worker - INFO -              bokeh at:          172.26.32.36:8789
distributed.worker - INFO -              nanny at:         172.26.32.36:50930
distributed.worker - INFO - Waiting to connect to:    tcp://172.26.32.37:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          8
distributed.worker - INFO -                Memory:                   33.52 GB
distributed.worker - INFO -       Local Directory: /home/mapr/latest_code_deepak/dask-worker-space/worker-AkBPtM
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -         Registered to:    tcp://172.26.32.37:8786
distributed.worker - INFO - -------------------------------------------------

What is the default directory in which dask-worker keeps temporary files, such as task results or files sent from the client with the upload_file() method?

For example:

def my_task_running_on_dask_worker():
    # fetch the file from HDFS
    # process the file
    # store the file back into HDFS
    pass  # placeholder so the stub is valid Python

1 Answer:

Answer 0 (score: 3)

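By default, a dask worker creates a directory at ./dask-worker-space/worker-#######, where ####### is a random string unique to that particular worker. The path is relative to the directory in which the worker was started.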

You can change this location by passing the --local-directory keyword to the dask-worker executable.

The warning you see in this line

distributed.diskutils - WARNING - Found stale lock file and directory '/home/mapr/latest_code_deepak/dask-worker-space/worker-PwEseH', purging

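says that the Dask worker noticed that another worker's directory had not been cleaned up, presumably because that worker failed in some hard way. The new worker is cleaning up the space left behind by the previous one.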

Edit

You can see which worker created which directory by looking at each worker's logs (each worker prints its local directory)

$ dask-worker localhost:8786
distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:36607
...
distributed.worker - INFO -       Local Directory: /home/mrocklin/dask-worker-space/worker-ks3mljzt

or programmatically by calling client.scheduler_info()
>>> client.scheduler_info()
{'address': 'tcp://127.0.0.1:34027',
 'id': 'Scheduler-bd88dfdf-e3f7-4b39-8814-beae779248f1',
 'services': {'bokeh': 8787},
 'type': 'Scheduler',
 'workers': {'tcp://127.0.0.1:33143': {'cpu': 7.7,
    ... 
   'local_directory': '/home/mrocklin/dask-worker-space/worker-8kvk_l81',
  },
...
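
To pull just the directories out of that dictionary, something along these lines may help. It is a minimal sketch: the helper name worker_local_directories is hypothetical, and the sample dictionary only mirrors the shape of the output shown above (the 'local_directory' key may be absent in other versions, so the code uses .get()).

```python
def worker_local_directories(info):
    """Map each worker address to its local directory.

    `info` is expected to look like the dict returned by
    client.scheduler_info(); missing keys yield None rather than an error.
    """
    return {
        address: details.get("local_directory")
        for address, details in info.get("workers", {}).items()
    }


# Sample data shaped like the scheduler_info() output above (not live output).
sample_info = {
    "type": "Scheduler",
    "workers": {
        "tcp://127.0.0.1:33143": {
            "local_directory": "/home/mrocklin/dask-worker-space/worker-8kvk_l81",
        },
    },
}

for address, directory in worker_local_directories(sample_info).items():
    print(address, directory)
```

With a live cluster you would pass client.scheduler_info() instead of sample_info.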