Dask Scheduler退出,并在“ ddf.persist()”上输出“ Killed”

时间:2019-11-17 21:37:40

标签: dask dask-distributed

我对DASK相当陌生,这可能真的很明显。 我正在尝试运行一个分布式dask设置,其中调度程序有1个节点,并且有足够的工作程序节点来适合内存中的数据-在这种情况下,我正在使用15个工作程序。我可以很好地启动集群,还可以加载一些数据并进行分析。

我已将数据复制到工作节点,但客户端计算机上没有可用的数据,因此,我像这样延迟了数据的加载:

import dask
import dask.dataframe as dd
from dask import delayed

def load_data(path):
    return dd.read_csv(path)

然后我可以做一些简单的分析:

taxi_df_2016 = delayed(load_data)('/tmp/2016/*.csv').compute()
taxi_df_2016['fare_amount'].mean().compute()

...将为我返回一个值

但是当我想将文件保留在内存中时,调度程序只是在控制台上打印Killed而死。...

taxi_df_2016_pers = taxi_df_2016.compute().persist()

将在调度程序节点的控制台上向我显示此内容:

distributed.scheduler - INFO - Register tcp://10.0.0.17:40385
distributed.scheduler - INFO - Register tcp://10.0.0.6:42847
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.17:40385
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.6:42847
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://10.0.0.9:44627
distributed.scheduler - INFO - Register tcp://10.0.0.7:44419
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.9:44627
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.7:44419
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://10.0.0.16:41907
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.16:41907
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://10.0.0.18:41879
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.18:41879
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://10.0.0.13:32993
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.13:32993
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://10.0.0.8:33265
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.8:33265
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://10.0.0.14:33851
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.14:33851
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://10.0.0.10:44653
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.10:44653
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://10.0.0.19:40201
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.19:40201
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://10.0.0.5:42207
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.5:42207
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://10.0.0.15:36087
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.15:36087
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://10.0.0.12:32827
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.12:32827
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://10.0.0.11:35405
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.0.11:35405
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Clear task state
Killed

当我查看仪表板时,我看到所有173个分区均已成功加载,并且内存已被使用,但是在此之后的某个时间,调度程序消失了。

enter image description here

关于如何调试此程序的任何想法?

1 个答案:

答案 0 :(得分:0)