How to retrieve worker logs for a Dask-YARN job?

Date: 2019-07-11 20:49:57

Tags: dask dask-distributed

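I have a simple Dask-YARN script that performs only one task: loading a file from HDFS, as shown below. I ran into an error in the code, so I added a print statement inside the function. However, that statement never shows up in the worker logs I get with yarn logs -applicationId {application_id}. I also tried Client.get_worker_logs(), but it doesn't show stdout either, only INFO messages about the workers. How can I get the worker logs after the code has finished executing?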

import sys
import numpy as np
import scipy.signal
import json

import dask
from dask.distributed import Client
from dask_yarn import YarnCluster


@dask.delayed
def load(input_file):
    print("In call of Load...")
    with open(input_file, "r") as fo:
        data = json.load(fo)
    return data


# Process input args
(_, filename) = sys.argv


dag_1 = {
    'load-1': (load, filename)
}

print("Building tasks...")
tasks = dask.get(dag_1, 'load-1')

print("Creating YARN cluster now...")
cluster = YarnCluster()
print("Scaling YARN cluster now...")
cluster.scale(1)
print("Creating Client now...")
client = Client(cluster)

print("Getting logs..1")
print(client.get_worker_logs())

print("Doing Dask computations now...")
dask.compute(tasks)

print("Getting logs..2")
print(client.get_worker_logs())

print("Shutting down cluster now...")
cluster.shutdown()

1 answer:

Answer 0 (score: 0)

I'm not sure what's going on here. Print statements should (and usually do) end up in the log files that YARN stores.
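As a rough illustration of why the two channels differ: print() writes to the worker process's stdout, which YARN normally collects into the container log files, whereas Client.get_worker_logs() returns records held in an in-memory buffer attached to the worker's logger. The sketch below models that split with only the standard library. The class DequeBufferHandler and the logger name "demo.worker" are hypothetical stand-ins, not distributed's actual implementation, and the StringIO merely plays the role of the container's stdout file.

```python
import collections
import contextlib
import io
import logging


class DequeBufferHandler(logging.Handler):
    """Hypothetical stand-in for a worker's in-memory log buffer."""

    def __init__(self, maxlen=1000):
        super().__init__()
        # Only the most recent `maxlen` records are kept.
        self.records = collections.deque(maxlen=maxlen)

    def emit(self, record):
        self.records.append(record)

    def get_logs(self):
        # (level, formatted message) pairs, oldest first.
        return [(r.levelname, self.format(r)) for r in self.records]


worker_logger = logging.getLogger("demo.worker")  # hypothetical logger name
worker_logger.setLevel(logging.INFO)
worker_logger.propagate = False  # keep records away from the root logger in this demo
log_buffer = DequeBufferHandler()
worker_logger.addHandler(log_buffer)

captured_stdout = io.StringIO()  # stand-in for the container's stdout log file
with contextlib.redirect_stdout(captured_stdout):
    print("In call of Load...")  # goes to stdout only
    worker_logger.info("This will show up in the worker logs")  # goes to the buffer

print(log_buffer.get_logs())
# → [('INFO', 'This will show up in the worker logs')]
print(repr(captured_stdout.getvalue()))
# → 'In call of Load...\n'
```

Under this model, a print() call can only ever be found in whatever captured stdout (the YARN container logs), never in the logger buffer.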

If you want your debug statements to appear in the worker logs returned by get_worker_logs, you can use the worker logger directly:

from distributed.worker import logger
logger.info("This will show up in the worker logs")
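If the task code already contains many print() calls, one option, offered here as a sketch rather than anything Dask or dask-yarn provides, is to route stdout into a logger while the task body runs. The names print_to_logger and _LoggerWriter are hypothetical. On a real worker you would pass distributed.worker.logger; the example uses a plain standard-library logger so it runs anywhere. Note that contextlib.redirect_stdout replaces sys.stdout for the whole process, so on a multi-threaded worker, prints from other tasks running at the same time may be redirected too.

```python
import contextlib
import logging


class _LoggerWriter:
    """Hypothetical file-like object that forwards each written line to a logger."""

    def __init__(self, logger, level=logging.INFO):
        self.logger = logger
        self.level = level
        self._buf = ""

    def write(self, text):
        self._buf += text
        # Emit one log record per complete line.
        while "\n" in self._buf:
            line, self._buf = self._buf.split("\n", 1)
            if line:
                self.logger.log(self.level, line)
        return len(text)

    def flush(self):
        # Emit any trailing partial line.
        if self._buf:
            self.logger.log(self.level, self._buf)
            self._buf = ""


@contextlib.contextmanager
def print_to_logger(logger, level=logging.INFO):
    writer = _LoggerWriter(logger, level)
    with contextlib.redirect_stdout(writer):
        try:
            yield
        finally:
            writer.flush()


# Demo with a plain logger; inside a Dask task you might instead write:
#     from distributed.worker import logger
#     with print_to_logger(logger):
#         print("In call of Load...")
records = []


class _ListHandler(logging.Handler):
    def emit(self, record):
        records.append(record.getMessage())


demo_logger = logging.getLogger("demo.task")  # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
demo_logger.addHandler(_ListHandler())

with print_to_logger(demo_logger):
    print("In call of Load...")
    print("partial", end="")

print(records)  # → ['In call of Load...', 'partial']
```

Buffering until a newline keeps each print() call as a single log record instead of splitting the message and its line ending into separate entries.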