Dask - New cluster creation fails, HDFS files owned by "dask" user

Time: 2021-01-20 01:09:07

Tags: python yarn dask skein dask-gateway

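I set up dask on an edge node of my MapR cluster by following the instructions here: https://gateway.dask.org/install-hadoop.html

As those instructions suggest, I tested the installation by running the following in an IPython notebook spawned by JupyterHub: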

from dask_gateway import Gateway
gateway = Gateway("http://sa1x-hadoopedg-np1.hchc.local:9010")
cluster = gateway.new_cluster()

However, when it tries to launch the new cluster through YARN, I get the following error in the YARN application's logs:

Diagnostics: User a059571(user id 1425180742) does not have access to maprfs:///user/a059571/.skein/application_1605411890003_0222/809B8EAF0CC3524F90366F449C11C97E/tmpv8cbv2ag

Even though dask should be running as the requesting user (a059571 in this case), it appears to be creating the directory as the user running dask-gateway-server (user mapr in this case):

hdfs dfs -ls -d maprfs:///user/a059571/.skein/application_1605411890003_0222
drwx------   - mapr mapr  7 2021-01-19 17:37 maprfs:///user/a059571/.skein/application_1605411890003_0222
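
The ownership mismatch in that listing can also be checked programmatically. Below is a minimal sketch that parses one line of `hdfs dfs -ls -d` output and compares the owner with the expected user. The names `LsEntry`, `parse_ls_line`, and `owned_by` are hypothetical helpers written for illustration, and the eight-column layout is assumed to match the listing above.

```python
from typing import NamedTuple


class LsEntry(NamedTuple):
    permissions: str
    owner: str
    group: str
    path: str


def parse_ls_line(line: str) -> LsEntry:
    """Parse one line of `hdfs dfs -ls -d` output.

    Assumes the eight-column layout seen in the listing above:
    permissions, replication, owner, group, size, date, time, path.
    """
    fields = line.strip().split(None, 7)
    if len(fields) != 8:
        raise ValueError(f"unexpected ls output: {line!r}")
    permissions, _replication, owner, group, _size, _date, _time, path = fields
    return LsEntry(permissions, owner, group, path)


def owned_by(line: str, expected_user: str) -> bool:
    """Return True if the listed entry is owned by expected_user."""
    return parse_ls_line(line).owner == expected_user


# The line copied from the listing above.
sample = (
    "drwx------   - mapr mapr  7 2021-01-19 17:37 "
    "maprfs:///user/a059571/.skein/application_1605411890003_0222"
)
entry = parse_ls_line(sample)
print(entry.owner, owned_by(sample, "a059571"))  # prints: mapr False
```

With permissions `drwx------` and owner mapr, the directory is readable only by mapr, which is consistent with the "does not have access" diagnostic reported for a059571.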

I feel like I'm missing something obvious.

Here is my configuration, for full disclosure:

/etc/dask-gateway/dask_gateway_config.py

c.DaskGateway.backend_class = (
    "dask_gateway_server.backends.yarn.YarnBackend"
)
c.DaskGateway.address = '12.190.113.133:9010'
c.Proxy.address = '12.190.113.133:9011'
c.Proxy.tcp_address = '12.190.113.133:9012'
c.YarnClusterConfig.scheduler_cmd = "/opt/anaconda3/bin/dask-scheduler"
c.YarnClusterConfig.worker_cmd = "/opt/anaconda3/bin/dask-worker"
c.YarnClusterConfig.queue = 'root.default'
c.DaskGateway.log_level = 'DEBUG'

A snippet from my core_site.xml:

  <property>
    <name>hadoop.proxyuser.mapr.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.mapr.groups</name>
    <value>*</value>
  </property>

And some interesting lines from the dask-gateway-server log:

[DaskGateway] - HTTP routes listening at http://12.190.113.133:9011
[DaskGateway] - Scheduler routes listening at gateway://12.190.113.133:9012
[Proxy] Unexpected failure fetching routing table, retrying in 0.5s: Get http://12.190.113.133:9010/api/v1/routes: dial tcp 12.190.113.133:9010: connect: connection refused
[DaskGateway] Removed 0 expired clusters from the database
[Proxy] Unexpected failure fetching routing table, retrying in 1.0s: Get http://12.190.113.133:9010/api/v1/routes: dial tcp 12.190.113.133:9010: connect: connection refused
[Proxy] Unexpected failure fetching routing table, retrying in 2.0s: Get http://12.190.113.133:9010/api/v1/routes: dial tcp 12.190.113.133:9010: connect: connection refused
[Proxy] Unexpected failure fetching routing table, retrying in 4.0s: Get http://12.190.113.133:9010/api/v1/routes: dial tcp 12.190.113.133:9010: connect: connection refused
INFO skein.Driver: Driver started, listening on 44262
[DaskGateway] Backend started, clusters will contact api server at http://12.190.113.133:9011/api
[DaskGateway] Dask-Gateway server started
[DaskGateway] - Private API server listening at http://12.190.113.133:9010
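
The [Proxy] connection-refused lines are logged before the private API server reports that it is listening on port 9010, so they may simply reflect startup ordering rather than the ownership problem. One way to confirm that the port does become reachable is a small probe with doubling delays, similar to the retry intervals in the log. This is a sketch only: `wait_for_port` is a hypothetical helper, not part of dask-gateway.

```python
import socket
import time


def wait_for_port(host: str, port: int, initial_delay: float = 0.5,
                  max_attempts: int = 5, timeout: float = 1.0) -> bool:
    """Return True once a TCP connection to (host, port) succeeds.

    The delay doubles after each failed attempt (0.5s, 1.0s, 2.0s, ...),
    mirroring the intervals shown in the [Proxy] log lines.
    """
    delay = initial_delay
    for attempt in range(max_attempts):
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Connection refused or timed out; back off unless this was the last try.
            if attempt < max_attempts - 1:
                time.sleep(delay)
                delay *= 2
    return False
```

Run on the edge node after startup, `wait_for_port("12.190.113.133", 9010)` returning True would suggest the earlier refusals were transient.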

Note: sa1x-hadoopedg-np1.hchc.local == 12.190.113.133, a RHEL 7.x server. The MapR cluster is 6.x.

0 answers