Spark action workflow job fails - unable to connect to ResourceManager

Posted: 2015-09-07 09:36:11

Tags: apache-spark oozie hadoop2 hortonworks-data-platform

The Spark workflow job stays in the RUNNING state while it tries to connect to the ResourceManager at the default address 0.0.0.0:8032.

In job.properties, the nameNode and jobTracker address:port are set correctly; the jobTracker matches yarn.resourcemanager.address (host and port) from yarn-site.xml.
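For reference, the relevant property in /etc/hadoop/conf/yarn-site.xml on this cluster looks roughly like this (the value matches the address the launcher reaches once in the log below):

<property>
  <name>yarn.resourcemanager.address</name>
  <value>ip-172-31-15-94.us-west-2.compute.internal:8032</value>
</property>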

As the log below shows, the MR launcher task connects to the ResourceManager at the correct address only once, and then keeps retrying against the default address:

2015-09-07 04:33:30,865 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
2015-09-07 04:33:31,213 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at ip-172-31-15-94.us-west-2.compute.internal/172.31.15.94:8032
2015-09-07 04:33:32,109 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2015-09-07 04:33:33,120 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-09-07 04:33:34,121 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Before running the Oozie job I also export YARN_CONF_DIR and HADOOP_CONF_DIR pointing to /etc/hadoop/conf (where yarn-site.xml lives). I am trying the example Spark workflow job shipped with the Oozie archive (http://archive.apache.org/dist/oozie/4.2.0/). The MapReduce, Shell, and Java workflow examples all work fine.
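For completeness, this is roughly how I set the environment and submit the workflow; the Oozie server URL (default port 11000) and the local path to job.properties are only illustrative:

export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
oozie job -oozie http://localhost:11000/oozie -config examples/apps/spark/job.properties -run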

job.properties:

nameNode=hdfs://ip-172-31-15-93.us-west-2.compute.internal:8020
jobTracker=ip-172-31-15-94.us-west-2.compute.internal:8032
master=yarn-client
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/spark

workflow.xml:

<workflow-app xmlns="uri:oozie:workflow:0.5" name="SparkFileCopy">
    <start to="spark-node"/>

    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/spark"/>
            </prepare>
            <master>${master}</master>
            <name>Spark-FileCopy</name>
            <class>org.apache.oozie.example.SparkFileCopy</class>
            <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark/lib/oozie-examples-4.2.0.jar</jar>
            <arg>${nameNode}/user/${wf:user()}/${examplesRoot}/data/data.txt</arg>
            <arg>${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/spark</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

I am running the Oozie job as the oozie user.

Environment: Hortonworks Data Platform 2.3, Oozie v4.2, Spark 1.3.1, set up on a small EC2 cluster (2 data nodes, 1 name node).

Am I missing any other configuration or key step? Any pointers on this would be a great help.

0 Answers:

No answers yet.