在YARN模式下在cloudera 5上运行spark app

时间:2014-08-21 18:21:24

标签: cloudera apache-spark yarn

我一直试图激发提交给我的cloudera集群几周。我真的希望有人知道这是如何运作的。

我创建了一个脚本,它使用所有必需的参数调用spark-submit。屏幕转储出以下行

Using properties file: null
Using properties file: null
Parsed arguments:
  master                  yarn
  deployMode              cluster
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            null
  driverCores             null
  driverExtraClassPath    /home/bruce/workspace1/spark-cloudera/yarn/stable/target/spark-yarn_2.10-1.0.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-yarn-client/2.3.0-cdh5.1.0/hadoop-yarn-client-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-common/2.3.0-cdh5.1.0/hadoop-common-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-yarn-api/2.3.0-cdh5.1.0/hadoop-yarn-api-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-yarn-common/2.3.0-cdh5.1.0/hadoop-yarn-common-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-auth/2.3.0-cdh5.1.0/hadoop-auth-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.examples.SparkPi
  primaryResource         file:/home/bruce/workspace1/spark-cloudera/examples/target/scala-2.10/spark-examples-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar
  name                    org.apache.spark.examples.SparkPi
  childArgs               [10]
  jars                    null
  verbose                 true


log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

电话会在很长一段时间内停止,然后退出连接被拒绝。

我不明白的是参数指定使用YarnClient,但没有在哪里表明它知道如何联系纱线资源管理器,而不是ip,而不是端口。提交是在我的笔记本电脑上进行的,群集在邻近的子网上。 spark-submit如何确定如何联系纱线服务?

1 个答案:

答案 0 :(得分:2)

来自Spark Documentation

  

确保HADOOP_CONF_DIR或YARN_CONF_DIR指向该目录   其中包含Hadoop的(客户端)配置文件   簇。这些配置用于写入dfs并连接到   YARN ResourceManager。