Cannot run the Spark 1.0 SparkPi example on HDP 2.0

Date: 2014-07-04 13:08:08

Tags: hadoop apache-spark hortonworks-data-platform

I'm having trouble running the Spark Pi example on HDP 2.0.

I downloaded the Spark 1.0 prebuilt package for HDP2 from http://spark.apache.org/downloads.html and ran the example from the Spark website:

  ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
      --master yarn-cluster \
      --num-executors 3 \
      --driver-memory 2g \
      --executor-memory 2g \
      --executor-cores 1 \
      ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 2

I got this error:

  Application application_1404470405736_0044 failed 3 times due to AM Container for
  appattempt_1404470405736_0044_000003 exited with exitCode: 1 due to:
  Exception from container-launch:
  org.apache.hadoop.util.Shell$ExitCodeException:
      at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
      at org.apache.hadoop.util.Shell.run(Shell.java:379)
      at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
      at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:744)
  .Failing this attempt.. Failing the application.

  Unknown/unsupported param List(--executor-memory, 2048, --executor-cores, 1, --num-executors, 3)
  Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
  Options:
    --jar JAR_PATH       Path to your application's JAR file (required)
    --class CLASS_NAME   Name of your application's main class (required)
    ... bla-bla-bla

Any ideas? How can I make it work?

1 Answer:

Answer 0 (score: 3)

I ran into the same problem. The cause is that the version of the spark-assembly.jar staged in HDFS differs from the Spark version you are running.
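To confirm the mismatch, first locate the two assemblies. A minimal sketch, assuming the assembly is staged under /user/spark/share/lib in HDFS (this path is an assumption; your HDP configuration may stage it elsewhere):

  # Assumed HDFS location of the staged assembly -- adjust to your cluster.
  hdfs dfs -ls /user/spark/share/lib/spark-assembly.jar

  # The assembly shipped with the Spark 1.0 download sits in its lib/ directory.
  ls ./lib/spark-assembly-*.jar

Printing the usage of org.apache.spark.deploy.yarn.Client from each jar via hadoop jar, as shown below, makes the version difference obvious: the old assembly only knows --num-workers/--worker-memory, while Spark 1.0 expects --num-executors/--executor-memory.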

For example, the params list of org.apache.spark.deploy.yarn.Client printed from the jar in HDFS:

  $ hadoop jar ./spark-assembly.jar  org.apache.spark.deploy.yarn.Client --help
Usage: org.apache.spark.deploy.yarn.Client [options] 
Options:
  --jar JAR_PATH             Path to your application's JAR file (required in yarn-cluster mode)
  --class CLASS_NAME         Name of your application's main class (required)
  --args ARGS                Arguments to be passed to your application's main class.
                             Multiple invocations are possible, each will be passed in order.
  --num-workers NUM          Number of workers to start (Default: 2)
  --worker-cores NUM         Number of cores for the workers (Default: 1). This is unused right now.
  --master-class CLASS_NAME  Class Name for Master (Default: spark.deploy.yarn.ApplicationMaster)
  --master-memory MEM        Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
  --worker-memory MEM        Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
  --name NAME                The name of your application (Default: Spark)
  --queue QUEUE              The hadoop queue to use for allocation requests (Default: 'default')
  --addJars jars             Comma separated list of local jars that want SparkContext.addJar to work with.
  --files files              Comma separated list of files to be distributed with the job.
  --archives archives        Comma separated list of archives to be distributed with the job.

The same help printed from the newly installed spark-assembly jar:

$ hadoop jar ./spark-assembly-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar org.apache.spark.deploy.yarn.Client
Usage: org.apache.spark.deploy.yarn.Client [options] 
Options:
  --jar JAR_PATH             Path to your application's JAR file (required in yarn-cluster mode)
  --class CLASS_NAME         Name of your application's main class (required)
  --arg ARGS                 Argument to be passed to your application's main class.
                             Multiple invocations are possible, each will be passed in order.
  --num-executors NUM        Number of executors to start (Default: 2)
  --executor-cores NUM       Number of cores for the executors (Default: 1).
  --driver-memory MEM        Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
  --executor-memory MEM      Memory per executor (e.g. 1000M, 2G) (Default: 1G)
  --name NAME                The name of your application (Default: Spark)
  --queue QUEUE              The hadoop queue to use for allocation requests (Default: 'default')
  --addJars jars             Comma separated list of local jars that want SparkContext.addJar to work with.
  --files files              Comma separated list of files to be distributed with the job.
  --archives archives        Comma separated list of archives to be distributed with the job.

So I replaced the spark-assembly.jar in HDFS with the new one, and Spark started working fine.
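A minimal sketch of that fix, assuming the assembly is staged at /user/spark/share/lib/spark-assembly.jar in HDFS (the path and jar name are assumptions; use whatever location your HDP configuration actually references):

  # Remove the stale assembly that no longer matches the Spark 1.0 client.
  hdfs dfs -rm /user/spark/share/lib/spark-assembly.jar

  # Upload the assembly that ships with the downloaded Spark 1.0 build.
  hdfs dfs -put ./lib/spark-assembly-1.0.0-hadoop2.2.0.jar \
      /user/spark/share/lib/spark-assembly.jar

With a matching assembly in place, the spark-submit command from the question is handled by an ApplicationMaster that understands --num-executors, --executor-memory and --executor-cores.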