I'm having a problem running the Spark Pi example on HDP 2.0.
I downloaded the Spark 1.0 prebuilt package for HDP2 from http://spark.apache.org/downloads.html and ran the example from the Spark web site:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 2g --executor-memory 2g --executor-cores 1 ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 2
I get this error:
Application application_1404470405736_0044 failed 3 times due to AM Container for appattempt_1404470405736_0044_000003 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application.
Unknown/unsupported param List(--executor-memory, 2048, --executor-cores, 1, --num-executors, 3)
Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
Options:
  --jar JAR_PATH       Path to your application's JAR file (required)
  --class CLASS_NAME   Name of your application's main class (required)
  ... bla-bla-bla
Any ideas? How can I make it work?
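For completeness, the full stderr of the failed AM container can usually be retrieved through YARN's log aggregation (assuming it is enabled on the cluster), using the application id from the error above:

$ yarn logs -applicationId application_1404470405736_0044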
Answer 0 (score: 3)
I ran into the same problem. The cause was that the spark-assembly.jar in HDFS was a different version from the Spark version currently installed on the cluster.
For example, the parameter list of org.apache.spark.deploy.yarn.Client from the jar sitting in HDFS:
$ hadoop jar ./spark-assembly.jar org.apache.spark.deploy.yarn.Client --help
Usage: org.apache.spark.deploy.yarn.Client [options]
Options:
--jar JAR_PATH Path to your application's JAR file (required in yarn-cluster mode)
--class CLASS_NAME Name of your application's main class (required)
--args ARGS Arguments to be passed to your application's main class.
Multiple invocations are possible, each will be passed in order.
--num-workers NUM Number of workers to start (Default: 2)
--worker-cores NUM Number of cores for the workers (Default: 1). This is unused right now.
--master-class CLASS_NAME Class Name for Master (Default: spark.deploy.yarn.ApplicationMaster)
--master-memory MEM Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
--worker-memory MEM Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
--name NAME The name of your application (Default: Spark)
--queue QUEUE The hadoop queue to use for allocation requests (Default: 'default')
--addJars jars Comma separated list of local jars that want SparkContext.addJar to work with.
--files files Comma separated list of files to be distributed with the job.
--archives archives Comma separated list of archives to be distributed with the job.
The same help from the newly installed spark-assembly jar file:
$ hadoop jar ./spark-assembly-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar org.apache.spark.deploy.yarn.Client
Usage: org.apache.spark.deploy.yarn.Client [options]
Options:
--jar JAR_PATH Path to your application's JAR file (required in yarn-cluster mode)
--class CLASS_NAME Name of your application's main class (required)
--arg ARGS Argument to be passed to your application's main class.
Multiple invocations are possible, each will be passed in order.
--num-executors NUM Number of executors to start (Default: 2)
--executor-cores NUM Number of cores for the executors (Default: 1).
--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G)
--name NAME The name of your application (Default: Spark)
--queue QUEUE The hadoop queue to use for allocation requests (Default: 'default')
--addJars jars Comma separated list of local jars that want SparkContext.addJar to work with.
--files files Comma separated list of files to be distributed with the job.
--archives archives Comma separated list of archives to be distributed with the job.
So I uploaded the matching spark-assembly.jar to HDFS, and Spark started working fine.
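For reference, a minimal sketch of the kind of swap that fixes this, assuming the assembly jar is kept at a hypothetical HDFS location such as /user/spark/share/lib/ and that the replacement jar name mirrors the examples jar from the question (both the path and the jar name are assumptions; adjust them for your cluster and distribution):

$ # remove the stale assembly jar from HDFS (path is an assumption, not taken from the question)
$ hdfs dfs -rm /user/spark/share/lib/spark-assembly.jar
$ # upload the assembly jar that matches the installed Spark version
$ hdfs dfs -put ./lib/spark-assembly-1.0.0-hadoop2.2.0.jar /user/spark/share/lib/spark-assembly.jar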