Submitting a Spark application via the YARN Client

Date: 2017-06-08 19:36:32

Tags: hadoop apache-spark yarn spark-submit

I am submitting a Spark application (the SparkPi example) using org.apache.spark.deploy.yarn.Client (Spark 2.1.0). Here is the relevant code:

    import com.google.common.collect.Lists;
    import org.apache.spark.SparkConf;
    import org.apache.spark.deploy.yarn.Client;
    import org.apache.spark.deploy.yarn.ClientArguments;

    import java.util.Date;
    import java.util.List;

    // spark-submit-style arguments for the SparkPi example.
    List<String> arguments = Lists.newArrayList(
            "--class", "org.apache.spark.examples.SparkPi",
            "--jar", "path/to/spark examples jar",
            "--arg", "10");

    SparkConf sparkConf = new SparkConf();
    String applicationTag = "TestApp-" + new Date().getTime();
    sparkConf.set("spark.yarn.submit.waitAppCompletion", "false");
    sparkConf.set("spark.yarn.tags", applicationTag);
    sparkConf.set("spark.submit.deployMode", "cluster");
    sparkConf.set("spark.yarn.jars", "/opt/spark/jars/*.jar");

    System.setProperty("SPARK_YARN_MODE", "true");
    System.setProperty("SPARK_HOME", "/opt/spark");

    // Build the client arguments and submit the application to YARN.
    ClientArguments cArgs = new ClientArguments(arguments.toArray(new String[arguments.size()]));
    Client client = new Client(cArgs, sparkConf);
    client.run();
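
One note on the last call: with spark.yarn.submit.waitAppCompletion set to false, run() returns right after submission. If the YARN application id is needed for tracking, the Client class also exposes submitApplication(), which (as far as I can tell from the Spark 2.1.0 sources) returns it directly. A minimal variant of the submission above:

    // Variant: submitApplication() hands back the YARN ApplicationId
    // instead of blocking inside run().
    org.apache.hadoop.yarn.api.records.ApplicationId appId =
            new Client(cArgs, sparkConf).submitApplication();
    System.out.println("Submitted as " + appId);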

This appears to work: the Spark application shows up in the YARN RM UI and finishes successfully. However, the container logs show the staging directory URL being picked up as SPARK_YARN_STAGING_DIR -> file:/home/{current user}/.sparkStaging/application_xxxxxx. Stepping through org.apache.spark.deploy.yarn.Client points at the likely cause: the base path of the staging directory is not being picked up correctly. The base path should be hdfs://localhost:9000/user/{current user}/ rather than file:/home/{current user}/, because when the staging directory is cleaned up, the following error appears in the logs:

    java.lang.IllegalArgumentException: Wrong FS: file:/home/user/.sparkStaging/application_1496908076154_0022, expected: hdfs://127.0.0.1:9000
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
        at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
        at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:707)
        at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:703)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714)
        at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$cleanupStagingDir(ApplicationMaster.scala:545)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:233)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
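
The file:/home/{current user} base is consistent with Spark deriving the staging path from the home directory of whatever Hadoop reports as the default filesystem: with no core-site.xml on the classpath, fs.defaultFS falls back to file:///. A minimal sketch to check what the submitting JVM actually resolves (assuming only the Hadoop client libraries on the classpath):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // new Configuration() picks up core-site.xml from the classpath; if it
    // is missing, fs.defaultFS falls back to file:/// and the home
    // directory resolves to file:/home/{current user}.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
    System.out.println("home dir     = " + fs.getHomeDirectory());
    System.out.println("staging base = " + new Path(fs.getHomeDirectory(), ".sparkStaging"));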

All of this works fine when using spark-submit, I assume because it sets up all the required environment variables correctly.
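
When submitting from a plain JVM instead, the equivalent is to have HADOOP_CONF_DIR on the classpath, or to load the cluster config into a Hadoop Configuration by hand. A rough sketch (the /etc/hadoop/conf location and the three-argument Client constructor are assumptions; that constructor is private[spark] in Scala but, if I read the sources correctly, compiles to a constructor callable from Java):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    // Load the cluster config explicitly instead of relying on the
    // classpath. /etc/hadoop/conf stands in for the real HADOOP_CONF_DIR.
    Configuration hadoopConf = new Configuration();
    hadoopConf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
    hadoopConf.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));

    // Assumption: the (ClientArguments, Configuration, SparkConf)
    // constructor is reachable from Java in Spark 2.1.0.
    Client client = new Client(cArgs, hadoopConf, sparkConf);
    client.run();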

I have also tried setting sparkConf.set("spark.yarn.stagingDir", "hdfs://localhost:9000/user/{current user}");, but that did not help either, as it leads to other errors, such as hdfs not being recognized as a valid file system.
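
For what it's worth, the "hdfs is not a valid file system" symptom usually means the HDFS FileSystem implementation is not registered on the classpath (a common problem with shaded jars that overwrite the META-INF/services entries). A hedged workaround sketch is to pin the scheme mapping and the default filesystem through Spark's spark.hadoop.* passthrough, which Spark copies into the Hadoop Configuration it builds; the localhost:9000 address is taken from the setup above:

    // spark.hadoop.* properties are copied into Spark's Hadoop Configuration.
    // Assumes hadoop-hdfs is actually present on the application classpath.
    sparkConf.set("spark.hadoop.fs.defaultFS", "hdfs://localhost:9000");
    sparkConf.set("spark.hadoop.fs.hdfs.impl",
            "org.apache.hadoop.hdfs.DistributedFileSystem");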

0 Answers:

No answers yet.