I have a scenario where, if a certain condition is not met, the class should not create a Spark session and the application should just print a message.
I am running in yarn-cluster mode:
spark2-submit --class com.test.TestSpark --master yarn --deploy-mode client /home/test.jar false
The job's final status is "failed".
However, in yarn-client mode the same Spark job completes successfully.
Here is the code:
package com.test;

import org.apache.spark.sql.SparkSession;

public class TestSpark {
    public static void main(String[] args) {
        boolean condition = false;
        condition = Boolean.parseBoolean(args[0]);
        if (condition) {
            SparkSession sparkSession = SparkSession.builder().appName("Data Ingestion Framework")
                    .config("hive.metastore.warehouse.dir", "/user/hive/warehouse")
                    .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
                    .enableHiveSupport()
                    .getOrCreate();
        } else {
            System.out.println("coming out no processing required");
        }
    }
}
在" yarn-cluster
"的日志中我可以看到两个conatiner正在创建,其中一个失败,出现以下错误:
18/05/09 18:21:51 WARN security.UserGroupInformation: PriviledgedActionException as:*****<uername> (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://hostname/user/*****<uername>/.sparkStaging/application_1525778267559_0054/__spark_conf__.zip
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://hostname/user/*****<uername>/.sparkStaging/application_1525778267559_0054/__spark_conf__.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1257)
at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1249)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1249)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$4$$anonfun$apply$3.apply(ApplicationMaster.scala:198)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$4$$anonfun$apply$3.apply(ApplicationMaster.scala:195)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$4.apply(ApplicationMaster.scala:195)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$4.apply(ApplicationMaster.scala:160)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:787)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
Can you explain why this happens and how Spark handles container creation?
Answer 0 (score: 0)
Amit, this is a known issue that is still open: https://issues.apache.org/jira/browse/SPARK-10795. (In yarn-cluster mode the driver runs inside the YARN ApplicationMaster, which waits for a SparkContext to be initialized; when none ever is, that attempt fails and a retried attempt can no longer find the already-removed .sparkStaging files, which would explain the FileNotFoundException above.)
The workaround is to initialize a SparkContext.
package com.test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class TestSpark {
    public static void main(String[] args) {
        boolean condition = false;
        condition = Boolean.parseBoolean(args[0]);
        if (condition) {
            SparkSession sparkSession = SparkSession.builder().appName("Data Ingestion Framework")
                    .config("hive.metastore.warehouse.dir", "/user/hive/warehouse")
                    .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
                    .enableHiveSupport()
                    .getOrCreate();
        } else {
            // Initialize a Spark context to avoid the failure: https://issues.apache.org/jira/browse/SPARK-10795
            JavaSparkContext sparkContext = new JavaSparkContext(new SparkConf());
            System.out.println("coming out no processing required");
        }
    }
}
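If you also want the no-op branch to shut down cleanly, a minimal variation is to stop the context explicitly before returning. This is just a sketch using the standard JavaSparkContext.stop() call; the class name TestSparkNoOp is illustrative and not part of the original answer:
package com.test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class TestSparkNoOp {
    public static void main(String[] args) {
        boolean condition = Boolean.parseBoolean(args[0]);
        if (!condition) {
            // Create the context only so the YARN ApplicationMaster sees an initialized SparkContext,
            // then stop it so the application can shut down and report its final status cleanly.
            JavaSparkContext sparkContext = new JavaSparkContext(new SparkConf());
            System.out.println("coming out no processing required");
            sparkContext.stop();
        }
    }
}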