Spark in cluster mode throws an error if the SparkContext is not started

Date: 2017-11-02 12:54:30

Tags: scala apache-spark yarn

I have a Spark job that initializes the Spark context only when it is really necessary:

val conf = new SparkConf()
val jobs: List[Job] = ??? // get some jobs
if (jobs.nonEmpty) {
  val sc = new SparkContext(conf)
  sc.parallelize(jobs).foreach(....)
} else {
  // do nothing
}

It works fine on YARN when the deploy mode is 'client':

spark-submit --master yarn --deploy-mode client

Then I switched the deploy mode to 'cluster', and in the case where jobs.isEmpty it started to crash:

spark-submit --master yarn --deploy-mode cluster

Here is the error text:

17/11/02 11:37:17 INFO yarn.Client: Application report for application_1509613523426_0017 (state: ACCEPTED)
17/11/02 11:37:17 INFO yarn.Client: Application report for application_1509613523426_0017 (state: FAILED)
17/11/02 11:37:17 INFO yarn.Client:
     client token: N/A
     diagnostics: Application application_1509613523426_0017 failed 2 times due to AM Container for appattempt_1509613523426_0017_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://xxxxxx.com:8088/cluster/app/application_1509613523426_0017 Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://xxxxxxx/.sparkStaging/application_1509613523426_0017/__spark_libs__997458388067724499.zip
java.io.FileNotFoundException: File does not exist: hdfs://xxxxxxx/.sparkStaging/application_1509613523426_0017/__spark_libs__997458388067724499.zip
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: dev
     start time: 1509622629354
     final status: FAILED
     tracking URL: http://xxxxxx.com:8088/cluster/app/application_1509613523426_0017
     user: xxx
Exception in thread "main" org.apache.spark.SparkException: Application application_1509613523426_0017 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/11/02 11:37:17 INFO util.ShutdownHookManager: Shutdown hook called
17/11/02 11:37:17 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a5b20def-0218-4b0c-b9f8-fdf8a1802e95

Is this a bug in YARN support, or am I missing something?

1 Answer:

Answer 0 (score: 2)

The SparkContext is what communicates with the cluster manager. If the application is submitted to the cluster but a context is never created, YARN has no way to determine the application's state, which is why you get the error.
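
A minimal sketch of one workaround, in the spirit of this answer: create the context unconditionally and stop it right away when there is no work, so the driver always registers with YARN. The Job type alias, the fetchJobs helper, and the ConditionalJobRunner object are illustrative stand-ins for the question's code, not part of the original post:

import org.apache.spark.{SparkConf, SparkContext}

object ConditionalJobRunner {

  // Stand-in for the question's Job type; substitute the real one.
  type Job = String

  // Stand-in for the question's "get some jobs" step.
  def fetchJobs(): List[Job] = List.empty

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val jobs = fetchJobs()

    // Create the SparkContext unconditionally, so the driver running inside
    // the YARN ApplicationMaster always has a context to report status through.
    val sc = new SparkContext(conf)
    try {
      if (jobs.nonEmpty) {
        sc.parallelize(jobs).foreach { job =>
          // ... process one job, as in the question's foreach(....)
          ()
        }
      }
      // With an empty job list we fall through and just stop the context,
      // so the application can still finish with final status SUCCEEDED.
    } finally {
      sc.stop()
    }
  }
}

Alternatively, the emptiness check can be moved outside the Spark application entirely, onto the launcher side, so that spark-submit is never invoked when there are no jobs to run.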