Spark job crashes with ExitCodeException exitCode = 15

Asked: 2016-03-20 14:13:31

Tags: apache-spark spark-dataframe

I am running a very long Spark job, which crashes with the following error:

Application application_1456200816465_347125 failed 2 times due to AM Container for appattempt_1456200816465_347125_000002 exited with exitCode: 15
For more detailed output, check application tracking page:http://foo.com:8088/proxy/application_1456200816465_347125/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e24_1456200816465_347125_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.

When I click the link provided in the error message above, it shows:

java.io.IOException: Target log file already exists (hdfs://nameservice1/user/spark/applicationHistory/application_1456200816465_347125)
    at org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:201)
    at org.apache.spark.SparkContext$$anonfun$stop$5.apply(SparkContext.scala:1394)
    at org.apache.spark.SparkContext$$anonfun$stop$5.apply(SparkContext.scala:1394)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1394)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:107)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

If I restart the job, it works fine for about an hour and then fails again with the same error. Note that hdfs://nameservice1/user/spark/applicationHistory/application_1456200816465_347125 is something system-generated; this folder has nothing to do with my application.
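The stack trace above says the event-logging listener failed because a log file for this application already existed, presumably left behind by the failed first attempt. One commonly suggested mitigation (my assumption, not something confirmed by these logs) is to let Spark overwrite a leftover event log via the standard `spark.eventLog.overwrite` property, which defaults to false:

```properties
# spark-defaults.conf — allow the event-logging listener to overwrite a
# leftover log from a failed earlier application attempt (default: false)
spark.eventLog.overwrite  true
```

This only masks the symptom (the stale file in applicationHistory); it does not explain why the first attempt exits with code 15 in the first place.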

I searched the internet, and many people ran into this error because they had set the master to "local" in their code. This is how I initialize my Spark context:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("Foo")
val context = new SparkContext(conf)
context.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
val sc = new SQLContext(context)

This is how I run my Spark job:

sudo -u web nohup spark-submit --class com.abhi.Foo --master yarn-cluster 
Foo-assembly-1.0.jar "2015-03-18" "2015-03-30" > fn_output.txt 2> fn_error.txt &
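If the leftover event log is indeed the culprit, the overwrite setting can also be passed per job at submit time (again an assumption on my part, and a workaround rather than a root-cause fix):

```shell
# Same submit command, with --conf allowing overwrite of a stale event log
sudo -u web nohup spark-submit --class com.abhi.Foo --master yarn-cluster \
  --conf spark.eventLog.overwrite=true \
  Foo-assembly-1.0.jar "2015-03-18" "2015-03-30" > fn_output.txt 2> fn_error.txt &
```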

0 Answers:

No answers yet.