Spark 2.1 + Yarn应用程序已经结束

时间:2018-02-08 09:54:04

标签: hadoop apache-spark yarn apache-spark-2.0

我们在ambari群集中使用spark应用程序2.1版

ambari thrift服务器不稳定并且一直重启

从日志中我们可以看到:

ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

我们发现以下链接描述了此问题的解决方案

https://markobigdata.com/2016/08/11/yarn-application-has-already-ended-it-might-have-been-killed-or-unable-to-launch-application-master/

但在我们按照文章中的描述设置参数后,问题仍然存在

请建议解决方案是什么?

完整日志:

tail -f  spark-hive-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-master01.sys873dns.com.out
Spark Command: /usr/jdk64/jdk1.8.0_112/bin/java -Dhdp.version=2.6.0.3-8 -cp /usr/hdp/current/spark2-thriftserver/conf/:/usr/hdp/current/spark2-thriftserver/jars/*:/usr/hdp/current/hadoop-client/conf/ -Xmx10000m org.apache.spark.deploy.SparkSubmit --conf spark.driver.memory=50g --properties-file /usr/hdp/current/spark2-thriftserver/conf/spark-thrift-sparkconf.conf --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift JDBC/ODBC Server --executor-cores 7 spark-internal
========================================
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
18/02/08 09:38:07 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:47)
        at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:81)
        at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/02/08 09:38:07 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
18/02/08 09:38:07 ERROR Utils: Uncaught exception in thread main
java.lang.NullPointerException

我也给了纱线日志:

   grep -i erro yarn-yarn-resourcemanager-master01.sys873dns.com.log

018-02-08 11:19:00,993 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server master01.sys873dns.com/23.1.29.61:2181. Will not attempt to authenticate using SASL (unknown error)
2018-02-08 11:19:15,767 ERROR resourcemanager.ResourceManager (LogAdapter.java:error(69)) - RECEIVED SIGNAL 15: SIGTERM
2018-02-08 11:19:27,281 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server master01.sys873dns.com/23.1.29.61:2181. Will not attempt to authenticate using SASL (unknown error)
2018-02-08 11:29:00,064 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server master01.sys873dns.com/23.1.29.61:2181. Will not attempt to authenticate using SASL (unknown error)
2018-02-08 11:29:01,839 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server master01.sys873dns.com/23.1.29.61:2181. Will not attempt to authenticate using SASL (unknown error)
2018-02-08 11:29:15,725 ERROR resourcemanager.ResourceManager (LogAdapter.java:error(69)) - RECEIVED SIGNAL 15: SIGTERM
2018-02-08 11:29:27,033 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server master03.sys873dns.com/23.1.29.63:2181. Will not attempt to authenticate using SASL (unknown error)



ons.YarnException: Unauthorized request to start container.
2018-02-08 12:56:11,144 INFO  amlauncher.AMLauncher (AMLauncher.java:run(273)) - Error launching appattempt_1518089370033_0028_000008. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
2018-02-08 12:59:39,822 INFO  amlauncher.AMLauncher (AMLauncher.java:run(273)) - Error launching appattempt_1518089370033_0029_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
2018-02-08 13:00:01,671 INFO  amlauncher.AMLauncher (AMLauncher.java:run(273)) - Error launching appattempt_1518089370033_0029_000004. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
2018-02-08 13:00:18,062 INFO  amlauncher.AMLauncher (AMLauncher.java:run(273)) - Error launching appattempt_1518089370033_0029_000006. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
2018-02-08 13:00:20,245 INFO  amlauncher.AMLauncher (AMLauncher.java:run(273)) - Error launching appattempt_1518089370033_0030_000003. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
2018-02-08 13:00:42,100 INFO  amlauncher.AMLauncher (AMLauncher.java:run(273)) - Error launching appattempt_1518089370033_0030_000006. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
2018-02-08 13:00:56,310 INFO  amlauncher.AMLauncher (AMLauncher.java:run(273)) - Error launching appattempt_1518089370033_0030_000008. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
2018-02-08 13:00:58,511 INFO  amlauncher.AMLauncher (AMLauncher.java:run(273)) - Error launching appattempt_1518089370033_0030_000010. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
2018-02-08 13:00:58,537 INFO  rmapp.RMAppImpl (RMAppImpl.java:transition(1063)) - Application application_1518089370033_0030 failed 10 times due to Error launching appattempt_1518089370033_0030_000010. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.

上次登录

2018-02-08 14:14:54,410 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(778)) - application_1518089370033_0050 State change from FINAL_SAVING to FAILED
2018-02-08 14:14:54,410 INFO  capacity.ParentQueue (ParentQueue.java:removeApplication(385)) - Application removed - appId: application_1518089370033_0050 user: hive leaf-queue of parent: root #applications: 1
2018-02-08 14:14:54,412 INFO  integration.RMRegistryOperationsService (RMRegistryOperationsService.java:onApplicationCompleted(119)) - Application application_1518089370033_0050 completed, purging application-level records
2018-02-08 14:14:54,412 INFO  integration.RMRegistryOperationsService (RMRegistryOperationsService.java:purgeRecordsAsync(198)) -  records under / with ID application_1518089370033_0050 and policy application: {}
2018-02-08 14:14:55,393 INFO  rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(422)) - container_e09_1518089370033_0049_10_000001 Container Transitioned from RUNNING to COMPLETED
2018-02-08 14:14:55,393 INFO  scheduler.SchedulerNode (SchedulerNode.java:releaseContainer(220)) - Released container container_e09_1518089370033_0049_10_000001 of capacity <memory:10240, vCores:1> on host worker02.sys768.com:45454, which currently has 0 containers, <memory:0, vCores:0> used and <memory:30720, vCores:6> available, release resources=true
2018-02-08 14:14:55,393 INFO  attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:rememberTargetTransitionsAndStoreState(1209)) - Updating application attempt appattempt_1518089370033_0049_000010 with final state: FAILED, and exit status: -1000
2018-02-08 14:14:55,398 INFO  attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1518089370033_0049_000010 State change from LAUNCHED to FINAL_SAVING
2018-02-08 14:14:55,399 INFO  integration.RMRegistryOperationsService (RMRegistryOperationsService.java:onContainerFinished(144)) - Container container_e09_1518089370033_0049_10_000001 finished, purging container-level records
2018-02-08 14:14:55,400 INFO  integration.RMRegistryOperationsService (RMRegistryOperationsService.java:purgeRecordsAsync(198)) -  records under / with ID container_e09_1518089370033_0049_10_000001 and policy container: {}
2018-02-08 14:14:55,408 INFO  resourcemanager.ApplicationMasterService (ApplicationMasterService.java:unregisterAttempt(685)) - Unregistering app attempt : appattempt_1518089370033_0049_000010
2018-02-08 14:14:55,408 INFO  security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:applicationMasterFinished(124)) - Application finished, removing password for appattempt_1518089370033_0049_000010
2018-02-08 14:14:55,408 INFO  attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1518089370033_0049_000010 State change from FINAL_SAVING to FAILED
2018-02-08 14:14:55,408 INFO  rmapp.RMAppImpl (RMAppImpl.java:transition(1330)) - The number of failed attempts is 10. The max attempts is 10
2018-02-08 14:14:55,409 INFO  rmapp.RMAppImpl (RMAppImpl.java:rememberTargetTransitionsAndStoreState(1123)) - Updating application application_1518089370033_0049 with final state: FAILED
2018-02-08 14:14:55,409 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(778)) - application_1518089370033_0049 State change from ACCEPTED to FINAL_SAVING
2018-02-08 14:14:55,409 INFO  recovery.RMStateStore (RMStateStore.java:transition(228)) - Updating info for app: application_1518089370033_0049
2018-02-08 14:14:55,409 INFO  capacity.CapacityScheduler (CapacityScheduler.java:doneApplicationAttempt(811)) - Application Attempt appattempt_1518089370033_0049_000010 is done. finalState=FAILED
2018-02-08 14:14:55,409 INFO  scheduler.AppSchedulingInfo (AppSchedulingInfo.java:clearRequests(124)) - Application application_1518089370033_0049 requests cleared
2018-02-08 14:14:55,410 INFO  capacity.LeafQueue (LeafQueue.java:removeApplicationAttempt(795)) - Application removed - appId: application_1518089370033_0049 user: hive queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2018-02-08 14:14:55,417 INFO  rmapp.RMAppImpl (RMAppImpl.java:transition(1063)) - Application application_1518089370033_0049 failed 10 times due to AM Container for appattempt_1518089370033_0049_000010 exited with  exitCode: -1000
For more detailed output, check the application tracking page: http://master02.sys768.com:8088/cluster/app/application_1518089370033_0049 Then click on links to logs of each attempt.
Diagnostics: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1212891131-25.1.53.61-1518077044052:blk_1073741833_1009 file=/hdp/apps/2.6.0.3-8/spark2/spark2-hdp-yarn-archive.tar.gz
Failing this attempt. Failing the application.
2018-02-08 14:14:55,418 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(778)) - application_1518089370033_0049 State change from FINAL_SAVING to FAILED
2018-02-08 14:14:55,418 INFO  capacity.ParentQueue (ParentQueue.java:removeApplication(385)) - Application removed - appId: application_1518089370033_0049 user: hive leaf-queue of parent: root #applications: 0
2018-02-08 14:14:55,419 INFO  integration.RMRegistryOperationsService (RMRegistryOperationsService.java:onApplicationCompleted(119)) - Application application_1518089370033_0049 completed, purging application-level records
2018-02-08 14:14:55,419 INFO  integration.RMRegistryOperationsService (RMRegistryOperationsService.java:purgeRecordsAsync(198)) -  records under / with ID application_1518089370033_0049 and policy application: {}
[root@master02 yarn]#

0 个答案:

没有答案