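I have a Hadoop cluster with 8 nodes and a highly available ResourceManager. The active ResourceManager is on node 3 and the standby ResourceManager is on node 2.
When I submit an application in cluster mode, the driver container can be placed on any of the 8 nodes. If the driver container lands on node 3 (where the active ResourceManager service is running), I can open the ApplicationMaster UI. In every other case the UI does not open, and after a while Ambari raises a critical alert with a message that the connection to the ResourceManager host URL failed.
When I check the ResourceManager logs, they show an access exception for the spark user when getServiceState is called.
Here is the full stack trace: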
2018-08-25 05:02:30,209 WARN resourcemanager.AdminService (RMServerUtils.java:verifyAdminAccess(185)) - User spark doesn't have permission to call 'getServiceState'
2018-08-25 05:02:30,210 WARN resourcemanager.RMAuditLogger (RMAuditLogger.java:logFailure(345)) - USER=spark IP=11.111.1.11 OPERATION=getServiceState TARGET=AdminService RESULT=FAILURE DESCRIPTION=Unauthorized user PERMISSIONS=
2018-08-25 05:02:30,210 INFO ipc.Server (Server.java:logException(2294)) - IPC Server handler 0 on 8033, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from 11.111.1.11:40169 Call#51845 Retry#0
org.apache.hadoop.security.AccessControlException: User spark doesn't have permission to call 'getServiceState'
at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.verifyAdminAccess(RMServerUtils.java:191)
at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.verifyAdminAccess(RMServerUtils.java:157)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAccess(AdminService.java:232)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.getServiceStatus(AdminService.java:365)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.getServiceStatus(HAServiceProtocolServerSideTranslatorPB.java:131)
at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4464)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
2018-08-25 05:05:43,300 INFO client.DefaultHttpClient (DefaultRequestDirector.java:tryExecute(726)) - I/O exception (org.apache.http.NoHttpResponseException) caught when processing request: The target server failed to respond
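
To make the audit entry above easier to read, here is a minimal sketch that splits the RMAuditLogger line into its KEY=value fields. The helper name `parse_audit_line` and the regular expression are my own, not part of Hadoop, and the sketch assumes keys are always upper-case words and that the payload follows the first " - " separator, as in the line shown in the log.

```python
import re

# Sample line copied from the ResourceManager log above.
AUDIT_LINE = (
    "2018-08-25 05:02:30,210 WARN resourcemanager.RMAuditLogger "
    "(RMAuditLogger.java:logFailure(345)) - USER=spark IP=11.111.1.11 "
    "OPERATION=getServiceState TARGET=AdminService RESULT=FAILURE "
    "DESCRIPTION=Unauthorized user PERMISSIONS="
)

# Assumption: a key is an upper-case word followed by '=', and its value
# runs until the next such key or the end of the line.
FIELD_RE = re.compile(r"([A-Z]+)=(.*?)(?=\s+[A-Z]+=|$)")


def parse_audit_line(line):
    """Return the KEY=value pairs of one audit log line as a dict (hypothetical helper)."""
    # Everything after the first " - " is the audit payload.
    _, _, payload = line.partition(" - ")
    return {key: value.strip() for key, value in FIELD_RE.findall(payload)}


fields = parse_audit_line(AUDIT_LINE)
print(fields["USER"], fields["OPERATION"], fields["RESULT"])
```

Read this way, the entry says that user spark, connecting from 11.111.1.11, called getServiceState on the AdminService and was rejected as an unauthorized user.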