MapReduce job hangs, "container" problem

Date: 2013-12-18 19:42:49

Tags: hadoop mapreduce yarn

When I run a MapReduce job, it hangs and eventually fails (after about 20 minutes).

This is the error I see in the web UI on port 8088:

exited with exitCode: -100 due to: Container expired since it was unused.Failing this attempt.. Failing the application. 

Any ideas about what is causing this?

I am running Hadoop 2.2.

Update

The problem seems to be related to this:

Container killed by the framework, either due to being released by the application or being 'lost' due to node failures etc. have a special exit code of -100.
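
To compare the exit code across attempts, here is a minimal Python sketch that pulls the exit code and the reason out of a diagnostics string like the one above. The helper name `parse_diagnostics` and the regular expression are my own illustration and not part of Hadoop; the meaning attached to -100 is taken only from the text quoted above.

```python
import re

# Pattern for diagnostics of the form
# "exited with exitCode: -100 due to: <reason>".
# This pattern is an assumption based on the message shown above.
DIAG_RE = re.compile(r"exitCode:\s*(-?\d+)\s+due to:\s*(.*)")

# Exit code described in the quoted text: container released by the
# application or lost (for example, because of a node failure).
ABORTED_EXIT_CODE = -100


def parse_diagnostics(message):
    """Return (exit_code, reason), or None if the message does not match."""
    match = DIAG_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


if __name__ == "__main__":
    msg = ("exited with exitCode: -100 due to: Container expired since it "
           "was unused.Failing this attempt.. Failing the application. ")
    code, reason = parse_diagnostics(msg)
    print(code, code == ABORTED_EXIT_CODE)
    print(reason)
```

Run against each failed attempt's diagnostics, this would show whether every attempt ends with the same code and reason.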

Update 2:

These errors are from the ResourceManager log:

2013-12-18 04:28:42,544 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:16384, vCores:16>
2013-12-18 04:28:42,544 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0
2013-12-18 04:28:42,544 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1387307711170_0002_000002 released container container_1387307711170_0002_02_000001 on node: host: slave-2:42143 #containers=0 available=8192 used=0 with event: EXPIRE
2013-12-18 04:28:42,544 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1387307711170_0002_000002
2013-12-18 04:28:42,545 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1387307711170_0002_000002 State change from ALLOCATED to FAILED
2013-12-18 04:28:42,545 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1387307711170_0002 failed 2 times due to AM Container for appattempt_1387307711170_0002_000002 exited with  exitCode: -100 due to: Container expired since it was unused.Failing this attempt.. Failing the application.
2013-12-18 04:28:42,546 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing info for app: application_1387307711170_0002
2013-12-18 04:28:42,546 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1387307711170_0002 State change from ACCEPTED to FAILED
2013-12-18 04:28:42,546 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hduser   OPERATION=Application Finished - Failed TARGET=RMAppManager     RESULT=FAILURE  DESCRIPTION=App failed with state: FAILED       PERMISSIONS=Application application_1387307711170_0002 failed 2 times due to AM Container for appattempt_1387307711170_0002_000002 exited with  exitCode: -100 due to: Container expired since it was unused.Failing this attempt.. Failing the application.    APPID=application_1387307711170_0002
2013-12-18 04:28:42,546 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1387307711170_0002,name=streamjob5941238512810428268.jar,user=hduser,queue=default,state=FAILED,trackingUrl=master-1:8088/cluster/app/application_1387307711170_0002,appMasterHost=N/A,startTime=1387339379570,finishTime=1387340922546,finalStatus=FAILED
2013-12-18 04:28:42,546 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1387307711170_0002_000002 is done. finalState=FAILED
2013-12-18 04:28:42,546 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1387307711170_0002 requests cleared
2013-12-18 04:28:42,546 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application removed - appId: application_1387307711170_0002 user: hduser queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2013-12-18 04:28:42,547 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1387307711170_0002 user: hduser leaf-queue of parent: root #applications: 0
2013-12-18 04:28:43,136 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2/10.239.132.243:42143. Already tried 39 time(s); maxRetries=45
2013-12-18 04:29:03,157 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2/10.239.132.243:42143. Already tried 40 time(s); maxRetries=45
2013-12-18 04:29:23,158 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2/10.239.132.243:42143. Already tried 41 time(s); maxRetries=45
2013-12-18 04:29:43,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2/10.239.132.243:42143. Already tried 42 time(s); maxRetries=45
2013-12-18 04:30:03,183 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2/10.239.132.243:42143. Already tried 43 time(s); maxRetries=45
2013-12-18 04:30:23,185 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave-2/10.239.132.243:42143. Already tried 44 time(s); maxRetries=45
2013-12-18 04:30:43,208 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1387307711170_0002_000002. Got exception: org.apache.hadoop.net.ConnectTimeoutException: Call From ip-10-73-169-19/10.73.169.19 to slave-2:42143 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=slave-2/10.239.132.243:42143]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:749)
        at org.apache.hadoop.ipc.Client.call(Client.java:1351)
        at org.apache.hadoop.ipc.Client.call(Client.java:1300)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy69.startContainers(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:118)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=slave-2/10.239.132.243:42143]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
        at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
2013-12-18 04:30:43,208 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:625)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:566)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:547)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
        at java.lang.Thread.run(Thread.java:724)
2013-12-18 19:15:17,626 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Rolling master-key for amrm-tokens
2013-12-18 19:15:17,632 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: Rolling master-key for container-tokens
2013-12-18 19:15:17,633 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: Going to activate master-key with key-id 422264835 in 900000ms
2013-12-18 19:15:17,637 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Rolling master-key for nm-tokens
2013-12-18 19:15:17,637 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Going to activate master-key with key-id 1883530799 in 900000ms
2013-12-18 19:15:25,884 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2013-12-18 19:15:25,885 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing master key with keyID 3
2013-12-18 19:30:17,633 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: Activating next master key with id: 422264835
2013-12-18 19:30:17,637 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Activating next master key with id: 1883530799
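
The log above shows the ResourceManager host (ip-10-73-169-19/10.73.169.19) repeatedly timing out while connecting to slave-2/10.239.132.243:42143, so one thing worth checking is plain TCP reachability between the two machines. Below is a minimal Python sketch of such a check, assuming it is run on the ResourceManager host. `can_connect` is a hypothetical helper name, and the host and port are copied from the log lines; this only tests whether a socket can be opened, not whether YARN itself is healthy.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name-resolution errors.
        return False


if __name__ == "__main__":
    # Host and port copied from the "Retrying connect to server" lines.
    # Assumption: the service is still listening on the same port; if the
    # port is not pinned in the configuration it may differ after a restart.
    host, port = "slave-2", 42143
    print(socket.gethostname(), "->", host, port, can_connect(host, port))
```

If this prints `False` from the ResourceManager host but `True` when run on slave-2 itself, that would point toward a firewall, security group, or hostname-resolution problem rather than a problem in the job.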

0 answers:

No answers yet.