YARN

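Time: 2016-11-11 10:57:30

Tags: hadoop mapreduce hdfs

Even a simple WordCount MapReduce job fails with the same error.

Hadoop 2.6.0

Below is the YARN log.

It looks like some kind of timeout occurred during resource negotiation, but I cannot verify that, nor can I tell what exactly caused the timeout.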

  

2016-11-11 15:38:09,313 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1478856936677_0004_000002. Got exception: java.io.IOException: Failed on local exception: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.37.145:49054 remote=platform-demo/10.0.37.145:60487]; Host Details : local host is: "platform-demo/10.0.37.145"; destination host is: "platform-demo":60487;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
        at org.apache.hadoop.ipc.Client.call(Client.java:1472)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy79.startContainers(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.37.145:49054 remote=platform-demo/10.0.37.145:60487]
        at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:680)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
        at org.apache.hadoop.ipc.Client.call(Client.java:1438)
        ... 9 more
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.37.145:49054 remote=platform-demo/10.0.37.145:60487]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:367)
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:553)
        at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:368)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:722)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:718)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
        ... 12 more

     

2016-11-11 15:38:09,319 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1478856936677_0004_000002 with final state: FAILED, and exit status: -1000
2016-11-11 15:38:09,319 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1478856936677_0004_000002 State change from ALLOCATED to FINAL_SAVING

I tried changing the following properties:

  

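<property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2200</value>
        <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
</property>

<property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>500</value>
</property>

<property>
        <name>dfs.datanode.socket.write.timeout</name>
        <value>3000000</value>
</property>

<property>
        <name>dfs.socket.timeout</name>
        <value>3000000</value>
</property>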

1 answer:

Answer 0 (score: 0)

Q1. MapReduce job fails after being accepted by YARN

The cause was that multiple connections, around 130 of them, were stuck on port 60487.
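One way to confirm a pile-up like this is to count the TCP connections on the port, grouped by state. The following is a minimal sketch, assuming Linux `netstat -tan`-style output (columns: proto, recv-q, send-q, local address, foreign address, state). The function name `count_port_states` and the sample rows are illustrative and not taken from the original post.

```python
from collections import Counter


def count_port_states(netstat_output, port):
    """Count TCP connections touching `port`, grouped by connection state.

    Assumes Linux `netstat -tan` columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    suffix = ":" + str(port)
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner/header lines and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or foreign.endswith(suffix):
            states[state] += 1
    return states


# Hypothetical sample; on a real node, feed in the output of `netstat -tan`.
sample = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:60487           0.0.0.0:*               LISTEN
tcp        0      0 10.0.37.145:60487       10.0.37.145:49054       ESTABLISHED
tcp        1      0 10.0.37.145:60487       10.0.37.145:49060       CLOSE_WAIT
tcp        0      0 10.0.37.145:8032        10.0.37.145:51000       ESTABLISHED
"""

if __name__ == "__main__":
    print(dict(count_port_states(sample, 60487)))
```

A large count in a single state on the port from the stack trace would be consistent with the stuck connections described here.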

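Q2. MapReduce job fails after being accepted by YARN

The problem was caused by the Hadoop tmp directory, /app/hadoop/tmp. After emptying this directory and retrying the MapReduce job, the job ran successfully.

Q3. Unhealthy Node local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir

Edit yarn-site.xml and add the following property.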
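Emptying the directory means deleting its contents while keeping the directory itself. Here is a minimal sketch of that step; `empty_directory` is a hypothetical helper, and the demo runs on a throwaway directory rather than on /app/hadoop/tmp. It assumes the Hadoop daemons are stopped first, and that nothing under the directory needs to survive (if HDFS storage directories live under it, their data is deleted too).

```python
import os
import shutil
import tempfile


def empty_directory(path):
    """Delete everything inside `path` but keep `path` itself.

    Returns the number of top-level entries removed.
    """
    # Snapshot the entries first so we never delete while iterating.
    with os.scandir(path) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)   # real subdirectory: remove recursively
        else:
            os.remove(entry.path)       # file or symlink: remove the entry only
    return len(entries)


if __name__ == "__main__":
    # Demonstrate on a temporary directory with made-up contents.
    with tempfile.TemporaryDirectory() as demo:
        os.makedirs(os.path.join(demo, "nm-local-dir", "usercache"))
        open(os.path.join(demo, "stale.lock"), "w").close()
        print("removed:", empty_directory(demo))
        print("left:", os.listdir(demo))
```

For the real cleanup, the path passed in would be the directory named above.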

<property>
        <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
        <value>98.5</value>
</property>
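The NodeManager's disk health checker marks a local directory as bad once the disk holding it is fuller than this percentage, which defaults to 90. Raising the ceiling to 98.5 therefore lets a nearly full disk count as healthy again. The comparison can be sketched roughly as below; this is an illustration and not the NodeManager's actual code, and `utilization_percent` and `is_dir_healthy` are made-up names.

```python
import shutil


def utilization_percent(used_bytes, total_bytes):
    """Percentage of the disk that is in use."""
    return 100.0 * used_bytes / total_bytes


def is_dir_healthy(used_bytes, total_bytes, max_utilization_percent=90.0):
    """True while utilization does not exceed the configured ceiling."""
    return utilization_percent(used_bytes, total_bytes) <= max_utilization_percent


if __name__ == "__main__":
    # Measures the disk holding the current directory; on a NodeManager host
    # one would pass the local-dir from the error message instead.
    usage = shutil.disk_usage(".")
    print("utilization: %.1f%%" % utilization_percent(usage.used, usage.total))
    print("healthy at 90.0:", is_dir_healthy(usage.used, usage.total))
    print("healthy at 98.5:", is_dir_healthy(usage.used, usage.total, 98.5))
```

Raising the threshold only hides the symptom; freeing space on the disk addresses the same condition without changing the setting.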

Reference: Why does Hadoop report "Unhealthy Node local-dirs and log-dirs are bad"?