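I am running a Hadoop cluster (version 0.20.205), and I regularly have to deploy new code to it, which requires shutting the cluster down and restarting it with the new code. My problem is that, for reasons too complicated to go into here, I cannot guarantee that the JobTracker comes up before the TaskTracker nodes. I see the TaskTracker nodes try to connect to a JobTracker that is not up yet, and then shut down after printing this to the log: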
- Can not start task tracker because java.io.IOException: Call to <jobtracker node> failed on local exception: java.io.IOException: Connection reset by peer
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
at org.apache.hadoop.ipc.Client.call(Client.java:1071)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at org.apache.hadoop.mapred.$Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:429)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:331)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:296)
at org.apache.hadoop.mapred.TaskTracker$3.run(TaskTracker.java:794)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:790)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1428)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3674)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:342)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:800)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:745)
- SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at <tasktracker node>
************************************************************/
My question is: is there a way to configure the TaskTracker nodes to keep retrying the connection in a loop until they successfully connect to the JobTracker?
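For context, the fallback I am considering if no such setting exists is a wrapper outside of Hadoop that polls the JobTracker's RPC port and only launches the TaskTracker once the port accepts connections. Below is a minimal sketch of that idea, not anything Hadoop provides: `wait_for_port` and its parameters are my own hypothetical names, and it only checks that a TCP connection can be opened, which may not mean the JobTracker has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, interval=5.0, max_attempts=None, timeout=3.0):
    """Poll host:port until a TCP connection succeeds.

    Returns True as soon as a connection is accepted, or False once
    max_attempts is exhausted. With max_attempts=None it retries forever.
    This is a hypothetical helper, not part of Hadoop.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            # Open and immediately close a connection; success means
            # something is listening on the port.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Refused, reset, or timed out: wait and try again.
            time.sleep(interval)
    return False
```

The wrapper would call something like `wait_for_port("jobtracker.example.com", 9001)` (placeholder host and port; the real values would be whatever `mapred.job.tracker` points to) and then start the TaskTracker the usual way. I would still prefer a built-in option over this.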
Thanks for your help!