I am trying to deploy Hadoop-RDMA on an 8-node InfiniBand cluster (OFED-1.5.3-4.0.42) and I am running into the following problem (a.k.a. "File ... could only be replicated to 0 nodes, instead of 1"):
```
frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfs -copyFromLocal ../pg132.txt /user/frolo/input/pg132.txt
Warning: $HADOOP_HOME is deprecated.

14/02/05 19:06:30 WARN hdfs.DFSClient: DataStreamer Exception: java.lang.reflect.UndeclaredThrowableException
	at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(Unknown Source)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Unknown Source)
	at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.From.Code(Unknown Source)
	at org.apache.hadoop.hdfs.From.F(Unknown Source)
	at org.apache.hadoop.hdfs.From.F(Unknown Source)
	at org.apache.hadoop.hdfs.The.run(Unknown Source)
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/frolo/input/pg132.txt could only be replicated to 0 nodes, instead of 1
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown Source)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
	at org.apache.hadoop.ipc.rdma.madness.Code(Unknown Source)
	at org.apache.hadoop.ipc.rdma.madness.run(Unknown Source)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(Unknown Source)
	at org.apache.hadoop.ipc.rdma.be.run(Unknown Source)
	at org.apache.hadoop.ipc.rdma.RDMAClient.Code(Unknown Source)
	at org.apache.hadoop.ipc.rdma.RDMAClient.call(Unknown Source)
	at org.apache.hadoop.ipc.Tempest.invoke(Unknown Source)
	... 12 more
14/02/05 19:06:30 WARN hdfs.DFSClient: Error Recovery for null bad datanode[0] nodes == null
14/02/05 19:06:30 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/frolo/input/pg132.txt" - Aborting...
14/02/05 19:06:30 INFO hdfs.DFSClient: exception in isClosed
```
When I copy from the local filesystem to HDFS, it appears that no data ever reaches the DataNodes. I checked the availability of the DataNodes:
```
frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfsadmin -report
Warning: $HADOOP_HOME is deprecated.

Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 0 (4 total, 4 dead)

Name: 10.10.1.13:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014

Name: 10.10.1.14:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014

Name: 10.10.1.16:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014

Name: 10.10.1.11:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:55 MSK 2014
```
On the other hand, creating directories in HDFS with mkdir works fine. Restarting the Hadoop daemons had no positive effect.
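For reference, the report above ("4 total, 4 dead") means no DataNode ever registered with the NameNode. One way to dig further, sketched here assuming the default Hadoop 1.x log layout under the install directory:

```
# On each worker node: is a DataNode JVM running at all?
jps | grep DataNode

# If it is missing or keeps dying, the reason is usually in its log
tail -n 50 ~/hadoop-rdma-0.9.8/logs/hadoop-*-datanode-*.log
```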
Could you help me solve this problem? Thanks.

Best,
Alex
Answer 0 (score: 4)
I found my problem. It was related to the hadoop.tmp.dir configuration, which had been set to an NFS partition. By default it is configured as /tmp, i.e. on the local fs. After removing hadoop.tmp.dir from core-site.xml, the problem was solved.
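Presumably this bites because dfs.name.dir and dfs.data.dir default to subdirectories of ${hadoop.tmp.dir}, so with an NFS-backed value every DataNode ends up writing into the same shared storage directory. A minimal core-site.xml sketch of the fix (A11 is the head node from the session prompt; port 9000 and the /local/hadoop-tmp path are made-up examples, not values from the question):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Hypothetical NameNode address: A11 from the prompt, port 9000 assumed -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://A11:9000</value>
  </property>
  <!-- Either omit hadoop.tmp.dir entirely (the default lives under /tmp,
       which is node-local), or point it at a directory on each node's
       local disk. An NFS path here makes all DataNodes share one
       storage directory. -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/local/hadoop-tmp</value>
  </property>
</configuration>
```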
Answer 1 (score: 0)
In my case, the problem was solved by opening port 50010 in the firewall.
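For example, with iptables this might look like the sketch below (run on every DataNode host; 50010 is the HDFS data-transfer port, and depending on the setup the IPC and HTTP ports, 50020 and 50075, may need the same treatment):

```
# Allow inbound connections to the DataNode data-transfer port (as root)
iptables -I INPUT -p tcp --dport 50010 -j ACCEPT

# Make the rule persistent according to your distribution, e.g. on RHEL/CentOS:
# service iptables save
```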