Strange DataNode behavior in Hadoop

Date: 2012-08-03 22:55:53

Tags: hadoop hdfs

I think I must be misunderstanding something about DataNodes in a Hadoop cluster. I have a virtual Hadoop cluster composed of master, slave1, slave2, and slave3. Master and slave1 run on one physical machine, while slave2 and slave3 run on another. When I start the cluster, the HDFS web UI shows only three live DataNodes: slave1, master, and slave2. But sometimes the three live DataNodes are master, slave1, and slave3 instead, which is strange. When I ssh into the node whose DataNode did not start, jps shows no DataNode process, yet I can still copy and delete files on HDFS from that node. So I believe I must not be understanding DataNodes correctly. I have three questions:

1. Is there one DataNode per node?
2. Why can a node that is not running a DataNode still read and write on HDFS?
3. Can we decide the number of DataNodes?
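As a quick way to see which DataNodes the NameNode currently considers alive, the standard admin command can be run from any machine whose Hadoop client configuration points at the NameNode. This is a minimal sketch for a Hadoop 1.x cluster; the exact report format varies by version:

```shell
# Ask the NameNode for a cluster summary, including live and dead DataNodes.
hadoop dfsadmin -report

# On a slave node itself, confirm whether a DataNode JVM is actually running.
jps | grep DataNode
```

If `jps` shows no DataNode on a slave while the web UI still lists it, the DataNode log on that slave (as in the log below) is the place to look.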

Here is the log from the DataNode that did not start:

  
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = slave11/192.168.111.31
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.3
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May  8 20:31:25 UTC 2012
************************************************************/
2012-08-03 17:47:07,578 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-08-03 17:47:07,595 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-08-03 17:47:07,596 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-08-03 17:47:07,596 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-08-03 17:47:07,911 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-08-03 17:47:07,915 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-08-03 17:47:09,457 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 0 time(s).
2012-08-03 17:47:10,460 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 1 time(s).
2012-08-03 17:47:11,464 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 2 time(s).
2012-08-03 17:47:19,565 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
2012-08-03 17:47:19,601 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010
2012-08-03 17:47:19,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2012-08-03 17:47:24,721 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-08-03 17:47:24,854 INFO org.apache.hadoop.http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-08-03 17:47:24,952 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
2012-08-03 17:47:24,953 INFO org.mortbay.log: jetty-6.1.26
2012-08-03 17:47:25,665 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
     

2012-08-03 17:47:25,688 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2012-08-03 17:47:25,690 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source DataNode registered.
2012-08-03 17:47:30,717 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2012-08-03 17:47:30,718 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort50020 registered.
2012-08-03 17:47:30,718 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort50020 registered.
2012-08-03 17:47:30,721 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(slave11:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020)
2012-08-03 17:47:30,764 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting asynchronous block report scan
2012-08-03 17:47:30,766 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020) In DataNode.run, data = FSDataset{dirpath='/app/hadoop/tmp/dfs/data/current'}
2012-08-03 17:47:30,774 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2012-08-03 17:47:30,778 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting
2012-08-03 17:47:30,772 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting
2012-08-03 17:47:30,795 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner.
2012-08-03 17:47:30,816 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished asynchronous block report scan in 52ms
2012-08-03 17:47:30,838 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Generated rough (lockless) block report in 32 ms
2012-08-03 17:47:30,840 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 2 ms
2012-08-03 17:47:31,158 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-6072482390929551157_78209
2012-08-03 17:47:33,775 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 1 ms
2012-08-03 17:47:33,793 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node 192.168.111.31:50010 is attempting to report storage ID DS-1062340636-127.0.0.1-50010-1339803955209. Node 192.168.111.32:50010 is expected to serve this storage.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanode(FSNamesystem.java:4608)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(FSNamesystem.java:3460)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(NameNode.java:1001)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

    at org.apache.hadoop.ipc.Client.call(Client.java:1070)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy5.blockReport(Unknown Source)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:958)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
    at java.lang.Thread.run(Thread.java:636)

2012-08-03 17:47:33,873 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50075
2012-08-03 17:47:33,980 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020

2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2012-08-03 17:47:33,982 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020):DataXceiveServer:java.nio.channels.AsynchronousCloseException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:170)
    at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:102)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
    at java.lang.Thread.run(Thread.java:636)

2012-08-03 17:47:33,982 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 50020
2012-08-03 17:47:33,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting DataXceiveServer
2012-08-03 17:47:33,983 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
  

2012-08-03 17:47:33,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 1
2012-08-03 17:47:33,984 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Exiting DataBlockScanner thread.
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting down all async disk service threads...
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async disk service threads have been shut down
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020): Finishing DataNode in: FSDataset{dirpath='/app/hadoop/tmp/dfs/data/current'}
2012-08-03 17:47:33,987 WARN org.apache.hadoop.metrics2.util.MBeans: Hadoop:service=DataNode,name=DataNodeInfo
javax.management.InstanceNotFoundException: Hadoop:service=DataNode,name=DataNodeInfo
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1118)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:433)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:421)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:540)
    at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.unRegisterMXBean(DataNode.java:522)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:737)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1471)
    at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,988 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
2012-08-03 17:47:33,988 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2012-08-03 17:47:33,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 0
2012-08-03 17:47:33,988 WARN org.apache.hadoop.metrics2.util.MBeans: Hadoop:service=DataNode,name=FSDatasetState-DS-1062340636-127.0.0.1-50010-1339803955209
javax.management.InstanceNotFoundException: Hadoop:service=DataNode,name=FSDatasetState-DS-1062340636-127.0.0.1-50010-1339803955209
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1118)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:433)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:421)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:540)
    at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.shutdown(FSDataset.java:2067)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:799)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1471)
    at java.lang.Thread.run(Thread.java:636)

  

2012-08-03 17:47:33,988 WARN org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: AsyncDiskService has already shut down.
2012-08-03 17:47:33,989 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode

1 Answer:

Answer 0 (score: 3)

Having more than one DataNode per hostname is a problem. You said the cluster is virtual — are the nodes on separate virtual machines? If so, this shouldn't be an issue...

I would check the DataNode logs on slave2 and slave3 to see why they aren't starting up. The error messages will be printed there — for example, whether the error says the port is already taken, or something similar.


You don't need a local DataNode to access HDFS. HDFS clients (e.g., hadoop fs -put) talk directly to the NameNode and to the remote DataNode processes, without ever touching a local DataNode.
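To illustrate, all of the following client commands work from any machine with a configured Hadoop client, whether or not a DataNode is running locally (the path and filename here are made up for the example):

```shell
# All of these contact the NameNode for metadata and the remote DataNodes
# for block data; no local DataNode process is involved.
hadoop fs -put localfile.txt /user/hduser/localfile.txt   # write a file to HDFS
hadoop fs -ls /user/hduser                                # list a directory
hadoop fs -rm /user/hduser/localfile.txt                  # delete the file
```

This is exactly why you could still copy and delete HDFS files from a node whose DataNode process had died.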

In fact, it is quite common on large clusters to have a separate "query node" that has access to HDFS and MapReduce but runs no DataNode or TaskTracker services.

As long as you have the Hadoop packages installed and the configuration files point correctly at the NameNode and JobTracker, you can access your cluster "remotely."
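A minimal client-side configuration along those lines might look like the fragments below. This is a sketch for Hadoop 1.x: the NameNode port 54310 is taken from the logs above, while the JobTracker port 54311 is only a common convention — adjust both to your cluster.

```xml
<!-- core-site.xml: tells the HDFS client where the NameNode is -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
</configuration>
```

```xml
<!-- mapred-site.xml: tells job submission where the JobTracker is -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
  </property>
</configuration>
```

With just these two files on a machine that has the Hadoop packages installed, `hadoop fs` and job submission both work without any local DataNode or TaskTracker.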