I have a Hadoop cluster on Cloudera with 4 nodes: 1 master and 3 slaves, with a replication factor of 3. Over the course of one day my cluster kept growing for no apparent reason. I was not running any jobs, yet within minutes the free space left on the devices became minimal. I then deleted some files and changed a few settings. Below are the logs from my Hadoop master and datanodes.
Part of the log files:
Hadoop master
2015-07-17 09:30:49,637 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.20.1.45 cmd=listCachePools src=null dst=null perm=null proto=rpc
2015-07-17 09:30:49,649 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.20.1.45 cmd=create src=/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2015_07_17-09_30_49 dst=null perm=hdfs:supergroup:rw-rw-rw- proto=rpc
2015-07-17 09:30:49,684 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.20.1.45 cmd=open src=/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2015_07_17-09_30_49 dst=null perm=null proto=rpc
2015-07-17 09:30:49,699 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.20.1.45 cmd=delete src=/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2015_07_17-09_30_49 dst=null perm=null proto=rpc
Hadoop datanode
2015-07-17 09:30:49,663 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-634864778-172.20.1.45-1399358938139:blk_1074658739_919097 src: /172.20.1.48:59941 dest: /172.20.1.46:50010
2015-07-17 09:30:49,669 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.20.1.48:59941, dest: /172.20.1.46:50010, bytes: 56, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-824197314_132, offset: 0, srvID: aa5e5f0e-4198-4df5-8dfa-6e7c57e6307d, blockid: BP-634864778-172.20.1.45-1399358938139:blk_1074658739_919097, duration: 4771606
2015-07-17 09:30:49,669 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-634864778-172.20.1.45-1399358938139:blk_1074658739_919097, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-07-17 09:30:51,406 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1074658739_919097 file /dfs/dn/current/BP-634864778-172.20.1.45-1399358938139/current/finalized/subdir13/subdir253/blk_1074658739 for deletion
2015-07-17 09:30:51,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-634864778-172.20.1.45-1399358938139 blk_1074658739_919097 file /dfs/dn/current/BP-634864778-172.20.1.45-1399358938139/current/finalized/subdir13/subdir253/blk_1074658739
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-634864778-172.20.1.45-1399358938139 blk_1074658740_919098 file /dfs/dn/current/BP-634864778-172.20.1.45-1399358938139/current/finalized/subdir13/subdir253/blk_1074658740
2015-07-17 09:32:54,684 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-634864778-172.20.1.45-1399358938139:blk_1074658741_919099 src: /172.20.1.48:33789 dest: /172.20.1.47:50010
2015-07-17 09:32:54,725 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.20.1.48:33789, dest: /172.20.1.47:50010, bytes: 56, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_705538126_132, offset: 0, srvID: bff71ff1-db18-438a-b2ba-4731fa36d44e, blockid: BP-634864778-172.20.1.45-1399358938139:blk_1074658741_919099, duration: 39309294
2015-07-17 09:32:54,725 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-634864778-172.20.1.45-1399358938139:blk_1074658741_919099, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-07-17 09:32:55,909 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
2015-07-17 09:32:55,911 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
At this point all of my cluster services had stopped.
Do you have any idea what could be happening? Any help would be greatly appreciated. Many thanks.
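As a side note, the DataNode `clienttrace` lines shown above can be aggregated to see how many bytes each client is actually writing, which helps distinguish the tiny Cloudera health-monitoring canary writes (56 bytes each) from a real space-consuming workload. A minimal sketch using standard awk; the log path in the example is illustrative, not a guaranteed Cloudera default:

```shell
#!/bin/sh
# Sum HDFS_WRITE bytes per client ID from DataNode clienttrace log lines.
# Reads log files given as arguments (or stdin) and prints: <cliID> <total bytes>
sum_writes() {
  awk -F'[ ,]+' '/op: HDFS_WRITE/ {
    for (i = 1; i <= NF; i++) {
      if ($i == "bytes:") b = $(i + 1)   # byte count of this write
      if ($i == "cliID:") c = $(i + 1)   # DFSClient identifier
    }
    sum[c] += b
  } END { for (c in sum) print c, sum[c] }' "$@"
}

# Example (path is an assumption, adjust to your install):
# sum_writes /var/log/hadoop-hdfs/*DATANODE*.log.out
```

If the totals are only a few hundred bytes per canary client, the growth is coming from somewhere other than these writes.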
Answer 0 (score 0)
I added some datanodes to a PROD cluster running Cloudera Manager 5.4 and CDH 5.4. Each node is configured as follows: 12 disks, each mounted as a separate filesystem, with /var, /tmp, and the OS on their own separate disks.
As soon as I added the datanodes, each volume immediately showed 46.9 GB used (almost 5% of each disk's capacity). This was before running the rebalancer.
Each disk is filled as below:
[root@data14-prod ~]# du -sh /dfs1/*
8.6G /dfs1/dfs
16K /dfs1/lost+found
331M /dfs1/yarn
This usage doesn't account for the missing 46 GB of space.
Swap space is set to 19 GB on the OS disk.
Output of df -h:
[root@data14-prod ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_data14prod-lv_root
147G 11G 129G 8% /
tmpfs 63G 32K 63G 1% /dev/shm
/dev/sda1 477M 78M 374M 18% /boot
/dev/sdb1 917G 9.0G 861G 2% /dfs1
/dev/sdc1 917G 11G 860G 2% /dfs2
/dev/sdd1 917G 8.2G 862G 1% /dfs3
/dev/sde1 917G 9.6G 861G 2% /dfs4
/dev/sdf1 917G 8.8G 861G 2% /dfs5
/dev/sdg1 917G 8.8G 861G 2% /dfs6
/dev/sdh1 917G 11G 860G 2% /dfs7
/dev/sdi1 917G 9.0G 861G 2% /dfs8
/dev/sdj1 917G 8.2G 862G 1% /dfs9
/dev/sdk1 917G 9.2G 861G 2% /dfs10
/dev/sdl1 917G 8.4G 862G 1% /dfs11
/dev/sdm1 917G 7.5G 863G 1% /dfs12
/dev/mapper/vg_data14prod-lv_tmp
59G 54M 56G 1% /tmp
/dev/mapper/vg_data14prod-lv_var
50G 765M 46G 2% /var
cm_processes 63G 756K 63G 1% /var/run/cloudera-scm-agent/process
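One thing worth checking in the df output above: on ext filesystems, Size minus Used minus Avail is the space reserved for root (5% of the filesystem by default), and a dashboard that computes usage as capacity minus available will count that reservation as "used". For /dfs1 this works out to roughly 917 - 9 - 861 ≈ 47 GB, which is close to the unexplained 46.9 GB. A small sketch to compute it per mount point:

```shell
#!/bin/sh
# Print the space (in KiB) that df reports as neither used nor available
# for the filesystem holding the given path. On ext2/3/4 this is normally
# the root-reserved area.
reserved_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $2 - $3 - $4 }'
}

# Example: reserved_kib /dfs1
```

If this number is near 46 GB on each /dfsN mount, the "missing" space was never HDFS data at all.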
Cloudera config:
Disk Mount Point Usage
/dev/sdl1 /dfs11 55.7 GiB/916.3 GiB
/dev/sdk1 /dfs10 53.9 GiB/916.3 GiB
/dev/sdm1 /dfs12 54.3 GiB/916.3 GiB
/dev/mapper/vg_data08prod-lv_var /var 3.2 GiB/49.1 GiB
/dev/mapper/vg_data08prod-lv_tmp /tmp 3.1 GiB/58.9 GiB
/dev/sda1 /boot 102.9 MiB/476.2 MiB
/dev/sdg1 /dfs6 54.7 GiB/916.3 GiB
cm_processes /var/run/cloudera-scm-agent/process 756.0 KiB/63.0 GiB
/dev/mapper/vg_data08prod-lv_root / 18.1 GiB/146.2 GiB
/dev/sdj1 /dfs9 54.6 GiB/916.3 GiB
/dev/sdi1 /dfs8 53.8 GiB/916.3 GiB
/dev/sdb1 /dfs1 56.3 GiB/916.3 GiB
/dev/sdd1 /dfs3 55.2 GiB/916.3 GiB
/dev/sdc1 /dfs2 55.6 GiB/916.3 GiB
/dev/sdf1 /dfs5 55.4 GiB/916.3 GiB
/dev/sde1 /dfs4 55.0 GiB/916.3 GiB
/dev/sdh1 /dfs7 55.0 GiB/916.3 GiB
tmpfs /dev/shm 16.0 KiB/63.0 GiB
Any ideas where the missing 46 GB on each disk went?
This is a huge issue: across the 12 disks on each of the 16 datanodes I added, about 9 TB of disk space is unaccounted for.
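The "almost 5% of each disk" figure noted above matches the default reserved-block percentage that mkfs.ext2/3/4 applies, so one hypothesis worth ruling out is filesystem reservation rather than HDFS data. A sketch of the arithmetic, plus the tune2fs commands to inspect and, if appropriate, shrink the reservation (run them against your actual devices):

```shell
#!/bin/sh
# Estimate how much of a partition ext4 reserves for root at a given
# reserved-block percentage (5% is the mkfs default).
reserved_gib() {
  size_gib=$1
  pct=${2:-5}
  echo $(( size_gib * pct / 100 ))
}

# A 916 GiB /dfs partition at the default 5% reserves about 45 GiB:
reserved_gib 916

# To confirm on a real disk, and to shrink the reservation to 1%
# (HDFS data volumes rarely need root-reserved space):
#   tune2fs -l /dev/sdb1 | grep -i 'reserved block count'
#   tune2fs -m 1 /dev/sdb1
```

45 GiB per disk across 12 disks and 16 nodes is on the order of the 9 TB described, so the space may be reserved by the filesystem rather than lost.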
[Cloudera config]: http://i.stack.imgur.com/XQcdg.jpg