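I am getting a large number of differing under-replicated block reports, and I would like to know what is causing this. hadoop dfsadmin -metasave reports roughly 232,000 MISSING blocks waiting for replication. How can I fix this? Jobs run fine and there appears to be no data loss.
See the output of hadoop fsck /, hadoop dfsadmin -report, and hadoop dfsadmin -metasave, along with the namenode web GUI, below:
hadoop fsck /: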
Total size: 6066860793495 B (Total open files size: 47000701003 B)
Total dirs: 1801
Total files: 230828 (Files currently being written: 493)
Total blocks (validated): 242592 (avg. block size 25008494 B) (Total open file blocks (not validated): 681)
Minimally replicated blocks: 242592 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 932 (0.38418415 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.9945753
Corrupt blocks: 0
Missing replicas: 1851 (0.25479725 %)
Number of data-nodes: 20
Number of racks: 1
FSCK ended at Thu Nov 03 10:17:47 CDT 2011 in 7359 milliseconds
hadoop dfsadmin -report:
Configured Capacity: 59070545264640 (53.72 TB)
Present Capacity: 56867905841329 (51.72 TB)
DFS Remaining: 37637696475136 (34.23 TB)
DFS Used: 19230209366193 (17.49 TB)
DFS Used%: 33.82%
Under replicated blocks: 245346
Blocks with corrupt replicas: 73
Missing blocks: 0
Excerpt of the hadoop dfsadmin -metasave output:
232461 files and directories, 243290 blocks = 475751 total
Live Datanodes: 20
Dead Datanodes: 0
Metasave: Blocks waiting for replication: 242747
About 1,000 entries are actual files being replicated (or waiting to be), followed by ~232,000 "MISSING" entries that all look like this:
: blk_2551072940280567829_12480437 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_2565249812869117144_12480431 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_2950011510944289339_12480413 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12456357 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12463021 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12468869 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12474511 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12440928 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12449396 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12462184 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12465792 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12472905 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3812070171484751861_12436051 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3815454413870879906_12441243 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
Metasave: Blocks being replicated: 0
Metasave: Blocks 29 waiting deletion from 17 datanodes.
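In the excerpt above, some block IDs (for example blk_3809337797233614456) appear several times with different generation stamps. A minimal sketch for tallying this from a saved metasave file, assuming every MISSING line follows the format shown in the excerpt; the function name and the regular expression are illustrative and not part of Hadoop:

```python
import re
from collections import defaultdict

# Matches entries such as "blk_<id>_<genstamp> MISSING", as seen in the excerpt.
# Assumption: block IDs can be negative, hence the optional "-".
MISSING_RE = re.compile(r"blk_(-?\d+)_(\d+) MISSING")

def group_missing_blocks(lines):
    """Map each block ID to the generation stamps reported as MISSING."""
    stamps_by_block = defaultdict(list)
    for line in lines:
        match = MISSING_RE.search(line)
        if match:
            block_id, gen_stamp = match.groups()
            stamps_by_block[block_id].append(int(gen_stamp))
    return dict(stamps_by_block)

# A few lines copied from the excerpt, plus one non-matching summary line.
sample = [
    ": blk_2551072940280567829_12480437 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)",
    ": blk_3809337797233614456_12456357 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)",
    ": blk_3809337797233614456_12463021 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)",
    "Metasave: Blocks being replicated: 0",
]

grouped = group_missing_blocks(sample)
# Block IDs listed under more than one generation stamp.
repeated = {block: stamps for block, stamps in grouped.items() if len(stamps) > 1}
print(repeated)
```

To run it against a real dump, pass an open file handle instead of the sample list. If most MISSING entries turn out to be repeats of a small set of block IDs, that would suggest the queue is filling with stale generations of the same blocks rather than with blocks that were actually lost.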
Namenode web GUI:
Cluster Summary
232390 files and directories, 243235 blocks = 475625 total. Heap Size is 1.84 GB / 8.68 GB (21%)
Configured Capacity : 53.72 TB
DFS Used : 17.46 TB
Non DFS Used : 2 TB
DFS Remaining : 34.26 TB
DFS Used% : 32.51 %
DFS Remaining% : 63.77 %
Live Nodes : 20
Dead Nodes : 0
Decommissioning Nodes : 0
Number of Under-Replicated Blocks : 242532
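!!UPDATE:!!
I think this has to be a bug, because the "under-replicated" count is now approaching one million. We have nowhere near that many actual blocks on the cluster, so the counter cannot be right.
The web GUI now shows the following: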
Cluster Summary
234877 files and directories, 250074 blocks = 484951 total. Heap Size is 706.5 MB/8.68 GB (7%)
Configured Capacity : 53.72 TB
DFS Used : 20.71 TB
Non DFS Used : 1.54 TB
DFS Remaining : 31.47 TB
DFS Used% : 38.56 %
DFS Remaining% : 58.58 %
Live Nodes : 20
Dead Nodes : 0
Decommissioning Nodes : 0
Number of Under-Replicated Blocks : 451014
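The reasoning that the counter must be wrong can be checked with simple arithmetic on the two web GUI snapshots. A small sketch using only the figures quoted above; the function name is illustrative:

```python
def under_replicated_ratio(under_replicated, total_blocks):
    """Ratio of the reported under-replicated count to the total block count."""
    return under_replicated / total_blocks

# Figures from the first and the updated web GUI snapshots.
first = under_replicated_ratio(242532, 243235)
updated = under_replicated_ratio(451014, 250074)

print(f"first snapshot:   {first:.2f}")
print(f"updated snapshot: {updated:.2f}")

# A ratio above 1 means the counter exceeds the number of blocks that exist.
print("counter exceeds total blocks:", updated > 1)
```

In the first snapshot the counter already covers nearly every block, while fsck reports only 932 under-replicated blocks. In the updated snapshot it exceeds the total block count, which cannot be a real under-replication count.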
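Answer 0 (score: 7)
I received a reply from Todd Lipcon at Cloudera. I wanted to update this question in case anyone else runs into the same problem. I noticed the issue on CDH3u1, and this was the reply:
"The 'append' feature is known to be broken in CDH3 and probably has bugs like this one. We recommend that you advise your users not to use it. This is true of all releases of Hadoop 0.20.x (CDH and otherwise), and it will be fixed in CDH4 (upstream version 0.23 or later).
Sorry for the bad news. I will look into this particular bug to make sure it is not present in upstream trunk, but it is unlikely to be fixed in a CDH3 release."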