Worried: corrupt HDFS on a single node - how to resolve

Time: 2016-05-16 10:10:35

Tags: hadoop hdfs

I am running CDH 5.6 (Hadoop 2.6, HBase 1.0.0) on a single machine. Only Hadoop and HBase are running. Hadoop is configured with a replication factor of 1, and HBase runs on top of HDFS, not in pseudo-distributed mode.

For the past three days I have been running a simple program that inserts rows into HBase using 10 parallel threads.
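The inserter's source is not included in the question. Purely for context, here is a minimal sketch of what such a program might look like against the HBase 1.0 client API. The table name tt_items and column family d are taken from the logs below; the class name, row-key scheme, qualifier, payload, and loop count are invented for illustration:

    import java.io.IOException;
    import java.util.UUID;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ParallelInserter {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Connection is thread-safe and shared; Table is not, so each thread gets its own.
            final Connection connection = ConnectionFactory.createConnection(conf);
            ExecutorService pool = Executors.newFixedThreadPool(10);
            for (int t = 0; t < 10; t++) {
                pool.submit(() -> {
                    try (Table table = connection.getTable(TableName.valueOf("tt_items"))) {
                        for (int i = 0; i < 1_000_000; i++) {
                            // Hypothetical row key and payload, just to drive load.
                            Put put = new Put(Bytes.toBytes(UUID.randomUUID().toString()));
                            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("co"),
                                    Bytes.toBytes("value-" + i));
                            table.put(put);
                        }
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                    return null;
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.DAYS);  // wait for all inserter threads to finish
            connection.close();
        }
    }

One shared Connection with a per-thread Table is the threading model recommended for this client version, so nothing about the parallelism itself should be unusual.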

Checking on it now, I found that HDFS is corrupt and that all but one of the inserting threads have failed.

Running

hdfs fsck / | grep CORRUPT

I can see that there are some corrupt blocks.
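(For reference, fsck can also report which files the affected blocks belong to; these are standard fsck options, not something from the original post:

hdfs fsck / -list-corruptfileblocks
hdfs fsck /hbase -files -blocks -locations

The first prints the paths of files that contain corrupt blocks; the second lists every block and its replica locations for files under the given path.)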

hbase hbck

says everything is fine.

Upon restarting everything, all of a sudden hdfs fsck reported a healthy state again.

Starting the inserts again produced checksum errors in the region server logs once more (shown below).

Finally I ran

hdfs fsck / -delete

and only after restarting everything did the inserts run again.
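(Note: hdfs fsck also has a standard -move option, a less destructive alternative that moves files with corrupt blocks to /lost+found instead of deleting them:

hdfs fsck / -move
)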

Some details are given below. The questions are:

  1. How can HDFS possibly become corrupted, even on a single node, when all I am doing is trying to insert data into HBase?

  2. How can I prevent this from happening in the future?

  3. If hbase hbck says everything is OK, does that mean no data has been lost?

  4. Details:

    I checked the logs of the region server, the datanode, the namenode, the hmaster, and zookeeper. No out-of-memory errors or anything of that kind. All processes kept running and stayed responsive until the very end. There was no shortage of disk space either.

    The application log shows the following failure at around 05/16 08:31:32:

    
    java.lang.RuntimeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
    java.net.SocketTimeoutException: callTimeout=60000, callDuration=64720: row 'a899ca26-45a4-4af6-af34-72c035b4b7da/f6f88f36-9263-4dbb-9588-aaf126be9117/9223370573623181733' on table 'tt_items' at region=tt_items,a899ca26-45a4-4af6-af34-72c035b4b7da/b50993c1-7ff4-4169-b58f-f53878697709/9223370573954024736,1462900815686.08255086d13380bd559a87dd93cc15ba., hostname=hb-desktop,16201,1463231294049, seqNum=51380656
    Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
    
    java.net.SocketTimeoutException: callTimeout=60000, callDuration=64720: row 'a899ca26-45a4-4af6-af34-72c035b4b7da/f6f88f36-9263-4dbb-9588-aaf126be9117/9223370573623181733' on table 'tt_items' at region=tt_items,a899ca26-45a4-4af6-af34-72c035b4b7da/b50993c1-7ff4-4169-b58f-f53878697709/9223370573954024736,1462900815686.08255086d13380bd559a87dd93cc15ba., hostname=hb-desktop,16201,1463231294049, seqNum=51380656
    Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=64720: row 'a899ca26-45a4-4af6-af34-72c035b4b7da/f6f88f36-9263-4dbb-9588-aaf126be9117/9223370573623181733' on table 'tt_items' at region=tt_items,a899ca26-45a4-4af6-af34-72c035b4b7da/b50993c1-7ff4-4169-b58f-f53878697709/9223370573954024736,1462900815686.08255086d13380bd559a87dd93cc15ba., hostname=hb-desktop,16201,1463231294049, seqNum=51380656
    Caused by: java.io.IOException: java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader reader=hdfs://localhost:9000/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/9bad8381ba89430daf008c28befedec3, compression=snappy, cacheConf=blockCache=LruBlockCache{blockCount=7248, currentSize=543459280, freeSize=15851184, maxSize=559310464, heapSize=543459280, minSize=531344928, minFactor=0.95, multiSize=265672464, multiFactor=0.5, singleSize=132836232, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, firstKey=a899ca26-45a4-4af6-af34-72c035b4b7da/b5099cdb-2148-4f95-866a-f6eba2813bab/9223370573623181721/d:co/1463268017037/Put, lastKey=a899ca26-45a4-4af6-af34-72c035b4b7da/fffffc52-8969-472a-a17b-65a4edd6fa66/9223370573623181744/d:r.tt.EPC/1463255430913/Put, avgKeyLen=118, avgValueLen=29, entries=20184807, length=465347831, cur=null] to key a899ca26-45a4-4af6-af34-72c035b4b7da/f6f88f36-9263-4dbb-9588-aaf126be9117/9223370573623181733/d:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/seqid=0
    Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/9bad8381ba89430daf008c28befedec3 at 140550656 exp: -935597690 got: -1004657115
    Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader reader=hdfs://localhost:9000/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/9bad8381ba89430daf008c28befedec3, compression=snappy, cacheConf=blockCache=LruBlockCache{blockCount=7248, currentSize=543459280, freeSize=15851184, maxSize=559310464, heapSize=543459280, minSize=531344928, minFactor=0.95, multiSize=265672464, multiFactor=0.5, singleSize=132836232, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, firstKey=a899ca26-45a4-4af6-af34-72c035b4b7da/b5099cdb-2148-4f95-866a-f6eba2813bab/9223370573623181721/d:co/1463268017037/Put, lastKey=a899ca26-45a4-4af6-af34-72c035b4b7da/fffffc52-8969-472a-a17b-65a4edd6fa66/9223370573623181744/d:r.tt.EPC/1463255430913/Put, avgKeyLen=118, avgValueLen=29, entries=20184807, length=465347831, cur=null] to key a899ca26-45a4-4af6-af34-72c035b4b7da/f6f88f36-9263-4dbb-9588-aaf126be9117/9223370573623181733/d:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/seqid=0
    

    Around the same time the region server said:

    
    2016-05-16 08:30:37,390 WARN  [B.defaultRpcServer.handler=8,queue=2,port=16201] hfile.HFile: HBase checksum verification failed for file hdfs://localhost:9000/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/9bad8381ba89430daf008c28befedec3 at offset 408984102 filesize 465347831. Retrying read with HDFS checksums turned on...
    2016-05-16 08:30:37,433 WARN  [B.defaultRpcServer.handler=8,queue=2,port=16201] hdfs.DFSClient: Found Checksum error for BP-130837870-192.168.178.29-1462900512452:blk_1073746899_6086 from DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK] at 140550656
    2016-05-16 08:30:37,434 INFO  [B.defaultRpcServer.handler=8,queue=2,port=16201] hdfs.DFSClient: Could not obtain BP-130837870-192.168.178.29-1462900512452:blk_1073746899_6086 from any node: java.io.IOException: No live nodes contain block BP-130837870-192.168.178.29-1462900512452:blk_1073746899_6086 after checking nodes = [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK] Dead nodes:  DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]. Will get new block locations from namenode and retry...
    2016-05-16 08:30:37,434 WARN  [B.defaultRpcServer.handler=8,queue=2,port=16201] hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 2146.2290363411184 msec.
    
    .. some repetition...
    
    2016-05-16 08:32:40,975 WARN  [B.defaultRpcServer.handler=26,queue=2,port=16201] hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 2767.123497826818 msec.
    2016-05-16 08:32:42,496 ERROR [regionserver/hb-desktop/192.168.178.29:16201-shortCompactions-1463232518856] regionserver.CompactSplitThread: Compaction failed Request = regionName=tt_items,a899ca26-45a4-4af6-af34-72c035b4b7da/b50993c1-7ff4-4169-b58f-f53878697709/9223370573954024736,1462900815686.08255086d13380bd559a87dd93cc15ba., storeName=d, fileCount=8, fileSize=1.0 G (443.8 M, 239.2 M, 124.8 M, 101.9 M, 78.5 M, 33.0 M, 32.9 M, 10.9 M), priority=2, time=152707213570012
    java.io.IOException: Could not iterate StoreFileScanner[HFileScanner for reader reader=hdfs://localhost:9000/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/9bad8381ba89430daf008c28befedec3, compression=snappy, cacheConf=blockCache=LruBlockCache{blockCount=7130, currentSize=535445256, freeSize=23865208, maxSize=559310464, heapSize=535445256, minSize=531344928, minFactor=0.95, multiSize=265672464, multiFactor=0.5, singleSize=132836232, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, firstKey=a899ca26-45a4-4af6-af34-72c035b4b7da/b5099cdb-2148-4f95-866a-f6eba2813bab/9223370573623181721/d:co/1463268017037/Put, lastKey=a899ca26-45a4-4af6-af34-72c035b4b7da/fffffc52-8969-472a-a17b-65a4edd6fa66/9223370573623181744/d:r.tt.EPC/1463255430913/Put, avgKeyLen=118, avgValueLen=29, entries=20184807, length=465347831, cur=a899ca26-45a4-4af6-af34-72c035b4b7da/f6f881d7-076a-4aec-82f0-17e33d8f4f84/9223370573623181728/d:pi/1463267434824/Put/vlen=73/seqid=0]
            at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:146)
            at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108)
            at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:596)
            at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:273)
            at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:105)
            at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
            at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1233)
            at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1770)
            at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:520)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/9bad8381ba89430daf008c28befedec3 at 140550656 exp: -935597690 got: -1004657115
            at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(Native Method)
            at org.apache.hadoop.util.NativeCrc32.verifyChunkedSums(NativeCrc32.java:59)
            at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:301)
            at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:237)
            at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:156)
            at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:744)
            at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:800)
            at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:860)
            at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:903)
    

    The datanode said:

    
    2016-05-16 08:26:01,750 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:46618, dest: /127.0.0.1:50010, bytes: 114196960, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747594_6781, duration: 14642101541
    2016-05-16 08:26:01,750 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747594_6781, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
    2016-05-16 08:26:17,445 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow flushOrSync took 304ms (threshold=300ms), isSync:false, flushTotalNanos=304174121ns
    2016-05-16 08:26:22,832 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-130837870-192.168.178.29-1462900512452:blk_1073747595_6782 src: /127.0.0.1:47502 dest: /127.0.0.1:50010
    2016-05-16 08:26:23,385 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:432ms (threshold=300ms)
    2016-05-16 08:26:24,684 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:47502, dest: /127.0.0.1:50010, bytes: 11485542, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747595_6782, duration: 1851995254
    2016-05-16 08:26:24,685 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747595_6782, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
    2016-05-16 08:26:45,373 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:405ms (threshold=300ms)
    2016-05-16 08:27:16,631 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-130837870-192.168.178.29-1462900512452:blk_1073747596_6783 src: /127.0.0.1:49517 dest: /127.0.0.1:50010
    2016-05-16 08:27:17,729 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:49517, dest: /127.0.0.1:50010, bytes: 3455293, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747596_6783, duration: 1096622112
    2016-05-16 08:27:17,729 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747596_6783, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
    
    ...some Slow BlockReceiver messages...
    
    org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/home/hb/seritrack-mts/nosql/data/data, DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017): no suitable block pools found to scan.  Waiting 1334756421 ms.
    2016-05-16 08:30:07,254 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:375ms (threshold=300ms)
    2016-05-16 08:31:32,604 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073747556_6743 file /home/hb/seritrack-mts/nosql/data/data/current/BP-130837870-192.168.178.29-1462900512452/current/finalized/subdir0/subdir22/blk_1073747556 for deletion
    
    ... some repetition...
    
    2016-05-16 08:31:39,276 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-130837870-192.168.178.29-1462900512452:blk_1073747597_6784 src: /127.0.0.1:59092 dest: /127.0.0.1:50010
    2016-05-16 08:31:40,380 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:59092, dest: /127.0.0.1:50010, bytes: 11480560, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747597_6784, duration: 1093600168
    2016-05-16 08:31:40,380 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747597_6784, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
    2016-05-16 08:31:41,563 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-130837870-192.168.178.29-1462900512452:blk_1073747598_6785 src: /127.0.0.1:59147 dest: /127.0.0.1:50010
    2016-05-16 08:31:44,761 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:408ms (threshold=300ms)
    2016-05-16 08:31:48,238 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:411ms (threshold=300ms)
    2016-05-16 08:31:51,398 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/home/hb/seritrack-mts/nosql/data/data, DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017): no suitable block pools found to scan.  Waiting 1334619581 ms.
    2016-05-16 08:31:51,529 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:389ms (threshold=300ms)
    
    ...some more of those ...
    
    2016-05-16 08:31:57,547 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:59147, dest: /127.0.0.1:50010, bytes: 268435456, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747598_6785, duration: 15718340373
    2016-05-16 08:31:57,547 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747598_6785, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
    2016-05-16 08:31:57,749 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-130837870-192.168.178.29-1462900512452:blk_1073747599_6786 src: /127.0.0.1:59440 dest: /127.0.0.1:50010
    2016-05-16 08:31:58,657 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:343ms (threshold=300ms)
    
    ...some more of those ...
    
    2016-05-16 08:32:13,559 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:59440, dest: /127.0.0.1:50010, bytes: 268435456, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747599_6786, duration: 15809336101
    2016-05-16 08:32:13,559 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747599_6786, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
    2016-05-16 08:32:14,071 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-130837870-192.168.178.29-1462900512452:blk_1073747600_6787 src: /127.0.0.1:59678 dest: /127.0.0.1:50010
    2016-05-16 08:32:16,251 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:493ms (threshold=300ms)
    
    
    ...some more of those ...
    
    2016-05-16 08:32:29,087 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:59678, dest: /127.0.0.1:50010, bytes: 268435456, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747600_6787, duration: 15015849046
    2016-05-16 08:32:29,087 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747600_6787, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
    2016-05-16 08:32:29,171 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-130837870-192.168.178.29-1462900512452:blk_1073747601_6788 src: /127.0.0.1:59907 dest: /127.0.0.1:50010
    2016-05-16 08:32:30,420 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:322ms (threshold=300ms)
    
    ...some more of those ...
    
    2016-05-16 08:32:42,447 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:59907, dest: /127.0.0.1:50010, bytes: 205882206, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747601_6788, duration: 13275544186
    2016-05-16 08:32:42,447 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747601_6788, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
    2016-05-16 08:33:46,104 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Likely the client has stopped reading, disconnecting it (hb-desktop:50010:DataXceiver error processing READ_BLOCK operation  src: /127.0.0.1:46586 dst: /127.0.0.1:50010); java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:50010 remote=/127.0.0.1:46586]
    

    The namenode said:

    
    2016-05-16 08:30:39,662 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
    2016-05-16 08:30:39,662 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1  because client machine reported it
    2016-05-16 08:30:41,844 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
    2016-05-16 08:30:41,844 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1  because client machine reported it
    2016-05-16 08:30:43,495 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
    2016-05-16 08:30:43,495 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1  because client machine reported it
    2016-05-16 08:30:47,435 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
    2016-05-16 08:30:47,436 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1  because client machine reported it
    2016-05-16 08:30:51,842 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
    2016-05-16 08:30:51,842 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1  because client machine reported it
    2016-05-16 08:30:57,476 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
    2016-05-16 08:30:57,476 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1  because client machine reported it
    2016-05-16 08:31:06,539 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
    2016-05-16 08:31:06,539 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
    2016-05-16 08:31:08,643 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
    2016-05-16 08:31:08,643 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1  because client machine reported it
    2016-05-16 08:31:21,129 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
    2016-05-16 08:31:21,130 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1  because client machine reported it
    2016-05-16 08:31:28,993 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073747591_6778 127.0.0.1:50010
    2016-05-16 08:31:28,993 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 60 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 36 SyncTimes(ms): 11435
    2016-05-16 08:31:29,186 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073747556_6743 127.0.0.1:50010
    2016-05-16 08:31:29,321 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073747574_6761 127.0.0.1:50010
    2016-05-16 08:31:29,674 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073747593_6780 127.0.0.1:50010
    2016-05-16 08:31:29,713 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073747558_6745 127.0.0.1:50010
    2016-05-16 08:31:29,796 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073747575_6762 127.0.0.1:50010
    2016-05-16 08:31:30,237 INFO BlockStateChange: BLOCK* BlockManager: ask 127.0.0.1:50010 to delete [blk_1073747556_6743, blk_1073747558_6745, blk_1073747574_6761, blk_1073747575_6762, blk_1073747591_6778, blk_1073747593_6780]
    2016-05-16 08:31:32,007 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
    2016-05-16 08:31:32,007 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1  because client machine reported it
    2016-05-16 08:31:36,540 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
    2016-05-16 08:31:36,540 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
    2016-05-16 08:31:38,849 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/.tmp/9652e091531943848ee523a60bc5baa5. BP-130837870-192.168.178.29-1462900512452 blk_1073747597_6784{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]}
    2016-05-16 08:31:40,381 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_1073747597_6784{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]} size 0
    2016-05-16 08:31:40,745 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/.tmp/9652e091531943848ee523a60bc5baa5 is closed by DFSClient_NONMAPREDUCE_-1749190019_1
    2016-05-16 08:31:41,294 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/.tmp/fb79c8ab90a5498089841431191c03ca. BP-130837870-192.168.178.29-1462900512452 blk_1073747598_6785{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]}
    2016-05-16 08:31:57,547 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_1073747598_6785{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]} size 0
    2016-05-16 08:31:57,551 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/.tmp/fb79c8ab90a5498089841431191c03ca. BP-130837870-192.168.178.29-1462900512452 blk_1073747599_6786{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]}
    2016-05-16 08:32:06,539 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
    2016-05-16 08:32:06,539 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
    2016-05-16 08:32:13,559 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_1073747599_6786{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]} size 0
    2016-05-16 08:32:13,875 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/.tmp/fb79c8ab90a5498089841431191c03ca. BP-130837870-192.168.178.29-1462900512452 blk_1073747600_6787{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]}
    2016-05-16 08:32:29,087 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_1073747600_6787{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]} size 0
    2016-05-16 08:32:29,088 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/.tmp/fb79c8ab90a5498089841431191c03ca. BP-130837870-192.168.178.29-1462900512452 blk_1073747601_6788{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]}
    2016-05-16 08:32:29,088 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 91 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 57 SyncTimes(ms): 14475
    2016-05-16 08:32:36,539 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
    2016-05-16 08:32:36,539 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
    2016-05-16 08:32:42,043 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
    2016-05-16 08:32:42,044 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1  because client machine reported it
    2016-05-16 08:32:42,447 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_1073747601_6788{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]} size 0
    2016-05-16 08:32:42,495 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/.tmp/fb79c8ab90a5498089841431191c03ca is closed by DFSClient_NONMAPREDUCE_-1749190019_1
    

1 Answer:

Answer 0 (score: 1):

Since there have been no further responses, and since it looks like HDFS-8809 really is the source of the problem - which makes this more of a confusion than a real problem - I am proposing this as the answer.
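(If HDFS-8809 indeed concerns fsck flagging blocks of files that are still open for write as corrupt, as the answer implies, then the CORRUPT reports during the insert run would have been spurious rather than actual data loss, which would also explain why fsck came back healthy once a restart closed the open files. One way to cross-check, using a standard fsck flag rather than anything from the original post, is to include open files in the report and see whether the "corrupt" blocks belong to them:

hdfs fsck / -openforwrite -files -blocks
)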