I have a 1.3 GB file on HDFS at the path 'test/test.txt'.
The output of the ls and du commands is:
hadoop fs -du test/test.txt
-> 1379081672 test/test.txt
hadoop fs -ls test/test.txt
->
Found 1 items
-rw-r--r-- 3 testuser supergroup 1379081672 2014-05-06 20:27 test/test.txt
I want to run a MapReduce job on this file, but when I launch the job on it, the job fails with the following error:
hadoop jar myjar.jar test.TestMapReduceDriver test output
14/05/29 16:42:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the
arguments. Applications should implement Tool for the same.
14/05/29 16:42:03 INFO input.FileInputFormat: Total input paths to process : 1
14/05/29 16:42:03 INFO mapred.JobClient: Running job: job_201405271131_9661
14/05/29 16:42:04 INFO mapred.JobClient: map 0% reduce 0%
14/05/29 16:42:17 INFO mapred.JobClient: Task Id : attempt_201405271131_9661_m_000004_0, Status : FAILED
java.io.IOException: Cannot obtain block length for LocatedBlock{BP-428948818-namenode-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode4:50010, datanode3:50010, datanode1:50010]}
at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:319)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:263)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:205)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:198)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1117)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:249)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:746)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:83)
at org.apache.hadoop.mapred.Ma
I tried the following command:
hadoop fs -cat test/test.txt
and got this error:
cat: Cannot obtain block length for LocatedBlock{BP-428948818-10.17.56.16-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode3:50010, datanode1:50010, datanode4:50010]}
I also cannot copy the file with hadoop fs -cp test/test.txt tmp;
it gives the same error:
cp: Cannot obtain block length for LocatedBlock{BP-428948818-10.17.56.16-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode1:50010, datanode3:50010, datanode4:50010]}
The output of the command hdfs fsck /user/testuser/test/test.txt is:
Connecting to namenode via http://namenode:50070
FSCK started by testuser (auth:SIMPLE) from /10.17.56.16 for path
/user/testuser/test/test.txt at Thu May 29 17:00:44 EEST 2014
Status: HEALTHY
Total size: 0 B (Total open files size: 1379081672 B)
Total dirs: 0
Total files: 0 (Files currently being written: 1)
Total blocks (validated): 0 (Total open file blocks (not validated): 21)
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 3
Average block replication: 0.0
Corrupt blocks: 0
Missing replicas: 0
Number of data-nodes: 5
Number of racks: 1
FSCK ended at Thu May 29 17:00:44 EEST 2014 in 0 milliseconds
The filesystem under path /user/testuser/test/test.txt is HEALTHY
By the way, I can see the content of the test.txt file from the web browser.
The Hadoop version is: Hadoop 2.0.0-cdh4.5.0
Answer 0 (score: 1)
I ran into the same problem and fixed it with the steps below. Some files had been opened by Flume but were never closed (I'm not sure of the cause in your case). You need to find the names of the open files with this command:
hdfs fsck /directory/of/locked/files/ -files -openforwrite
You can try to recover the files with:
hdfs debug recoverLease -path <path-of-the-file> -retries 3
Or delete them with:
hdfs dfs -rmr <path-of-the-file>
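Putting those two steps together, here is a minimal shell sketch; the grep/awk parsing of the fsck output is an assumption about its format, and the directory path is a placeholder, so adjust both for your cluster:

#!/usr/bin/env bash
# Sketch: find files stuck open-for-write under a directory and try to
# recover their leases. The OPENFORWRITE marker and the field position of
# the path are assumptions about the fsck report format; "hdfs debug
# recoverLease" may also be missing on very old clients.
DIR="/directory/of/locked/files"   # placeholder, use your own path

hdfs fsck "$DIR" -files -openforwrite \
  | grep 'OPENFORWRITE' \
  | awk '{print $1}' \
  | while read -r path; do
      echo "Attempting lease recovery for: $path"
      hdfs debug recoverLease -path "$path" -retries 3
    done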
Answer 1 (score: 0)
You have some corrupted files that have no blocks on the datanodes but still have an entry in the namenode; the best approach is to clean those entries up.
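A hedged sketch of one way to look for such entries with fsck; the -delete option permanently removes the affected files, so only run it after a manual review:

#!/usr/bin/env bash
# Sketch: report files with corrupt or missing blocks. The grep on
# "MISSING" relies on the wording of the fsck report, which can differ
# between Hadoop versions.
hdfs fsck / -list-corruptfileblocks            # list corrupt blocks and the files they belong to
hdfs fsck / -files -blocks -locations | grep -i 'MISSING'

# Only once the files are confirmed expendable (data loss is permanent):
# hdfs fsck / -delete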
Answer 2 (score: 0)
According to this, the error can be caused by a full-disk issue. I recently hit the same problem with an old file, and checking my server metrics from the time the file was created confirmed it really was a full-disk issue. Most solutions simply say to delete the file and hope it does not happen again.
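A quick, hedged way to check whether disk space is (or was) the problem; dfsadmin needs HDFS admin rights, and the grep pattern just matches the usual field names in its report:

#!/usr/bin/env bash
# Sketch: look at capacity from the namenode's point of view and at the
# local disks of the host you are logged into.
hdfs dfsadmin -report | grep -E 'Name:|DFS Used%|DFS Remaining'
df -h    # local filesystem usage on this node (run on each datanode)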
Answer 3 (score: 0)
I had the same error, but it was not due to a full disk; I believe it was the reverse, where files and blocks referenced in the namenode did not exist on any datanode.
Hence, hdfs dfs -ls shows the files, but any operation on them fails, for example hdfs dfs -copyToLocal.
In my case the hard part was isolating which of the listed files were damaged, because they sat in a tree holding thousands of files. Oddly enough, hdfs fsck /path/to/files/ reported no problems.
My solution was:

1. Isolate the location with a copyToLocal, which produced copyToLocal: Cannot obtain block length for LocatedBlock{BP-1918381527-10.74.2.77-1420822494740:blk_1120909039_47667041; getBlockSize()=1231; corrupt=false; offset=0; locs=[10.74.2.168:50010, 10.74.2.166:50010, 10.74.2.164:50010]} for several files.
2. Get a list of the directories with ls -1 > baddirs.out.
3. Clear out the local files from the first copyToLocal.
4. Run for files in `cat baddirs.out`; do echo $files; hdfs dfs -copyToLocal $files; done. This produces the list of directory checks plus the errors where problem files are found.
5. Run hdfs dfs -rm <file> for each problem file.
6. Confirm they are all gone with another copyToLocal.

A simple two-hour process!
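For completeness, a hedged sketch that automates the discovery part of steps 4 and 5; baddirs.out comes from step 2, the scratch directory name is a placeholder, and the actual hdfs dfs -rm is left to manual review:

#!/usr/bin/env bash
# Sketch: try copyToLocal for every directory listed in baddirs.out and
# record which ones hit the block-length error, so the offending files
# can then be removed by hand with "hdfs dfs -rm <file>".
LOCAL_TARGET=./hdfs-check        # scratch directory for the trial copies
mkdir -p "$LOCAL_TARGET"

while read -r dir; do
  echo "Checking: $dir"
  if ! hdfs dfs -copyToLocal "$dir" "$LOCAL_TARGET" 2> copy.err; then
    if grep -q 'Cannot obtain block length' copy.err; then
      {
        echo "Problem under: $dir"
        cat copy.err
      } >> problem-dirs.out
    fi
  fi
done < baddirs.out

echo "Review problem-dirs.out, then delete the bad files with hdfs dfs -rm <file>"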