Exception when compressing map output in a Hadoop program

Time: 2016-05-27 18:00:06

Tags: java hadoop

In a Hadoop program, I am trying to compress the map output, so I wrote the following code:

conf.setBoolean("mapred.compress.map.output",true);
conf.setClass("mapred.map.output.compression.codec",GzipCodec.class,CompressionCodec.class); 

When I run it, I get the following exception. Does anyone know the cause?

WARN mapred.LocalJobRunner: job_local1149103367_0001 
java.io.IOException: not a gzip file  
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.processBasicHeader(BuiltInGzipDecompressor.java:495)    
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeHeaderState(BuiltInGzipDecompressor.java:256)
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:185)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:72)   
at java.io.DataInputStream.readByte(DataInputStream.java:265)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
at org.apache.hadoop.mapred.IFile$Reader.positionToNextRecord(IFile.java:400)
at org.apache.hadoop.mapred.IFile$Reader.nextRawKey(IFile.java:425)
at org.apache.hadoop.mapred.Merger$Segment.nextRawKey(Merger.java:323)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:613)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:558)
at org.apache.hadoop.mapred.Merger.merge(Merger.java:70)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:385)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:445)

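Today I tested it again and found that the outcome depends on where those two lines sit relative to the creation of the Job object:

Job job = new Job(conf, "MyCounter"); 

If the two lines come before this statement, the error occurs; if they come after it, there is no error. Why does this happen?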

1 Answer:

Answer 0: (score: 0)

Are you using MRv1 or MRv2? If you are using MRv2, use the following job configuration.

config.setBoolean("mapreduce.output.fileoutputformat.compress", true);
config.setClass("mapreduce.output.fileoutputformat.compress.codec", GzipCodec.class, CompressionCodec.class);
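Note that the two `mapreduce.output.fileoutputformat.*` properties control compression of the final job output. The question is about intermediate map output, and under MRv2 the corresponding names are, as far as I know, `mapreduce.map.output.compress` and `mapreduce.map.output.compress.codec`. A minimal sketch, assuming Hadoop 2.x on the classpath; the class name `MapOutputCompressionConfig` and the method `newJob` are made up for illustration, and the job name is taken from the question:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical helper class, not part of Hadoop.
public class MapOutputCompressionConfig {

    public static Job newJob() throws IOException {
        Configuration conf = new Configuration();

        // MRv2 property names for compressing intermediate (map) output.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                GzipCodec.class, CompressionCodec.class);

        // Create the job only after the configuration is complete.
        return Job.getInstance(conf, "MyCounter");
    }
}
```

This only shows where the settings go; it does not by itself explain the "not a gzip file" error in the local runner.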

You can also set the compression type:

config.set("mapreduce.output.fileoutputformat.compress.type",CompressionType.NONE.toString());

BLOCK, NONE, and RECORD are the three compression types. This setting applies to SequenceFile output.
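As for the ordering effect described in the question: the `Job` Javadoc states that the job makes its own copy of the `Configuration` it is given. Properties set on `conf` after the `Job` has been created therefore never reach the job, so in the "after" case compression is most likely not enabled at all, which would explain why the error disappears. The sketch below imitates that copy-on-construct behaviour with plain Java collections; `SnapshotJob` is a hypothetical stand-in, not a Hadoop class.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfSnapshotDemo {

    static final String KEY = "mapred.compress.map.output";

    /** Hypothetical stand-in for a job that copies its configuration when constructed. */
    static final class SnapshotJob {
        private final Map<String, String> conf;

        SnapshotJob(Map<String, String> source) {
            // Defensive copy: later changes to 'source' are not seen by the job.
            this.conf = new HashMap<>(source);
        }

        boolean getBoolean(String key, boolean defaultValue) {
            String value = conf.get(key);
            return value == null ? defaultValue : Boolean.parseBoolean(value);
        }
    }

    /** Property is set first, then the job is created: the job sees it. */
    static boolean setBeforeCreate() {
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY, "true");
        return new SnapshotJob(conf).getBoolean(KEY, false);
    }

    /** Job is created first, then the property is set: the job never sees it. */
    static boolean setAfterCreate() {
        Map<String, String> conf = new HashMap<>();
        SnapshotJob job = new SnapshotJob(conf);
        conf.put(KEY, "true");
        return job.getBoolean(KEY, false);
    }

    public static void main(String[] args) {
        System.out.println("set before creating the job: " + setBeforeCreate());
        System.out.println("set after creating the job:  " + setAfterCreate());
    }
}
```

In real Hadoop code, settings that must be applied after the job exists can be made on `job.getConfiguration()` instead of the original `conf`.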