I want to compress files that already exist in HDFS and then delete the uncompressed originals. Here is the code, but it throws an IOException. Any pointers on why this happens?
CompressionCodecFactory ccf = new CompressionCodecFactory(conf);
CompressionCodec codec = ccf.getCodecByClassName(GzipCodec.class.getName());
InputStream inpStrm = codec.createInputStream(fs.open(infoFilePath));
OutputStream compressedOutputStream = codec.createOutputStream(fs.create(new Path(infoFile + "." + codec.getDefaultExtension())));
IOUtils.copyBytes(inpStrm, compressedOutputStream, conf);
It fails with this IOException:
Exception in thread "main" java.io.IOException: incorrect header check
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
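The stack trace shows the failure happens on the read side (DecompressorStream.read): codec.createInputStream wraps the source file in a zlib decompressor, so the code tries to decompress infoFilePath while reading it. If that file is plain, uncompressed text, the decompressor rejects its first bytes. The same failure mode can be reproduced with the JDK's own gzip classes (a minimal sketch; an in-memory byte array stands in for the HDFS file):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipException;

public class HeaderCheckDemo {
    // Returns the error message produced when gzip input machinery
    // is pointed at data that was never compressed.
    static String decompressPlainText() {
        byte[] plain = "just plain text, no gzip header".getBytes();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(plain))) {
            return "no error";
        } catch (ZipException e) {
            return e.getMessage(); // the header check fails immediately
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(decompressPlainText());
    }
}
```

Hadoop's native zlib path reports this as "incorrect header check", while the JDK reports "Not in GZIP format"; both mean the input did not start with a valid gzip/zlib header.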
Answer 0 (score: 0)
Besides newly created files, you may be processing files that only look compressed. I had this problem when a file with a .gz extension was actually a plain text file. Different Hadoop versions handle this differently; apparently the file name is taken into account.
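Note also that in the posted snippet the input side is wrapped in the codec: codec.createInputStream(fs.open(infoFilePath)) attempts to decompress the source file while reading it. To compress an uncompressed file, read it raw and wrap only the output stream in the codec (i.e. pass fs.open(infoFilePath) directly to IOUtils.copyBytes). A minimal sketch of that pattern with the JDK gzip classes, using in-memory streams in place of HDFS streams for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressFix {
    // Copy raw (uncompressed) input into a gzip-compressed output.
    // Mirrors the intended Hadoop flow: plain fs.open(...) on the input,
    // codec-wrapped stream only on the output side.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(raw);   // raw read, no codec
             OutputStream out = new GZIPOutputStream(sink)) {  // codec on output only
            in.transferTo(out);
        }
        return sink.toByteArray();
    }

    // Round-trip check: decompressing the result recovers the original bytes.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "contents of infoFilePath".getBytes();
        byte[] roundTripped = gunzip(gzip(original));
        System.out.println(new String(roundTripped));
    }
}
```

After verifying the compressed copy this way, the uncompressed original can be removed with fs.delete.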