Using Apache Commons Compress / org.tukaani.xz

Date: 2017-07-20 11:46:31

Tags: java apache lzma compression

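When I try to decode an LZMA-compressed xls file, I get org.tukaani.xz.UnsupportedOptionsException: Uncompressed size is too big. Decompressing/decoding the non-LZMA zip works without any problem. In both cases, the same xls file was compressed.

I am using Apache Commons Compress and org.tukaani.xz.

Sample code for reference: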

package com.concept.utilities.zip;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;
import org.apache.commons.compress.compressors.lzma.LZMACompressorInputStream;

public class ApacheComm {

    public void extractLZMAZip(File zipFile, String compressFileName, String destFolder) {

        ZipFile zip = null;
        try {

            zip = new ZipFile(zipFile);
            ZipArchiveEntry zipArchiveEntry = zip.getEntry(compressFileName);
            if (null != zipArchiveEntry) {
                String name = zipArchiveEntry.getName();

                // InputStream is = zip.getInputStream(zipArchiveEntry);
                InputStream israw = zip.getRawInputStream(zipArchiveEntry);

                LZMACompressorInputStream lzma = new LZMACompressorInputStream(israw);
            }

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (null != zip)
                ZipFile.closeQuietly(zip);
        }
    }

    public static void main(String[] args) throws IOException {

        ApacheComm c = new ApacheComm();
        try {
            c.extractLZMAZip(new File("H:\\archives\\rollLZMA.zip"), "roll.xls", "H:\\archives\\");
        } catch (Exception e) {
            e.printStackTrace();
        }

    }

}

Error:

org.tukaani.xz.UnsupportedOptionsException: Uncompressed size is too big
    at org.tukaani.xz.LZMAInputStream.initialize(Unknown Source)
    at org.tukaani.xz.LZMAInputStream.<init>(Unknown Source)
    at org.apache.commons.compress.compressors.lzma.LZMACompressorInputStream.<init>(LZMACompressorInputStream.java:50)
    at com.concept.utilities.zip.ApacheComm.extractLZMAZip(ApacheComm.java:209)
    at com.concept.utilities.zip.ApacheComm.main(ApacheComm.java:224)

Am I missing something? Is there another way to decode a zip file whose compression method is LZMA?

1 answer:

Answer 0 (score: 3)

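Your code doesn't work because the header of an LZMA-compressed data segment inside a Zip file is different from the header of a standalone LZMA-compressed (.lzma) file. LZMACompressorInputStream expects the standalone header (a 1-byte properties field, a 4-byte dictionary size and an 8-byte uncompressed size), so when you hand it the raw Zip entry it reads the wrong bytes as the uncompressed size, gets a nonsensical value, and fails with "Uncompressed size is too big".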
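To make the mismatch concrete, here is a small stdlib-only sketch that parses the start of a Zip LZMA entry with the standalone .lzma layout. The 13 input bytes are hypothetical: a Zip header for LZMA SDK 9.20 with properties byte 0x5D and a 1 MiB dictionary, followed by four made-up compressed bytes; a real entry such as roll.xls will start differently. The class and method names are illustrative only. As far as I can tell, XZ for Java rejects any size below -1 with the message from the stack trace, but treat that as an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Sketch: what happens when a Zip LZMA header is parsed with the standalone
 * .lzma ("LZMA_Alone") layout: 1 byte properties, 4 bytes dictionary size,
 * 8 bytes uncompressed size, all little-endian.
 */
class MisreadHeaderDemo {

    // Hypothetical first 13 bytes of a raw Zip LZMA entry:
    // 09 14           LZMA SDK version 9.20 (major, minor)
    // 05 00           properties size = 5 (little-endian)
    // 5D 00 00 10 00  properties byte + 1 MiB dictionary size
    // 00 21 9A C3     start of the compressed stream (made up)
    static final byte[] ZIP_LZMA_BYTES = {
        0x09, 0x14, 0x05, 0x00,
        0x5D, 0x00, 0x00, 0x10, 0x00,
        0x00, 0x21, (byte) 0x9A, (byte) 0xC3
    };

    /** Returns the 8-byte "uncompressed size" as a standalone .lzma header reader would see it. */
    static long aloneUncompressedSize(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        buf.get();             // would-be properties byte: actually the SDK major version
        buf.getInt();          // would-be dictionary size: minor version, properties size, properties byte
        return buf.getLong();  // would-be uncompressed size: real dictionary size + compressed data
    }

    public static void main(String[] args) {
        long size = aloneUncompressedSize(ZIP_LZMA_BYTES);
        // -1 means "unknown size"; a negative value below -1 is not a valid size
        System.out.println("misread uncompressed size = " + size);
    }
}
```

The high byte of the misread value comes from compressed data, so whether it ends up negative, or merely absurdly large, depends on the entry.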

You can read the specification at https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT (4.4.4 general purpose bit flag, 5.8 LZMA - Method 14), but to quote the important parts:

5.8.5 [...] The LZMA Compressed Data Segment will consist of an LZMA Properties Header followed by the LZMA Compressed Data, as shown:

[LZMA properties header for file 1]
[LZMA compressed data for file 1]

[...]

5.8.8 Storage fields for the property information within the LZMA Properties Header are as follows:

LZMA Version Information   2 bytes
LZMA Properties Size       2 bytes
LZMA Properties Data       variable, defined by "LZMA Properties Size"
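The three fields above can be read with a few lines of plain Java. This is a sketch, not code from the answer: the class name ZipLzmaHeader is hypothetical, and it relies on the APPNOTE rule that multi-byte Zip fields are little-endian. It reads exactly the header and leaves the stream positioned at the compressed data.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Sketch: the fields of the Zip LZMA properties header (APPNOTE 5.8.8). */
class ZipLzmaHeader {
    final int majorVersion;   // "LZMA Version Information", first byte
    final int minorVersion;   // "LZMA Version Information", second byte
    final byte[] properties;  // "LZMA Properties Data", length given by "LZMA Properties Size"

    private ZipLzmaHeader(int majorVersion, int minorVersion, byte[] properties) {
        this.majorVersion = majorVersion;
        this.minorVersion = minorVersion;
        this.properties = properties;
    }

    /**
     * Reads the header from the start of a raw Zip LZMA entry. Consumes exactly
     * 4 + properties.length bytes, so the stream is left at the compressed data.
     */
    static ZipLzmaHeader read(InputStream raw) throws IOException {
        // DataInputStream adds no buffering, so it never reads ahead into the compressed data
        DataInputStream in = new DataInputStream(raw);
        int major = in.readUnsignedByte();
        int minor = in.readUnsignedByte();
        // 2-byte little-endian size; readShort() would be big-endian, so combine the bytes by hand
        int propertiesSize = in.readUnsignedByte() | (in.readUnsignedByte() << 8);
        byte[] properties = new byte[propertiesSize];
        in.readFully(properties);
        return new ZipLzmaHeader(major, minor, properties);
    }
}
```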

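5.8.8.1 LZMA Version Information - this field identifies which version of the LZMA SDK was used to compress a file. The first byte will store the major version number of the LZMA SDK and the second byte will store the minor number.

5.8.8.2 LZMA Properties Size - this field defines the size of the remaining property data. Typically this size should be determined by the version of the SDK. This size field is included as a convenience and to help avoid any ambiguity should it arise in the future due to changes in this compression algorithm.

5.8.8.3 LZMA Property Data - this variable sized field records the required values for the decompressor as defined by the LZMA SDK. The data stored in this field should be obtained using the WriteCoderProperties() in the version of the SDK defined by the "LZMA Version Information" field.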
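In practice the property data written by WriteCoderProperties() is 5 bytes: one byte that packs the literal context bits (lc), literal position bits (lp) and position bits (pb) as (pb * 5 + lp) * 9 + lc, followed by the dictionary size as a 4-byte little-endian integer. XZ for Java's LZMAInputStream decodes the packed byte itself, so the sketch below (hypothetical class name LzmaCoderProperties) is only useful for inspecting or validating an entry's properties.

```java
/** Sketch: decodes the 5 property bytes produced by the LZMA SDK's WriteCoderProperties(). */
class LzmaCoderProperties {
    final int lc;         // literal context bits (0..8)
    final int lp;         // literal position bits (0..4)
    final int pb;         // position bits (0..4)
    final long dictSize;  // unsigned 32-bit little-endian dictionary size

    LzmaCoderProperties(byte[] props) {
        if (props.length != 5)
            throw new IllegalArgumentException("expected 5 property bytes, got " + props.length);
        int p = props[0] & 0xFF;
        // Largest valid packed value is (4 * 5 + 4) * 9 + 8 = 224
        if (p >= 9 * 5 * 5)
            throw new IllegalArgumentException("invalid LZMA properties byte: " + p);
        lc = p % 9;
        p /= 9;
        lp = p % 5;
        pb = p / 5;
        dictSize = (props[1] & 0xFFL)
                | (props[2] & 0xFFL) << 8
                | (props[3] & 0xFFL) << 16
                | (props[4] & 0xFFL) << 24;
    }
}
```

For example, the common properties byte 0x5D (93) decodes to lc = 3, lp = 0, pb = 2.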

Code example:

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;
import org.apache.commons.compress.archivers.zip.ZipMethod;
import org.apache.commons.io.IOUtils;
import org.tukaani.xz.LZMAInputStream;

import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ApacheComm
{
    public InputStream getInputstreamForEntry(ZipFile zipFile, ZipArchiveEntry ze) throws IOException
    {
        if (zipFile.canReadEntryData(ze))
        {
            // Commons Compress can decode this entry by itself
            return zipFile.getInputStream(ze);
        } else if (ze.getMethod() == ZipMethod.LZMA.getCode()) {
            // Raw (still compressed) bytes, starting with the Zip LZMA properties header
            InputStream inputStream = zipFile.getRawInputStream(ze);
            // 2 bytes version + 2 bytes properties size + 5 bytes properties data
            // (IOUtils.readFully(InputStream, int) requires Commons IO 2.5+)
            ByteBuffer buffer = ByteBuffer.wrap(IOUtils.readFully(inputStream, 9))
                    .order(ByteOrder.LITTLE_ENDIAN);

            // LZMA SDK version used to compress this data (not needed for decoding)
            int majorVersion = buffer.get();
            int minorVersion = buffer.get();

            // Byte count of the following properties data, stored as an unsigned short.
            // Should be 5 (propByte + dictSize) in all versions.
            int size = buffer.getShort() & 0xffff;
            if (size != 5)
                throw new UnsupportedOperationException("Unexpected LZMA properties size: " + size);

            byte propByte = buffer.get();

            // Dictionary size is an unsigned 32-bit little-endian integer.
            int dictSize = buffer.getInt();

            long uncompressedSize;
            if ((ze.getRawFlag() & (1 << 1)) != 0)
            {
                // General purpose bit 1 set: the data ends with an EOS marker,
                // so pass -1 ("size unknown, expect the end marker")
                uncompressedSize = -1;
            } else {
                uncompressedSize = ze.getSize();
            }

            return new LZMAInputStream(inputStream, uncompressedSize, propByte, dictSize);
        } else {
            throw new UnsupportedOperationException("Unsupported compression method: " + ze.getMethod());
        }
    }
}
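As for "is there another way": one alternative, sketched here and not part of the answer, is to keep using LZMACompressorInputStream but replace the 9-byte Zip header with the 13-byte standalone .lzma header it expects (properties byte, 4-byte dictionary size, 8-byte uncompressed size). This assumes the properties size is 5, as the answer's code does. ZipLzmaToAlone and toAloneStream are hypothetical names, and the sketch has not been checked against a real archive.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

/**
 * Sketch: turns a raw Zip LZMA entry stream into a standalone .lzma stream by
 * swapping the Zip header (version, properties size, properties) for the
 * LZMA_Alone header (properties byte, dictionary size, uncompressed size).
 */
class ZipLzmaToAlone {

    /** uncompressedSize: -1 if the entry ends with an EOS marker, otherwise the entry's size. */
    static InputStream toAloneStream(InputStream raw, long uncompressedSize) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        in.readUnsignedByte();  // LZMA SDK major version (not needed to decode)
        in.readUnsignedByte();  // LZMA SDK minor version
        int propsSize = in.readUnsignedByte() | (in.readUnsignedByte() << 8);  // little-endian
        if (propsSize != 5)
            throw new IOException("unexpected LZMA properties size: " + propsSize);

        byte[] header = new byte[13];
        // Properties byte + 4-byte little-endian dictionary size: same bytes in both formats
        in.readFully(header, 0, 5);
        for (int i = 0; i < 8; i++) {
            // 8-byte little-endian uncompressed size; -1 becomes all 0xFF bytes ("unknown")
            header[5 + i] = (byte) (uncompressedSize >>> (8 * i));
        }
        // New header first, then the untouched compressed data
        return new SequenceInputStream(new ByteArrayInputStream(header), in);
    }
}
```

With Commons Compress on the classpath, the result would then go into new LZMACompressorInputStream(ZipLzmaToAlone.toAloneStream(zip.getRawInputStream(entry), size)), choosing size as in the answer's code: -1 when general purpose bit 1 is set, entry.getSize() otherwise.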