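Compressed string length slightly larger than original string length?

Date: 2014-01-04 10:38:41

Tags: java string compression

I have created an application that compresses and decompresses strings using Inflater and Deflater. It works correctly on simple plain-word sentences, where the compressed string comes out shorter than the original, but when I try an encrypted string, the compressed string is slightly longer than the original.
Can anyone suggest a solution?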

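    import java.io.ByteArrayOutputStream;
    import java.io.UnsupportedEncodingException;
    import java.util.zip.DataFormatException;
    import java.util.zip.Deflater;
    import java.util.zip.Inflater;
    import org.apache.commons.codec.binary.Base64;

    private String compress(String stringToCompress) throws UnsupportedEncodingException
    {
        byte[] stringAsBytes = stringToCompress.getBytes("UTF-8");

        Deflater compressor = new Deflater();
        compressor.setInput(stringAsBytes);
        compressor.finish();

        // Deflate in a loop: one call into a fixed 1024-byte buffer silently truncates larger output.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!compressor.finished()) {
            int n = compressor.deflate(buffer);
            out.write(buffer, 0, n);
        }
        compressor.end(); // release the native zlib memory

        return Base64.encodeBase64String(out.toByteArray());
    }

    private String decompressToString(String base64String) throws UnsupportedEncodingException, DataFormatException
    {
        byte[] compressedData = Base64.decodeBase64(base64String);

        Inflater deCompressor = new Inflater();
        deCompressor.setInput(compressedData, 0, compressedData.length);

        // Inflate in a loop instead of assuming the result fits in a fixed 102400-byte buffer.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deCompressor.finished()) {
            int n = deCompressor.inflate(buffer);
            if (n == 0 && (deCompressor.needsInput() || deCompressor.needsDictionary())) {
                throw new DataFormatException("Compressed data is truncated or needs a dictionary");
            }
            out.write(buffer, 0, n);
        }
        deCompressor.end();

        return new String(out.toByteArray(), "UTF-8");
    }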

When I run:

    public static void main(String[] args) throws Exception
    {
        Sample_class m = new Sample_class();
        String strToBeCompressed = "Pehla nasha Pehla khumaar Naya pyaar hai nayaPehla nasha Pehla khumaar Naya pyaar hai nayaPehla nasha Pehla khumaar Naya pyaar hai nayaPehla nasha Pehla khumaar Naya pyaar hai naya Pehla nasha Pehla khumaar Naya pyaar hai naya Pehla nasha Pehla khumaar Naya pyaar hai naya Pehla nasha Pehla khumaar Naya pyaar hai naya Pehla nasha Pehla khumaar Naya pyaar hai naya Pehla nasha Pehla khumaar Naya pyaar hai naya Pehla nasha Pehla khumaar Naya pyaar hai naya Pehla nasha Pehla khumaar Naya pyaar hai naya Pehla nasha Pehla khumaar Naya pyaar hai naya Pehla nasha Pehla khumaar Naya pyaar hai naya Pehla nasha Pehla khumaar Naya pyaar hai naya intezaar Kar loon main kPehla nasha Pehla khumaar Naya pyaar hai naya intezaar Kar loon main kya apna haal Aye dil-e-bekaraar Mere dil-e-bekaraar Tu hi bata Pehla nasha Pehla khumaar Udta hi firoon in hawaon mein kahin Ya main jhool jaoon in ghataon mein kahin Udta hi firoon in hawaon mein kahin Ya main jhool jaoon in ghataon mein kahin ";

        System.out.println(strToBeCompressed);
        String compressedData = m.compress(strToBeCompressed);
        String deCompressedString = m.decompressToString(compressedData);
        System.out.println("Original     :: " + strToBeCompressed.length());
        System.out.println("Compressed   :: " + compressedData.length());
        System.out.println("decompressed :: " + deCompressedString.length());
    }

Output:

    Original     :: 980
    Compressed   :: 212
    decompressed :: 980

But when I run:

    public static void main(String[] args) throws Exception
    {
     Sample_class m = new Sample_class();
     String strToBeCompressed = "iVBORw0KGgoAAAANSUhEUgAAAMUAAADFCAIAAABrWqGnAACAAElEQVR42uy9aYxk2XXn14BswNIY8BiGP3lgD7x8GBvjD5IM2TOaMWRbguERaGOokTSSRqJIUWKTItkkm0uT3c1e2c2q6q6qrr0q933P2Pc94r14+77vW+yRa2VVZVZmVvq+yO5ik5LGGktiF4EJHDy8jIqqioj3e//zP/eee/M50zT39vbORo/Dw8Ojo6MnT56cfWKP07MTEMdhPDk+Ozt+Esbpydnp0dnx47OTk7MzEE9GcXb65Al48ePHZ6enZ6fHYTw8ONwaPGi17vv+tmGAGOg6iL5hgOgaVsc02qbiapzFkwaLOxwdSHxXUfqaZpKUSdM2L7YMs+P6vuu5tuNZZltS+5IyVDRwdEhabsAqjNgENTTAH8ktUeoqaltRXJ4H0ZYkE6qrlYJSKej1stGogKNLNPsyqzWrKlJ1WGxoSru+PrCkrs63TSlo2U7L1j0dhN1zBw+2Ds6ODs9OD8/OHoafGnyq8MOCD3p+WcDVOT09PTk5AcenV+oTvGTn//X5u3r99def29/fPz4+Pn8KnIBnP8n3d/7NnYzg+HGeRk+dhjw9OT0J3zxg6En4+vDk9Ozw6Pj+3kG3u+U5AKAQkVH0VBWwcs5TRzc7ug548jXeFzkQbVnoafKWroPwGc7nhZaktnSjY7kt23VNy5EVQEyHF3uiDHhq86JHswHL9yTFpRiDIAFh50h1ZPB/abuWZTchpZzn82m5nD9HykTqHonA8XUkHeFreYBUT+MHhrhlyTuB1Wm7Qdu1A8v0DICU0basjmf1/IPTx/dPjg6fnJzzBD7rSfix//IrehLeaM8GT0+ffQoTeHxi7++cj5MRIk9OR1/j6TlP58oEnj0Ob86TUJnCF5+dPT55fHBwvz8Y+C7AxZflQBA8juvJ8ognra/pfd0C0dFtgFRP17qafE7bQA6jL0gdlm/RHCBmoGiAjK5q9GynbZhArgA9HkW7JNXm+K4gdngBRF+SA4Z1";

     System.out.println(strToBeCompressed);
     String compressedData  = m.compress(strToBeCompressed);
     String deCompressedString = m.decompressToString(compressedData);
     System.out.println("Original     :: " + strToBeCompressed.length());
     System.out.println("Compressed   :: " + compressedData.length());
     System.out.println("decompressed :: " + deCompressedString.length());
    }

Output:

    Original     :: 1032
    Compressed   :: 1076
    decompressed :: 1032

3 Answers:

Answer 0 (score: 1)

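Data compression exploits patterns in the data that can be encoded with shorter representations. It rests on the assumption that such patterns exist. If no patterns are found, a perfect compression scheme would leave the data unchanged. Unfortunately, most compression schemes add a small amount of metadata, at the very least to identify the scheme (with java.util.zip.Deflater's default zlib format, for example, a 2-byte header and a 4-byte Adler-32 checksum, plus deflate block headers). That is probably what you are seeing.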

Imagine trying to compress data that has already been compressed. You certainly wouldn't expect that to make it any smaller.
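To illustrate both points, here is a small sketch (not from the original answer) that deflates pseudo-random bytes, which stand in for encrypted data, and then deflates that output a second time. With `java.util.zip.Deflater` at its default settings, both results should come out a few bytes larger than their inputs, because the zlib wrapper and the deflate block header are still written even when no patterns are found. The class name `CompressionOverhead` and the seed are arbitrary choices for this example.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;
import java.util.zip.Deflater;

public class CompressionOverhead {

    // Deflates the whole input (zlib format, default level) and returns the compressed bytes.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Pseudo-random bytes stand in for encrypted data: there are no
        // repeated patterns for the compressor to exploit.
        byte[] random = new byte[1000];
        new Random(42).nextBytes(random);

        byte[] once = deflate(random);  // compressing patternless data
        byte[] twice = deflate(once);   // compressing already-compressed data

        System.out.println("original:         " + random.length + " bytes");
        System.out.println("compressed once:  " + once.length + " bytes");
        System.out.println("compressed twice: " + twice.length + " bytes");
    }
}
```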

Answer 1 (score: 1)

There is no problem to solve.

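What you are feeding in is Base64-encoded input. That should be compressible, and in fact it is: each Base64 character carries only 6 bits of information but occupies an 8-bit byte. You should see those 1032 Base64 characters compressed to about 800 bytes (compressedDataLength).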

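Then you turn around and expand it again by Base64-encoding the binary compressed data, which brings you back to roughly where you started, slightly expanded.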
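The round trip described above can be reproduced with a short sketch. It assumes Java 8+ and uses `java.util.Base64` in place of the Apache Commons Codec class from the question. Pseudo-random bytes stand in for the question's input, which starts with `iVBORw0KGgo` (the Base64 form of the PNG file signature) and is therefore most likely already-compressed image data. The class and method names are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Random;
import java.util.zip.Deflater;

public class Base64RoundTrip {

    // Deflates the whole input (zlib format, default level) and returns the compressed bytes.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Base64 text of `rawBytes` pseudo-random bytes (hypothetical stand-in for the PNG data).
    static String randomBase64(int rawBytes, long seed) {
        byte[] raw = new byte[rawBytes];
        new Random(seed).nextBytes(raw);
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        String input = randomBase64(774, 1); // 774 bytes encode to exactly 1032 chars
        byte[] compressed = deflate(input.getBytes(StandardCharsets.US_ASCII));
        String output = Base64.getEncoder().encodeToString(compressed);

        System.out.println("Base64 input:       " + input.length() + " chars");
        System.out.println("deflated:           " + compressed.length + " bytes");
        System.out.println("Base64 of deflated: " + output.length() + " chars");
    }
}
```

Deflate should shrink the 1032 characters to somewhere near 3/4 of that size, but it cannot go below the 774 bytes of real information they carry, so re-encoding the result as Base64 pushes the total back above 1032.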

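Base64 encoding always expands data by a factor of 4/3 (4 output characters for every 3 input bytes, rounded up). So unless compression shrinks the data to at most 3/4 of its original size, you will see an overall expansion.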
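The 4/3 rule can be made exact: padded Base64 turns n bytes into 4 * ceil(n / 3) characters. The sketch below assumes the Base64 output contains no line breaks; it works backwards from the 1076 characters printed in the question to the possible compressed byte counts, and shows the break-even point for the 1032-character input. `Base64Size` and `encodedLength` are hypothetical names.

```java
public class Base64Size {

    // Characters produced by padded Base64 without line breaks for n input bytes:
    // 4 * ceil(n / 3), computed with integer arithmetic.
    static int encodedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        // The second run printed "Compressed   :: 1076"; which deflated sizes give that?
        for (int n = 790; n <= 820; n++) {
            if (encodedLength(n) == 1076) {
                System.out.println(n + " compressed bytes -> 1076 Base64 chars");
            }
        }

        // Break-even for the 1032-character input: 774 bytes (3/4 of 1032) encodes
        // back to exactly 1032 chars; one byte more already overshoots.
        System.out.println("774 bytes -> " + encodedLength(774) + " chars");
        System.out.println("775 bytes -> " + encodedLength(775) + " chars");
    }
}
```

Running it should list 805, 806 and 807 bytes, in line with Answer 1's estimate of roughly 800 bytes. By the same formula, the 212 characters from the first run correspond to 157 to 159 compressed bytes.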

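Furthermore, even without Base64 encoding, some data will expand when compressed, for example data that has already been compressed. For some inputs, expansion is unavoidable with any lossless compression scheme.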

Answer 2 (score: 0)

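You are Base64-encoding the deflated bytes, and that adds overhead. The better way to handle this is to work with byte[] rather than String: convert to bytes, stream them to wherever you need them, and rebuild the resource/string from the byte[].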
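A minimal sketch of that suggestion, assuming the caller can store or transmit raw bytes: `compress` returns the deflated `byte[]` directly and `decompress` rebuilds the string from it, so no Base64 step inflates the size. The class and method names are illustrative, and wrapping `DataFormatException` in an unchecked `IllegalArgumentException` is a design choice of this sketch, not something the answer specifies.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ByteArrayCompression {

    // Compresses UTF-8 text and returns the raw zlib bytes, with no Base64 step.
    static byte[] compress(String text) {
        Deflater deflater = new Deflater();
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Rebuilds the original string from the compressed bytes.
    static String decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated compressed data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid compressed data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = String.join("", Collections.nCopies(20, "Pehla nasha Pehla khumaar Naya pyaar hai naya "));
        byte[] compressed = compress(text);
        System.out.println(text.length() + " chars -> " + compressed.length + " bytes");
        System.out.println("round trip ok: " + text.equals(decompress(compressed)));
    }
}
```

If the bytes must travel through a text-only channel, some encoding such as Base64 is unavoidable, and its 4/3 overhead applies again.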