Question

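I decompress received GZIP-compressed data into a string. With BUFFER_SIZE set to 512, Unicode characters that straddle the buffer boundary get corrupted, so I end up with question marks in the text. It happens with non-Latin letters.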

...во и ��ргуме...

    public static String decompress(byte[] compressed) throws IOException {
        final int BUFFER_SIZE = 512;
        ByteArrayInputStream is = new ByteArrayInputStream(compressed);
        GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
        StringBuilder string = new StringBuilder();
        byte[] data = new byte[BUFFER_SIZE];
        int bytesRead;
        while ((bytesRead = gis.read(data)) != -1) {
            string.append(new String(data, 0, bytesRead));
        }
        gis.close();
        is.close();
        return string.toString();
    }

Answer 1

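The bug is in the algorithm: it assumes that every chunk read ends (and begins) on a UTF-8 byte-sequence boundary. read() can return at any byte offset, so a multi-byte character may be split across two chunks, and neither half decodes to a valid character on its own.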
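
To see the effect in isolation, here is a minimal sketch with no GZIP involved. The class name SplitDemo and the method decodeInChunks are made up for illustration; the method just mimics the per-chunk `new String(...)` decoding from the question. When a chunk boundary falls inside a multi-byte character, Java's decoder substitutes the replacement character U+FFFD for the incomplete pieces, which is what shows up as the question marks.

```java
import java.nio.charset.StandardCharsets;

public class SplitDemo {
    // Hypothetical helper: decodes the bytes chunk by chunk, the same way
    // the loop in the question does, instead of decoding them all at once.
    public static String decodeInChunks(byte[] bytes, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < bytes.length; off += chunkSize) {
            int len = Math.min(chunkSize, bytes.length - off);
            // Each chunk is decoded on its own, so a character whose bytes
            // are split between two chunks cannot be reassembled.
            sb.append(new String(bytes, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Each Cyrillic letter here takes 2 bytes in UTF-8, 4 bytes in total.
        byte[] bytes = "аб".getBytes(StandardCharsets.UTF_8);
        // Chunk size 2 happens to line up with the character boundaries.
        System.out.println(decodeInChunks(bytes, 2));
        // Chunk size 3 cuts the second letter in half.
        System.out.println(decodeInChunks(bytes, 3));
    }
}
```

With a chunk size that lines up with the character boundaries the text survives; with a chunk size of 3 the second letter is cut in half and comes out damaged.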

So collect all the bytes first and decode them once at the end:

    ByteArrayInputStream is = new ByteArrayInputStream(compressed);
    GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
    byte[] data = new byte[BUFFER_SIZE];
    int bytesRead;
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    while ((bytesRead = gis.read(data)) != -1) {
        baos.write(data, 0, bytesRead);
    }
    gis.close();
    is.close();
    return baos.toString("UTF-8");

Answer 2

You can wrap the GZIPInputStream in a Reader (for example an InputStreamReader with an explicit charset) and read chars instead of bytes. That way you never run into buffer boundaries that fall in the middle of an encoded character.
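
A minimal sketch of that approach, assuming the compressed payload is UTF-8 text. The class name GzipText and the compress helper are not from the question; they are only there to make the example self-contained. StandardCharsets is available from Java 7 (Android API 19); on older platforms the charset name "UTF-8" can be passed as a string instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipText {
    public static String decompress(byte[] compressed) throws IOException {
        final int BUFFER_SIZE = 512;
        StringBuilder sb = new StringBuilder();
        // The reader keeps incomplete byte sequences internally until the
        // remaining bytes arrive, so chunk boundaries no longer matter.
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed), BUFFER_SIZE),
                StandardCharsets.UTF_8)) {
            char[] chars = new char[BUFFER_SIZE];
            int charsRead;
            while ((charsRead = reader.read(chars)) != -1) {
                sb.append(chars, 0, charsRead);
            }
        }
        return sb.toString();
    }

    // Hypothetical helper, used only to produce test input for decompress.
    public static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(baos)) {
            gos.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return baos.toByteArray();
    }
}
```

Compared with buffering everything in a ByteArrayOutputStream, this avoids holding the full decompressed byte array in memory in addition to the resulting string.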

Android GZIP decompression breaks Unicode characters at the buffer boundary
