Using zlib

Time: 2018-04-27 06:46:36

Tags: c++ string zlib compression

I need to decode a hexadecimal string that is stored in zlib format. An example is:

  

1800000013000000eAFjYoAAZiDFCMQgGgQAAJwACg==

where 18000000 and 13000000 are the sizes of the uncompressed and compressed data (24 and 19 in this case).
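The two size fields read as 24 and 19 only if each group of eight hex digits is taken as a 32-bit little-endian integer. That byte order is an assumption inferred from the numbers in the post, and `parseLe32Hex` is a hypothetical helper name. A minimal sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: read 8 hex digits starting at `offset` and interpret
// them as a 32-bit little-endian integer (assumed byte order).
// "18000000" -> bytes 18 00 00 00 -> 0x00000018 = 24.
std::uint32_t parseLe32Hex(const std::string& hex, std::size_t offset = 0)
{
    assert(hex.size() >= offset + 8); // need four full bytes
    std::uint32_t value = 0;
    for (int byteIndex = 0; byteIndex < 4; ++byteIndex) {
        const std::string pair = hex.substr(offset + 2 * byteIndex, 2);
        const std::uint32_t byte =
            static_cast<std::uint32_t>(std::stoul(pair, nullptr, 16));
        value |= byte << (8 * byteIndex); // least significant byte comes first
    }
    return value;
}
```

Calling `parseLe32Hex(s, 0)` and `parseLe32Hex(s, 8)` on the example string gives the uncompressed and compressed sizes; everything from offset 16 onward is the payload.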

I also know that the rest of the string contains

  

020000000000000003000000010000000300000000000000

Where is the problem? When I compress that data following any tutorial such as https://panthema.net/2007/0328-ZLibString.html, the string I get is

  

X'302 @ 2 P ...

which in hexadecimal can be written as

  

783f3330324053f503f103ff5

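This has nothing to do with the compressed string I expected, so I have not found a way to decompress the original string (which is my final goal).

Thanks in advance for any hints!

PS. I am using the decompression routine from https://github.com/systemed/intersector/blob/master/helpers.cpp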

It looks like the string is base64 encoded (thanks @zdenek and @Mark-Adler). I managed to decode it with
BYTE *res;
int resSize = FromBase64Simple((BYTE*)actualData.c_str(),actualData.len(),res,sizeCompressed);

You can read the implementation at https://github.com/kengonakajima/luvit-base64/blob/master/base64.c

But that is not the problem, since I can dump the result using
char* resChar = new char[resSize];
for(int i = 0;i<resSize;i++)
{
    int asciiCode = (BYTE)res[i];
    resChar[i]=char(asciiCode);
    char buffer [3]; // two hex digits plus the terminating NUL
    itoa (asciiCode,buffer,16);
    qDebug()<<"["<<i<<"]\t"<<asciiCode<<"\t"<<buffer;
}

I get the result for each byte in decimal and in hexadecimal, and both look fine. The hexadecimal looks like:

  

78 01 63 62 80 00 66 20 c5 08 c4 20 1a 04 00 00 9c 00 0a

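But resChar is "x?cb?", which has nothing to do with the value @Mark-Adler mentioned, "x?302 @?P ??" (obviously the '?' symbols stand for non-printable characters). I really thought this was the problem, but my data seem consistent with this table: https://www.asciitable.com/ and with Mark's value. This website, https://conv.darkbyte.ru/, does not return the same result as my algorithm either.

I tried to decompress the string with the implementation described above, but it failed. I also tried https://gist.github.com/arq5x/5315739, but its decompressed value is just the empty string "".

Here is a minimal reproducible example: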

#include <cstdio>   // printf, getchar
#include <string>

static char LookupDigits[] = {
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, //gap: ctrl chars
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, //gap: ctrl chars
    0,0,0,0,0,0,0,0,0,0,0,           //gap: spc,!"#$%'()*
    62,                   // +
    0, 0, 0,             // gap ,-.
    63,                   // /
    52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // 0-9
    0, 0, 0,             // gap: :;<
    99,                   //  = (end padding)
    0, 0, 0,             // gap: >?@
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,16,
    17,18,19,20,21,22,23,24,25, // A-Z
    0, 0, 0, 0, 0, 0,    // gap: [\]^_`
    26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,
    43,44,45,46,47,48,49,50,51, // a-z
    0, 0, 0, 0,          // gap: {|}~ (and the rest...)
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
};

int FromBase64Simple(const unsigned char* pSrc, int nLenSrc, unsigned char* pDst, int nLenDst)
{
    int nLenOut = 0;
    for (int j = 0; j<nLenSrc; j += 4) {
        if (nLenOut > nLenDst) {
            return(0); // error, buffer too small
        }
        unsigned char s1 = LookupDigits[*pSrc++];
        unsigned char s2 = LookupDigits[*pSrc++];
        unsigned char s3 = LookupDigits[*pSrc++];
        unsigned char s4 = LookupDigits[*pSrc++];

        unsigned char d1 = ((s1 & 0x3f) << 2) | ((s2 & 0x30) >> 4);
        unsigned char d2 = ((s2 & 0x0f) << 4) | ((s3 & 0x3c) >> 2);
        unsigned char d3 = ((s3 & 0x03) << 6) | ((s4 & 0x3f) >> 0);

        *pDst++ = d1;  nLenOut++;
        if (s3 == 99) break;      // end padding found
        *pDst++ = d2;  nLenOut++;
        if (s4 == 99) break;      // end padding found
        *pDst++ = d3;  nLenOut++;
    }
    return(nLenOut);
}


int main()
{
    std::string inputData = "eAFjYoAAZiDFCMQgGgQAAJwACg==";


    //19 is hardcoded since I know its size prior to this call
    unsigned char res[19];
    int resSize = FromBase64Simple((unsigned char*)inputData.c_str(), inputData.size(), res, 19);


    for (int i = 0; i<resSize; i++)
    {
        int asciiCode = res[i];
        printf("[%i]\t%i\t%x\n", i, asciiCode, asciiCode);
    }
    printf("\n\nres: %s", (char*)res);

    getchar();

    return 0;
}

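2 Answers:

Answer 0: (score: 2)

"eAFjYoAAZiDFCMQgGgQAAJwACg==" is Base64-encoded. You first need to decode it to binary to get something that can be decompressed. That binary, expressed in hexadecimal, is: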

78 01 63 62 80 00 66 20 c5 08 c4 20 1a 04 00 00 9c 00 0a

That is a valid zlib stream, which decompresses to this, expressed in hexadecimal:

02 00 00 00 00 00 00 00 03 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00
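The claim that these bytes form a valid zlib stream for that data can be sanity-checked without calling zlib. RFC 1950 requires the first two bytes, read as a 16-bit big-endian number, to be a multiple of 31 with compression method 8, and it stores the Adler-32 checksum of the uncompressed data in the last four bytes. The sketch below checks only those two properties and does not inflate anything; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// RFC 1950 header check: low nibble of CMF is 8 (deflate) and
// CMF*256 + FLG is a multiple of 31. Does not validate the deflate body.
bool looksLikeZlibHeader(const std::vector<std::uint8_t>& stream)
{
    if (stream.size() < 2) return false;
    const unsigned cmf = stream[0];
    const unsigned flg = stream[1];
    return (cmf & 0x0f) == 8 && ((cmf << 8) | flg) % 31 == 0;
}

// Adler-32 as defined in RFC 1950: two running sums modulo 65521.
std::uint32_t adler32Of(const std::vector<std::uint8_t>& data)
{
    std::uint32_t a = 1, b = 0;
    for (std::uint8_t byte : data) {
        a = (a + byte) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

// The checksum stored big-endian in the last four bytes of a zlib stream.
std::uint32_t storedAdler32(const std::vector<std::uint8_t>& stream)
{
    assert(stream.size() >= 6); // 2 header bytes + 4 checksum bytes
    const std::size_t n = stream.size();
    return (static_cast<std::uint32_t>(stream[n - 4]) << 24) |
           (static_cast<std::uint32_t>(stream[n - 3]) << 16) |
           (static_cast<std::uint32_t>(stream[n - 2]) << 8) |
            static_cast<std::uint32_t>(stream[n - 1]);
}
```

For the 19 bytes above, the header 78 01 is 30721 = 31 × 991, and the trailer 00 9c 00 0a equals the Adler-32 of the 24 decompressed bytes, which is consistent with the answer.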

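The compressed result "x?302 @?P ??" was originally binary and is not printable. Those question marks are not original question marks, but some other bytes that do not print. So do not print it. The subsequent attempt to convert the printed result to hexadecimal is incorrect, because the hexadecimal contains question marks (3f).

Answer 1: (score: 2)

This works fine for me. I used the decompression function you linked and the base64 function you provided. I removed the error checking and reformatted a few things to make it shorter.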
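One way to follow that advice is to format the raw bytes as hex directly instead of passing them through a string print. The sketch below uses two hypothetical helpers: `toHex` renders every byte, and `lengthAsCString` shows how many bytes a `%s`-style print would reach before the first zero byte stops it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Render a byte buffer as space-separated two-digit hex values.
std::string toHex(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::string out;
    char pair[3]; // two hex digits plus the terminating NUL
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(pair, sizeof pair, "%02x", static_cast<unsigned>(data[i]));
        if (i != 0) out += ' ';
        out += pair;
    }
    return out;
}

// Number of bytes before the first zero byte, i.e. how much of the buffer
// a C-string print would show. Returns `size` if there is no zero byte.
std::size_t lengthAsCString(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    const void* nul = std::memchr(data, 0, size);
    if (nul == nullptr) return size;
    return static_cast<std::size_t>(static_cast<const unsigned char*>(nul) - data);
}
```

With the decoded bytes from the question, the sixth byte is 00, so a string print shows at most five bytes, while `toHex` shows all of them.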

Output:

020000000000000003000000010000000300000000000000