Question

One of our products involves a file with the following structure:

A STRING WITH SOME CONTENT IDENTIFYING THE FILES CONTENTS
A STRING ON ROW 2
A STRING ON ROW 3
A STRING ON ROW 4
<binary data starts here and is gzipped>

Now, if I do the following, I can decompress the content and recreate an uncompressed version of the same file:

INPUT=FILEA.COMPRESSED
OUTPUT=FILEB.UNCOMPRESSED
head -n4 $INPUT > $OUTPUT
cat $INPUT | tail --lines=+5 | gunzip >> $OUTPUT

# At this point I'm left with a file structure as follows:
A STRING WITH SOME CONTENT IDENTIFYING THE FILES CONTENTS
A STRING ON ROW 2
A STRING ON ROW 3
A STRING ON ROW 4
<uncompressed content>

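I'm trying to accomplish the same feat with Boost. Boost always throws gzip_error code 4, which gzip.hpp shows to be bad_header.

No doubt the file I'm working with isn't bulletproof; it is produced by a very old legacy system.

My main question: if gunzip can do this... is there a tweak or flag in Boost that I'm missing that would let it do the same?

The failing C++ code looks like this (greatly simplified to focus on the point, so it may contain syntax errors):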

#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <sstream>
#include <iostream>
#include <fstream>

using namespace std;
namespace io = boost::iostreams;

// Open File
ifstream file("myfile", ios::in|ios::binary);

int line = 1;
char c;
while (!file.eof() && line < 5){
   // I do do 'way' more error checking and proper handling here
   // in real code, but you get the point.. I'm moving the cursor
   // past the last new line and the beginning of what is otherwise
   // compressed content.
   file.get(c);
   if(c == '\n')line++;
}

stringstream ss;
// Store rest of binary data into stringstream
while(!file.eof()){
   file.get(c);
   ss.put(c);
}
// Close File
file.close();

// Return file pointer to potential gzip stream
ss.seekg(0, ios::beg);
try
{
   stringstream gzipped(ss.str());
   io::filtering_istream gunzip;
   gunzip.push(io::gzip_decompressor());
   gunzip.push(gzipped);
   copy(gunzip, ss);
}
catch(io::gzip_error const&  ex)
{
   // always throws error code 4 here (bad_header)
   cout << "Exception: " << ex.error() << endl;
}

Here is some information that may be useful:

OS: Redhat 5.7
Boost: boost-1.33.1-10 (el5 repository)
Platform: x86_64
GCC: version 4.1.2 20080704 (Red Hat 4.1.2-46)

My Makefile also has the following line for the linker:

LDFLAGS = -lz -lboost_iostreams

Answer 1

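I'm not sure whether this is the root cause of the error, but your use of file.eof() is incorrect. The function only returns true after you have attempted to read past the end of the file. It does not tell you whether your next read will fail.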

while(!file.eof()){ //1
   file.get(c);  // 2
   ss.put(c);    // 3
}

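In this loop, if you read the last valid character at line 2, you output it at line 3. The condition at line 1 is then tested again. Since you haven't yet attempted to read past the end of the file, file.eof() returns false, so the loop condition is true. The loop then tries to read the next character, which fails and leaves c unchanged. Line 3 then puts that stale character into ss.

This results in an extra character at the end of the stream. I'm not sure that's the only problem, but it is probably one of them.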

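Edit:

OK, after looking into it, I'm not 100% sure why this happens, but it is because you are reusing the stringstream ss. Either call ss.seekp(0, ios::beg) before doing the copy, or use a separate stringstream.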
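The following standard-library sketch may help show what reusing the stream does. A stringstream keeps separate get and put positions, and seekg moves only the get position, so anything written afterwards still lands after the old bytes. The function names and the placeholder strings are assumptions for illustration; they only stand in for the compressed and decompressed data.

```cpp
#include <sstream>
#include <string>

// Reused stream, no seekp: the put position is still at the end of the old
// contents, so the new data is appended after them.
std::string reuse_without_seekp() {
    std::stringstream ss;
    ss << "COMPRESSED";          // stands in for the gzip payload
    ss.seekg(0, std::ios::beg);  // rewinds the get position only
    ss << "plain";               // stands in for the decompressed output
    return ss.str();
}

// Reused stream with seekp: new data overwrites from the start, but old
// bytes beyond the length of the new data are left in place.
std::string reuse_with_seekp() {
    std::stringstream ss;
    ss << "COMPRESSED";
    ss.seekp(0, std::ios::beg);
    ss << "plain";
    return ss.str();
}

// Separate output stream: contains the new data and nothing else.
std::string separate_stream() {
    std::stringstream compressed("COMPRESSED");  // input stays untouched
    std::stringstream out;
    out << "plain";
    return out.str();
}
```

Of the two suggestions, the separate stream is the one that cannot leave stale bytes behind, which is why it is probably the safer choice when the output length is not known in advance.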

Personally, rather than copying ss into gzipped, I would feed the input file directly into the decompressor and write the result into ss via copy.

Boost gzip_decompressor fails where gunzip succeeds
