Is there a way to repair files corrupted by not properly closing a boost::archive::binary_oarchive?

Time: 2017-06-15 01:03:08

Tags: c++ boost binaryfiles

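I ran a large processing job that produced a large number of binary files as output. I now think my output data files are corrupt, because I did not properly flush or close the boa object before moving the data files to remote storage. Is there any way to repair the output files, for example by appending some special EOF marker, or am I out of luck and have to rerun the expensive job?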

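More specifically, my processing job dumps binary data using boost::archive::binary_oarchive, like this:

```cpp
#include <fstream>
#include <boost/archive/binary_oarchive.hpp>

void dumpStuff() {
    // some code
    std::ofstream ofs(localFileName);
    boost::archive::binary_oarchive boa(ofs);
    boa << *data;
    if (uploadToRemote) {
        // code that uploads files to remote store
        // does not run when I tested locally
    }
}
```

What I think happened: when I tested locally (without uploading to remote), the boa object (and the ofs stream it writes to) went out of scope at the end of dumpStuff, so the destructors were called, which properly flushed the stream and closed the file. When uploading to the remote store, however, the upload happened before boa's destructor was called, so I believe the stream was never properly flushed, which left the files corrupted. When I fetch a corrupted file from the store and try to load it with boost::archive::binary_iarchive, I get an InputStreamError.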

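I know I could fix the problem by adding a pair of braces inside dumpStuff so that the archive goes out of scope before the upload, but that only helps if I rerun the big, expensive job. So my question is: is there some simple way to append something to the end of my corrupted files to un-corrupt them? Some kind of EOF signal?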

1 answer:

Answer 0 (score: 2)

There may not be. Then again, much of it depends on the flushing behaviour of the underlying stream you used: whatever was still sitting in the stream's buffer when the file was copied never made it into the copy.

This is a one-off problem that only you have, so you will have to come up with your own solution.

  • One way would be to read the source code to figure out exactly which actions were skipped because of the missing close, and then either compensate for the missing input OR make the input-archive implementation more tolerant of corrupt or missing tails.

  • The other approach would be to use your own code, WITH the flaw, to write an archive, and then write the SAME archive with the error fixed.

    Then compare the two files in a hex editor. You may be lucky and find that the data missing from the flawed archive is fixed. If so, just append it to each corrupt file and be glad. More likely there will be some (simple) variable data, like a checksum or a total size. In that case, either try to generate the missing data, or hack the input-stream implementation to detect the required checksum.

CAVEAT: All of these suggestions involve meddling with undocumented details. There will be no support, and reliability depends solely on your own accuracy.

If you choose to "fake" checksums, be aware that doing so defeats any built-in error detection, so you might still read unreliable data (in case some data was corrupted in storage or transit).