Question

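I am using this code to find the position of a two-byte delimiter consisting of the characters '$@'. When I run the code, it finds the delimiter, but the reported offset is 12 less than the actual byte position in the file. Is something wrong with my algorithm?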

#include <cstdint>    
#include <iostream>
#include <fstream>
#include <string>

int find_delim_offset(std::string file) {
    std::ifstream infile(file, std::ifstream::binary);
    int offset = 0;
    char test;
    char delim[2]; delim[0] = '$'; delim[1] = '@'; // delimiter is '$@'
    bool found_delim = false;
    while(infile) {
        infile >> test;
        if(test == delim[0]) {
            infile >> test;
            if(test == delim[1]) {
                found_delim = true;
                break;
            }
            ++offset;
        }
        ++offset;
    }
    return offset;
}

I used Vim to get a hex dump of the file (a Septentrio binary format file, for those who care). It shows the first '$@' at bytes 142 and 143:

0000000: 0c00 7607 57de 0d00 09f0 89fc 3bd4 0e18  ..v.W.......;...
0000010: 8af0 7efc 80ea 1000 0ef0 bdf6 cfc9 1108  ..~.............
0000020: 280f b909 2a8f 110b 28ff a7fc 1da4 1200  (...*...(.......
0000030: 04f1 16fe 6ce8 1308 3eff 78fa cdb8 130b  ....l...>.x.....
0000040: 3e0f 850b e7ab 1408 27f0 88f9 628d 140b  >.......'...b...
0000050: 27ff cbfb bd60 1508 3700 5d0b 9e83 150b  '....`..7.].....
0000060: 370f c604 1937 1708 3800 e901 aae1 170b  7....7..8.......
0000070: 380f 6e05 f157 1818 8500 3d00 94be 1908  8.n..W....=.....
0000080: 39f0 69fa 8fe1 190b 3900 8707 6293 2440  9.i.....9...b.$@ <--- here
0000090: 51a7 bb2f 7003 30a3 010d ba06 1614 0c05  Q../p.0.........
00000a0: 0400 0100 1405 234e 6f9f 83b4 cdff 9304  ......#No.......
00000b0: 0071 ed00 0002 02e4 57ff 0000 25f9 fb0e  .q......W...%...
00000c0: dfff 01ff 5787 8000 3afd 0000 0000 0208  ....W...:.......
00000d0: 3004 0bac f8f6 5c22 2f02 f808 0090 f200  0.....\"/.......
00000e0: 4001 0bf2 7400 0000 7c0f 5201 5a03 0300  @...t...|.R.Z...
00000f0: 1805 9454 b536 de74 31fe 5b02 008d ed00  ...T.6.t1.[.....
0000100: 0004 02e6 85f8 ff00 2c06 82ee e2ff 01ff  ........,.......
0000110: 8580 8000 9c00 0000 0000 03e6 8800 ff00  ................

Answer 1

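It turns out that the ifstream stream operator >> should not be used to read binary files, even when the binary flag is specified in open() or the constructor. operator>> is a formatted extraction and skips whitespace characters by default, regardless of the binary flag. This made the algorithm skip bytes, so the resulting offset was too small: in the dump above, 12 bytes before the delimiter have whitespace values (0x09, 0x0b, 0x0c, 0x0d), which matches the offset being 12 short.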

The fix is to replace (b is the char being read; it is named test in the question's code):

infile >> b;

with:

infile.read(&b, 1);
