Question

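I'm looking for the most efficient way to read a data file in binary format and then search it for occurrences of a pattern (a header). I've used the cplusplus.com example to read the file into memory: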

#include <iostream>
#include <fstream>
using namespace std;

ifstream::pos_type size;
char * memblock;

int main () {
  ifstream file ("example.bin", ios::in|ios::binary|ios::ate);
  if (file.is_open())
  {
    size = file.tellg();
    memblock = new char [size];
    file.seekg (0, ios::beg);
    file.read (memblock, size);
    file.close();
  }
  else cout << "Unable to open file";
  return 0;
}

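First, I'd like to know whether this is the best way to do it for my purposes. If it is, I can't figure out how to search the memblock char array for the pattern 0x54 0x51, or its binary equivalent.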

Answer 1

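Just read each byte and compare it with the first byte of the pattern you're searching for. If it matches, check whether the next byte of the input matches the next byte of the pattern, and so on. When you read a binary file with fstream, it reads bytes.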
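The byte-by-byte comparison described above can be sketched roughly as follows. This is only an illustration: the helper name find_all and its parameters are made up for this example, and it assumes the file contents are already in memory (as with memblock in the question).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the question): returns the byte offset of
// every occurrence of `pattern` inside `data`, using a naive scan.
std::vector<std::size_t> find_all(const char* data, std::size_t size,
                                  const unsigned char* pattern,
                                  std::size_t pattern_size) {
    std::vector<std::size_t> hits;
    if (pattern_size == 0 || size < pattern_size) {
        return hits;
    }
    for (std::size_t i = 0; i + pattern_size <= size; ++i) {
        std::size_t j = 0;
        // Compare as unsigned char so bytes >= 0x80 match regardless of
        // whether plain char is signed on this platform.
        while (j < pattern_size &&
               static_cast<unsigned char>(data[i + j]) == pattern[j]) {
            ++j;
        }
        if (j == pattern_size) {
            hits.push_back(i);  // full pattern matched at offset i
        }
    }
    return hits;
}
```

With the question's buffer, a call might look like find_all(memblock, size, pat, 2) where pat holds the bytes 0x54 and 0x51.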

Answer 2

Efficient algorithms for your purpose (in terms of theory, asymptotic running time, and practical efficiency) are http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm and http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

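Just as they operate on readable strings, they can work on byte sequences. They will also work on bit sequences, but in that case they are usually not the best option: you should avoid comparing bit by bit and could instead shift your pattern and compare it, and an alphabet consisting only of 0 and 1 doesn't let the string search algorithms play to their strengths. Judging by your question (and the hex representation you mention), though, I assume that's not what you want.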
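As an illustration of running one of these algorithms on raw bytes, here is a minimal sketch that uses the standard library's Boyer-Moore searcher rather than a hand-written implementation. It assumes a C++17 compiler (std::boyer_moore_searcher lives in <functional>); the helper name find_all_bm is invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: offsets of every occurrence of `pattern` in `data`,
// found with std::search and a Boyer-Moore searcher (C++17).
std::vector<std::size_t> find_all_bm(const char* data, std::size_t size,
                                     const char* pattern,
                                     std::size_t pattern_size) {
    std::vector<std::size_t> hits;
    if (pattern_size == 0) {
        return hits;
    }
    // The searcher preprocesses the pattern once and is reused for each call.
    const std::boyer_moore_searcher<const char*> searcher(
        pattern, pattern + pattern_size);
    const char* const end = data + size;
    const char* pos = data;
    while (true) {
        pos = std::search(pos, end, searcher);
        if (pos == end) {
            break;  // no further occurrence
        }
        hits.push_back(static_cast<std::size_t>(pos - data));
        ++pos;  // resume one byte later so overlapping matches are found
    }
    return hits;
}
```

For a two-byte pattern like 0x54 0x51 the preprocessing is unlikely to pay off; the approach is more interesting for longer headers.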

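However, if you are reading the file from disk and the pattern isn't too long, your program's execution time will be dominated by the time it takes to read from disk. In that case, the naive solution posted by Gam Erix is perfectly fine and easier to implement.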

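Another optimization for patterns smaller than a machine word: simply interpret the pattern as a larger integer type (e.g. uint64_t) and use a single comparison for the whole pattern. You have to check bounds when you reach the end of your input sequence.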
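A rough sketch of the single-comparison idea for the two-byte pattern from the question is shown below. It uses uint16_t instead of uint64_t because the pattern is exactly two bytes, and it copies bytes with std::memcpy to avoid unaligned reads and byte-order surprises. The function name find_all_u16 and its parameters are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: finds the two-byte sequence (first, second) by
// comparing one 16-bit integer per position instead of two separate bytes.
std::vector<std::size_t> find_all_u16(const char* data, std::size_t size,
                                      unsigned char first,
                                      unsigned char second) {
    std::vector<std::size_t> hits;
    // Build the pattern word from its bytes in file order, so the comparison
    // is correct on both little-endian and big-endian machines.
    const unsigned char pat_bytes[2] = {first, second};
    std::uint16_t pat;
    std::memcpy(&pat, pat_bytes, sizeof pat);
    // The loop bound keeps the 2-byte window inside the buffer.
    for (std::size_t i = 0; i + sizeof pat <= size; ++i) {
        std::uint16_t window;
        std::memcpy(&window, data + i, sizeof window);
        if (window == pat) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

For patterns shorter than the integer type, the unused bytes would additionally need to be masked off before comparing, and the bounds check at the end of the input becomes more important.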

Answer 3

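I doubt this is the most efficient or even the fastest way, and I don't claim to have any special technique, but here is one way to scan for a bit pattern.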

//...
file.close();
//...
//bytes are packed low byte first below, so at a byte-aligned position this
//value matches 0x51 followed by 0x54; use 0x5154 to match 0x54 then 0x51.
unsigned int pattern = 0x5451;
unsigned int mask = 0xFFFFu; //16-bit mask

//assumes the file holds at least 2 bytes.
unsigned int read_buff = static_cast<unsigned char>(memblock[1]);
read_buff <<= 8;
read_buff |= static_cast<unsigned char>(memblock[0]);

//start at index 2 since we already read 2 bytes.
for (ifstream::pos_type i = 2; i < size; i += 1) {

    for (char shift_count = 0; shift_count < 8; ++shift_count) {

        //put the third byte into read_buff
        if (shift_count == 0) {
            unsigned int read_byte = static_cast<unsigned char>(memblock[i]);
            read_byte <<= 16;
            read_buff |= read_byte;
        }

        unsigned int work_area = read_buff;
        work_area &= mask;

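        if (work_area == pattern) {
            //match found; shift_count is the bit offset within the current window
        }

        read_buff >>= 1;
    }
}

//the loop never tests the final alignment (the last two bytes), so check it here.
if ((read_buff & mask) == pattern) {
    //match found in the last two bytes
}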

How do I search a file for a hex or bit pattern in C++?
