Question

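I have a binary data file that I want to read. The values in the file are 8-bit unsigned integers, with "record" delimiters that are ASCII text (e.g. $MSG, $GRP). I read the data in as one big chunk, like this: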

unsigned char *inBuff = (unsigned char*)malloc(file_size*sizeof(unsigned char));  
result = fread(inBuff, sizeof(unsigned char), file_size, pFile);

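I need to search this array for records that begin with $GRP (so I can read the data that follows). Can anyone suggest a good way to do this? I have tried several things, but none of them worked. For example, my latest attempt was: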

std::stringstream str1;
str1 << inBuff;
std::string strTxt = str1.str();

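However, when I check its length, it is only 5. I looked at the file in Notepad and noticed that the sixth character is NULL (a zero byte). So it seems the string gets cut off at the NULL. Any ideas?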

Answer 1

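Assuming fread did not fail, its return value tells you how many bytes you can search. (fread returns a size_t count of the items actually read, never -1; a count smaller than the one requested signals an error or end of file.)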
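A minimal sketch of that check, assuming the whole file fits in memory. The helper name read_whole_stream and its parameters are hypothetical, not something from the question:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads up to file_size bytes from an open binary stream.
   Returns a malloc'd buffer (caller frees) and stores the number of bytes
   actually read in *bytes_read. Returns NULL on allocation or read failure. */
unsigned char *read_whole_stream(FILE *pFile, size_t file_size, size_t *bytes_read)
{
    unsigned char *inBuff;

    *bytes_read = 0;
    inBuff = malloc(file_size ? file_size : 1);
    if (!inBuff)
        return NULL;

    /* fread returns the count of items read; with item size 1 that is bytes */
    *bytes_read = fread(inBuff, 1, file_size, pFile);

    /* a short count means end of file or an error; ferror tells them apart */
    if (*bytes_read < file_size && ferror(pFile)) {
        free(inBuff);
        *bytes_read = 0;
        return NULL;
    }
    return inBuff;
}
```

Any later search should be bounded by the value stored in *bytes_read rather than by file_size.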

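It is not reasonable to expect string searches to work on binary data: any NUL characters in the binary data will cause length functions such as strlen to stop early.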
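To illustrate the truncation, here is a small sketch. The wrapper name length_seen_by_strlen is made up for this example, and it assumes the buffer contains at least one zero byte so that strlen stays in bounds:

```c
#include <stddef.h>
#include <string.h>

/* Length as a C string function sees it: bytes before the first zero byte.
   Assumes buf contains at least one zero byte. */
size_t length_seen_by_strlen(const unsigned char *buf)
{
    return strlen((const char *)buf);
}
```

For a buffer whose sixth byte is zero, this reports 5 no matter how much data follows, which matches the length seen in the question.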

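One possible way to search the data is to call memcmp at each position in the buffer, passing the search key and the length of the search key.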
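A minimal sketch of that memcmp scan. The function name find_bytes and the choice of (size_t)-1 as the "not found" value are assumptions made for this example:

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first occurrence of key in buf,
   or (size_t)-1 if the key does not occur. */
size_t find_bytes(const unsigned char *buf, size_t buf_len,
                  const void *key, size_t key_len)
{
    size_t i;

    if (key_len == 0 || key_len > buf_len)
        return (size_t)-1;

    /* stop while a full key still fits, so memcmp never reads past the end */
    for (i = 0; i + key_len <= buf_len; i++) {
        if (memcmp(buf + i, key, key_len) == 0)
            return i;
    }
    return (size_t)-1;
}
```

With the buffer from the question, something like find_bytes(inBuff, result, "$GRP", 4) would give the offset of the first $GRP record, and the record data would start 4 bytes after it.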

Answer 2

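(Following on from my comment)

The C str functions assume zero-terminated strings, so any C string function will stop at the first binary 0. Use memchr to find the $, then use strncmp or memcmp to check the rest of the identifier. In particular, do not assume that the byte after the 4-byte identifier is a binary 0.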

In code (C, untested):

#include <string.h>  /* memchr, memcmp, strlen */

/* recordId should point to a simple string such as "$GRP" */
unsigned char *find_record (unsigned char *data, size_t max_length, const char *recordId)
{
    unsigned char *ptr = data;
    unsigned char *end = data + max_length;
    size_t id_length = strlen(recordId);

    if (id_length == 0 || id_length > max_length)
        return NULL;

    /* stop once fewer than id_length bytes remain, so memcmp cannot overrun */
    while ((size_t)(end - ptr) >= id_length)
    {
       /* fast scan for the first character only; the last id_length-1
          bytes cannot start a full match, so they are not scanned */
       ptr = memchr (ptr, recordId[0], (size_t)(end - ptr) - id_length + 1);
       if (!ptr)
          return NULL;

       /* first character matches, test entire string */
       if (!memcmp (ptr, recordId, id_length))
          return ptr;

       /* no match; test onwards from the next possible position */
       ptr++;
    }

    return NULL;
}
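Since a file normally holds more than one record, the same memchr-then-memcmp idea can be repeated to collect every match. This is only a sketch: the name find_all_records, the caller-supplied offsets array, and the max_hits limit are all assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores the offset of every occurrence of id in data,
   up to max_hits entries, and returns how many offsets were stored. */
size_t find_all_records(const unsigned char *data, size_t length,
                        const char *id, size_t *offsets, size_t max_hits)
{
    size_t id_len = strlen(id);
    size_t hits = 0;
    const unsigned char *ptr = data;
    const unsigned char *end = data + length;

    if (id_len == 0)
        return 0;

    while (hits < max_hits && (size_t)(end - ptr) >= id_len) {
        /* jump to the next candidate first byte that still leaves room for id */
        ptr = memchr(ptr, id[0], (size_t)(end - ptr) - id_len + 1);
        if (!ptr)
            break;

        if (memcmp(ptr, id, id_len) == 0) {
            offsets[hits++] = (size_t)(ptr - data);
            ptr += id_len;   /* the record's data starts here */
        } else {
            ptr++;
        }
    }
    return hits;
}
```

Each stored offset plus strlen(id) gives the position of the data that follows the identifier.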

Searching for characters in an unsigned char array
