Question

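I have a txt file with thousands of lines. The length of each line varies. The file mainly contains hex data written out byte by byte. For example:

01 01 04 03 = 4 bytes.

The second line may contain 8 bytes, the third 40 bytes, and so on. There are thousands of such lines.

Now I want to read these bytes into an int buffer. At the moment I read into a char buffer, and because it is a char buffer the data is stored in memory as 3031 3031 3034 3033 (ASCII), which counts as 8 bytes. I don't want that. I am converting this to 0001 0001 0004 0003.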

Here is my code:

FILE *file;

char buffer[100] = { '\0' };
char line[100] = { '0' };

if (file != NULL)
{
    while (fgets(line, sizeof(line), file) != NULL)
    {
        for (i = 0; (line[i] != '\r'); i++)
        {
            buffer[i] = line[i];
        }
    }
}

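I want to read the entire file line by line. In memory I want to see just 01 01 04 03. I guess using an int buffer will help, but as soon as a line is read from the file into the buffer, it is stored as chars. Any suggestions?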

Answer 1

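I would read in a line, then use strtol to convert the individual numbers in the input. strtol gives you a pointer to the character at which conversion stopped, which you can use as the starting point for finding and converting the next number.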
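
As a minimal sketch of that approach (the helper name parse_hex_line and its cap parameter are illustrative, not part of the original answer), the loop below calls strtol repeatedly and uses its end pointer to step through one line. It assumes the line holds whitespace-separated hex bytes.

```c
#include <stddef.h>
#include <stdlib.h>

/* Parse whitespace-separated hex bytes from 'line' into 'out'.
 * Returns the number of bytes stored (at most 'cap').
 * parse_hex_line is a hypothetical helper name used for illustration. */
size_t parse_hex_line(const char *line, unsigned char *out, size_t cap)
{
    size_t n = 0;
    const char *p = line;

    while (n < cap) {
        char *end;
        long v = strtol(p, &end, 16);   /* skips leading white space itself */

        if (end == p)                   /* no digits consumed: end of line or junk */
            break;
        if (v < 0 || v > 0xFF)          /* value does not fit in one byte */
            break;

        out[n++] = (unsigned char)v;
        p = end;                        /* resume where strtol stopped */
    }
    return n;
}
```

Called on each line returned by fgets, this stores one value per byte of input; the trailing "\r\n" is treated as white space by strtol, so it ends the loop without special handling.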

Answer 2

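You can convert small hex numbers:

#include <ctype.h>
#include <stdint.h>

/* Convert the run of hex digits at the start of 'digits' to a byte. */
uint8_t digits2hex(const char *digits) {
  uint8_t r = 0;
  while (isxdigit((unsigned char)*digits)) {
    int c = toupper((unsigned char)*digits);
    r = r * 16 + (isdigit(c) ? c - '0' : c - 'A' + 10);
    digits++;
    /* check size? */
  }
  return r;
}

/* ... */

for (i = 0; line[i] != '\0' && line[i] != '\r' && line[i] != '\n'; )
{
  if (isxdigit((unsigned char)line[i])) {
    hexnumbers[hexcount++] = digits2hex(line + i);
    /* step over the digits that were just converted */
    while (isxdigit((unsigned char)line[i]))
      i++;
  } else {
    i++; /* skip white space */
  }
}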

Answer 3

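You seem to be confusing the text representation of a byte with the value of the byte (or, alternatively, expecting your compiler to do more than it does).

When your program reads in "01", it reads two bytes whose values are the ASCII codes for the characters "0" and "1". C doesn't do anything special with them, so you need to convert this sequence into a one-byte value. Note that a C char is one byte, so it is the right size to hold the result. This is a coincidence, and in any case it is not true for Unicode and other wide character encodings.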
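
To make the distinction concrete, here is a small sketch (dump_bytes is a hypothetical helper written only for this illustration) that prints the byte values a piece of text actually occupies in memory. It assumes an ASCII-compatible character set.

```c
#include <stddef.h>
#include <stdio.h>

/* Write the byte values of the text 's' into 'out' as hex, e.g. "30 31".
 * dump_bytes is a hypothetical helper used only for this illustration. */
void dump_bytes(const char *s, char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return;
    out[0] = '\0';

    for (size_t i = 0; s[i] != '\0'; i++) {
        int w = snprintf(out + used, cap - used, "%s%02X",
                         i ? " " : "", (unsigned)(unsigned char)s[i]);
        if (w < 0 || (size_t)w >= cap - used)
            break;                      /* output buffer is full */
        used += (size_t)w;
    }
}
```

On an ASCII system, dump_bytes("01", out, sizeof out) leaves "30 31" in out: two bytes of text, whereas the value the question wants is the single byte 0x01.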

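There are several ways to do this conversion. You can do the arithmetic on the bytes yourself, like this:

unsigned char charToHex(char c) {
    if (isdigit((unsigned char)c)) return c - '0';
    return 10 + toupper((unsigned char)c) - 'A';  /* 'A' maps to 10, 'F' to 15 */
}

...

first = getc(fh);
second = getc(fh);
buffer[*end] = charToHex(first) << 4 | charToHex(second);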

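(Note that I'm using getc() to read the characters instead of fgets(). I'll come back to that later.)

Note also that 'first' is the most significant nibble of the input.

You can also (re)create a string from the two characters and call strtol on it: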

char hex[3];   // temporary string, kept separate from the output 'buffer'
hex[0] = first;
hex[1] = second;
hex[2] = 0;  // null-terminator

buffer[*end] = (unsigned char)strtol(hex, NULL, 16);

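Related to this, you will probably have better luck reading the file one character at a time with getc() and ignoring anything that isn't a hex digit. That way you don't have to worry about input lines longer than the buffer you pass to fgets(), which would otherwise be split across several calls. It also makes it easier to tolerate garbage in the input file.

Here is a complete example. It uses isxdigit() to detect hex characters and ignores everything else, including single hex digits: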

#include <ctype.h>
#include <stdio.h>

// Given a single hex digit, return its numeric value
unsigned char charToHex(char c) {
    if (isdigit((unsigned char)c)) return c - '0';
    return 10 + toupper((unsigned char)c) - 'A';
}

// Read in file 'fh' and for each pair of hex digits found, append
// the corresponding value to 'buffer'. The caller sets '*end' to the
// index of the first free slot (0 for an empty buffer); on return it is
// one past the last byte written. 'buffer' is assumed to have enough
// space.
void readBuffer(FILE *fh, unsigned char buffer[], size_t *end) {
    for (;;) {

        // Advance to the next hex digit in the stream.
        int first;
        do {
            first = getc(fh);
            if (first == EOF) return;
        } while (!isxdigit(first));

        int second;
        second = getc(fh);

        // Ignore any single hex digits
        if (!isxdigit(second)) continue;

        // Compute the hex value and append it to the array.
        buffer[*end] = charToHex(first) << 4 | charToHex(second);
        (*end)++;

    }
}
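
The example above assumes the buffer is large enough. As a hedged sketch of how a capacity limit could be added (readBufferBounded, hexDigitValue and the cap parameter are illustrative names, not part of the original answer), the variant below stops once the buffer is full and returns the number of bytes stored.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Value of one hex digit; the caller guarantees isxdigit(c). */
static unsigned char hexDigitValue(int c)
{
    if (isdigit(c))
        return (unsigned char)(c - '0');
    return (unsigned char)(10 + toupper(c) - 'A');
}

/* Same scanning logic as readBuffer, but never writes more than
 * 'cap' bytes. Returns the number of bytes stored in 'buffer'. */
size_t readBufferBounded(FILE *fh, unsigned char buffer[], size_t cap)
{
    size_t count = 0;

    while (count < cap) {
        int first;
        do {                            /* advance to the next hex digit */
            first = getc(fh);
            if (first == EOF)
                return count;
        } while (!isxdigit(first));

        int second = getc(fh);
        if (second == EOF)
            return count;
        if (!isxdigit(second))          /* lone hex digit: ignore it */
            continue;

        buffer[count++] = (unsigned char)
            (hexDigitValue(first) << 4 | hexDigitValue(second));
    }
    return count;
}
```

Returning the count instead of updating a pointer argument is a design choice made here for brevity; a caller that needs to append to a partly filled buffer can pass buffer + used and cap - used.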

Answer 4

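#include <stdio.h>
#include <string.h>

FILE *fp = ...;
int buffer[1024]; /* enough memory */
int r_pos = 0;    /* next write position */
char line[128];
unsigned char tmp[4];
char *cp;
int n;            /* characters consumed by one sscanf call */
if(fp) {
  while(NULL!=fgets(line, sizeof(line), fp)) {
    cp = line;
    /* %hhx reads one hex byte; %n records how far sscanf got */
    while(sscanf(cp, "%hhx %hhx %hhx %hhx%n", &tmp[0], &tmp[1], &tmp[2], &tmp[3], &n)==4) {
      /* assumes a 4-byte int; apply ntohl() afterwards if needed */
      memcpy(&buffer[r_pos++], tmp, sizeof tmp);
      cp += n;
    }
  }
}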
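
Copying four bytes straight into an int makes the resulting value depend on the host byte order, which is why ntohl() is mentioned. A minimal sketch of an alternative that builds the value with shifts, so the first byte on the line always ends up most significant (pack_be32 is a hypothetical helper name, and a 32-bit result type is assumed):

```c
#include <stdint.h>

/* Combine four bytes, most significant first, into one 32-bit value.
 * pack_be32 is a hypothetical helper used for illustration. */
uint32_t pack_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) |
           ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |
            (uint32_t)b[3];
}
```

For the bytes 01 01 04 03 this yields 0x01010403 on any host, regardless of endianness.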

Reading from a file into an int buffer in C
