Reading the first line of a file gives a "\357\273\277" prefix in that line

Time: 2014-06-07 11:53:28

Tags: c file-io byte-order-mark

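When I use the function readTheNRow with row = 0 (to read the first line), I find that the first three characters are \357, \273 and \277. I found out that this prefix has something to do with UTF-8 files, but some files have this prefix and some don't :(. How can I ignore every kind of such prefix in the files I want to read from?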

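#include <fcntl.h>  // open, O_RDONLY
#include <stdio.h>  // printf
#include <stdlib.h> // exit
#include <unistd.h> // read, write, close

int readTheNRow(char buff[], int row) {

int file = open("my_file.txt", O_RDONLY);
if (file < 0) {
    write(2, "opening the file was unsuccessful\n", 34);
    exit(-1);
}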

// function's variables
int i = 0;
char ch; // a temp variable to read with it
int check; // helping variable for checking the read function

// read till we reach the needed row
while (i != row) {

    // read one char
    check = read(file, &ch, 1);
    if (check < 0) {
        // write an error message to the user
        write(2, "error occurred in reading\n", 26);
        exit(-1);
    }

    if (check == 0) {
        // read returned 0, which means that we reached the end of file
        close(file);
        return -1; // couldn't read the N row (N is bigger than X)
    }
    printf("%c",ch);
    // check that the char is a \n
    if (ch == '\n') {
        i++;
    }
}

// read the number to the received buffer
i = 0;

do {
    // read one char
    check = read(file, buff + i, 1);
    if (check < 0) {
        // write an error message to the user
        write(2, "error occurred in reading\n", 26);
        exit(-1);
    }

    // if we reached the end of file
    if (check == 0) {
        break;
    }
    i++;

} while (buff[i - 1] != '\n');

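// put the \0 at the end of the string, replacing the trailing \n if there is one
if (i > 0 && buff[i - 1] == '\n') {
    buff[i - 1] = '\0';
} else {
    buff[i] = '\0'; // empty read, or a last line without a \n
}

// try to close the file before returning
if (close(file) < 0) {
    write(2, "closing file was unsuccessful\n", 30);
    exit(-1);
}
return 1; // return that reading was successful
}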

1 answer:

Answer 0 (score: 6)

You seem to be reading a file that starts with a so-called BOM (byte order mark).

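Test for these prefixes and, if one is present, use the information it provides to continue reading the file, interpreting the contents as the BOM indicates.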

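The sequence \357 \273 \277 (octal for the bytes 0xEF 0xBB 0xBF) indicates that UTF-8 follows. Byte order does not need to be considered here, because the byte is the unit of these files.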
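For the UTF-8 case, the check could look roughly like the sketch below. The names skip_utf8_bom and open_skipping_bom are hypothetical helpers, not part of the question's code, and the sketch assumes a seekable regular file (lseek fails on a pipe or FIFO) whose descriptor is still at offset 0.

```c
#include <fcntl.h>     /* open, O_RDONLY */
#include <string.h>    /* memcmp */
#include <sys/types.h> /* ssize_t, off_t */
#include <unistd.h>    /* read, lseek, close */

/* Hypothetical helper: fd must be at offset 0 of a seekable file.
 * If the file starts with the UTF-8 BOM, the offset is left just past it;
 * otherwise the offset is rewound to 0.
 * Returns 0 on success, -1 on a read or seek error. */
int skip_utf8_bom(int fd)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF }; /* \357 \273 \277 */
    unsigned char head[3];
    size_t got = 0;

    /* read may return fewer bytes than requested, so loop until 3 bytes or EOF */
    while (got < sizeof head) {
        ssize_t n = read(fd, head + got, sizeof head - got);
        if (n < 0)
            return -1;
        if (n == 0)
            break; /* file is shorter than 3 bytes */
        got += (size_t)n;
    }

    if (got == sizeof bom && memcmp(head, bom, sizeof bom) == 0)
        return 0; /* BOM consumed, next read returns the first real byte */

    /* no BOM: undo the probe so the caller sees the file from the start */
    return lseek(fd, 0, SEEK_SET) == (off_t)-1 ? -1 : 0;
}

/* Hypothetical convenience wrapper: open a file for reading and skip a
 * leading UTF-8 BOM. Returns the descriptor, or -1 on error. */
int open_skipping_bom(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (skip_utf8_bom(fd) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

In readTheNRow, a call such as skip_utf8_bom(file) right after the open succeeds would be one place to use it, so that row 0 starts at the first character after the BOM.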

More on the various existing BOMs here: http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
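To recognize more than the UTF-8 prefix, the first bytes of the file can be compared against a small table of known BOMs. The following is only a sketch: bom_length is a hypothetical name, and the table covers just the UTF-8, UTF-16 and UTF-32 marks. Note that FF FE 00 00 is ambiguous (a UTF-32 LE BOM, or a UTF-16 LE BOM followed by U+0000); the sketch assumes UTF-32 LE.

```c
#include <stddef.h> /* size_t */
#include <string.h> /* memcmp */

/* Hypothetical helper: return the length in bytes of the BOM at the start
 * of buf (of which len bytes are valid), or 0 if no known BOM is present. */
size_t bom_length(const unsigned char *buf, size_t len)
{
    /* 4-byte marks come first because FF FE is a prefix of FF FE 00 00 */
    static const struct {
        unsigned char bytes[4];
        size_t n; /* number of significant bytes in this mark */
    } boms[] = {
        { { 0x00, 0x00, 0xFE, 0xFF }, 4 }, /* UTF-32, big-endian */
        { { 0xFF, 0xFE, 0x00, 0x00 }, 4 }, /* UTF-32, little-endian */
        { { 0xEF, 0xBB, 0xBF, 0x00 }, 3 }, /* UTF-8 */
        { { 0xFE, 0xFF, 0x00, 0x00 }, 2 }, /* UTF-16, big-endian */
        { { 0xFF, 0xFE, 0x00, 0x00 }, 2 }, /* UTF-16, little-endian */
    };
    size_t i;

    for (i = 0; i < sizeof boms / sizeof boms[0]; i++) {
        if (len >= boms[i].n && memcmp(buf, boms[i].bytes, boms[i].n) == 0)
            return boms[i].n;
    }
    return 0;
}
```

Skipping the prefix is only enough for UTF-8. A UTF-16 or UTF-32 BOM also means the rest of the file is not one byte per character, so a byte-wise search for '\n' like the one in readTheNRow would not work on such files without decoding them first.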