libXML2 cannot correctly read its own UTF-8 XML

Time: 2018-07-18 13:13:00

Tags: utf-8 libxml2

I want to parse UTF-8 encoded XML with libXML2. My code is written in C, and I am using libXML2 v2.9.3.

My code is as follows:

    xmlTextReaderPtr reader;
    xmlTextWriterPtr writer;
    writer = xmlNewTextWriterFilename("test.xml", 0);
    xmlTextWriterStartDocument(writer, NULL, "UTF-8", NULL);
    xmlTextWriterStartElement(writer, BAD_CAST "node_with_é_character");

    xmlTextWriterEndElement(writer);
    xmlTextWriterEndDocument(writer);
    xmlFreeTextWriter(writer);
    reader = xmlReaderForFile("test.xml", "UTF-8", XML_PARSE_RECOVER);

    int ret = 1;
    while (ret == 1) {
        const xmlChar *nameT = xmlTextReaderConstName(reader);

        printf("\n   ---> %s\n", nameT);
        ret = xmlTextReaderRead(reader);
    }

The output is:

   ---> (null)

   ---> node_with_é_character

The problem is that the name in the trace does not show the "é" correctly; I expected "node_with_é_character".

My command prompt is set to "chcp 1252".

I don't understand why libXML2 cannot store/read the "é" character.

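1 answer:

Answer 0 (score: 1)

As pointed out in the comments, you are on Windows, so I suspect your source file is not UTF-8 encoded, and therefore the C string "node_with_é_character" is not UTF-8 encoded in the executable either.

I don't know the libxml2 interface, but its code example states very clearly that it expects input arguments to be UTF-8. See http://xmlsoft.org/examples/testWriter.c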
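To illustrate the point: "é" is the single byte 0xE9 in Windows-1252 but the two bytes 0xC3 0xA9 in UTF-8, and the compiler copies whichever bytes the source file contains into the string literal. The sketch below is not from the question; `NAME_UTF8`, `NAME_CP1252` and `looks_like_utf8` are hypothetical names, and the check is deliberately simplified (libxml2 has its own `xmlCheckUTF8` for this). Writing the character with hex escapes makes the literal independent of how the source file is saved.

```c
#include <stddef.h>

/* The same element name with "é" spelled out byte by byte. */
const char NAME_UTF8[]   = "node_with_\xC3\xA9_character"; /* é as UTF-8        */
const char NAME_CP1252[] = "node_with_\xE9_character";     /* é as Windows-1252 */

/* Hypothetical helper: returns 1 if s has the byte structure of UTF-8,
 * 0 otherwise. Simplified: it checks lead and continuation bytes only
 * and does not reject every overlong or surrogate encoding. */
int looks_like_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        int extra;

        if (*p < 0x80)
            extra = 0;                       /* ASCII */
        else if (*p >= 0xC2 && *p <= 0xDF)
            extra = 1;                       /* 2-byte sequence */
        else if ((*p & 0xF0) == 0xE0)
            extra = 2;                       /* 3-byte sequence */
        else if (*p >= 0xF0 && *p <= 0xF4)
            extra = 3;                       /* 4-byte sequence */
        else
            return 0;                        /* invalid lead byte */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)         /* also catches the NUL terminator */
                return 0;
            p++;
        }
    }
    return 1;
}
```

Under these assumptions, `looks_like_utf8(NAME_UTF8)` returns 1 and `looks_like_utf8(NAME_CP1252)` returns 0, so passing `BAD_CAST NAME_UTF8` to `xmlTextWriterStartElement` would hand the writer valid UTF-8 regardless of the source file's encoding.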

    /* Write a comment as child of EXAMPLE.
     * Please observe, that the input to the xmlTextWriter functions
     * HAS to be in UTF-8, even if the output XML is encoded
     * in iso-8859-1 */
    tmp = ConvertInput("This is a comment with special chars: <\xE4\xF6\xFC>",
                       MY_ENCODING);
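In that example, `ConvertInput` turns a string in `MY_ENCODING` into UTF-8 before it reaches the writer. As a rough stand-in for the same idea (this is not the `ConvertInput` from testWriter.c), the sketch below converts ISO-8859-1 bytes to UTF-8 by hand. `latin1_to_utf8` is a hypothetical helper; it assumes ISO-8859-1 input, which matches Windows-1252 for "é" but not for the bytes 0x80 to 0x9F.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts a NUL-terminated ISO-8859-1 string to a
 * newly allocated UTF-8 string. The caller must free() the result.
 * Returns NULL if the allocation fails. */
char *latin1_to_utf8(const char *in)
{
    size_t len = strlen(in);
    char *out = malloc(2 * len + 1);  /* each input byte becomes at most 2 bytes */
    char *q = out;

    if (out == NULL)
        return NULL;

    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (*p < 0x80) {
            *q++ = (char)*p;                    /* ASCII is unchanged */
        } else {
            *q++ = (char)(0xC0 | (*p >> 6));    /* lead byte: 0xC2 or 0xC3 */
            *q++ = (char)(0x80 | (*p & 0x3F));  /* continuation byte */
        }
    }
    *q = '\0';
    return out;
}
```

For instance, `latin1_to_utf8("\xE9")` yields the two bytes 0xC3 0xA9, which is the form the xmlTextWriter functions expect.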

Saving your source file as UTF-8 should solve the problem.
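One way to check which encoding actually ended up in a string, without depending on the console code page (the question uses "chcp 1252"), is to print its bytes as hex. This is only a diagnostic sketch; `hex_bytes` is a hypothetical helper, not part of libxml2.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes the bytes of s into out as space-separated
 * uppercase hex, e.g. "C3 A9". Stops early if out (of size outsz) is full. */
void hex_bytes(const char *s, char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return;
    out[0] = '\0';

    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        int n = snprintf(out + used, outsz - used,
                         used ? " %02X" : "%02X", (unsigned)*p);
        if (n < 0 || (size_t)n >= outsz - used) {
            out[used] = '\0';   /* drop the partially written byte */
            break;
        }
        used += (size_t)n;
    }
}
```

Applied to the name returned by `xmlTextReaderConstName`, a UTF-8 "é" would show up as `C3 A9`, while a Windows-1252 "é" would show up as `E9`.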