LibXML internal and output encoding

Asked: 2010-07-02 10:15:08

Tags: c xml character-encoding libxml2

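I am trying to write an XML file in ISO-8859-1 using libxml2. From the documentation, it looks like I have to convert the content of every text node I create to UTF-8, which is libxml's internal encoding. Then, when I call xmlSaveFormatFileEnc(), libxml converts to the target encoding and adds the encoding attribute to the document.

Is this assumption correct? Right now my code looks roughly like this: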

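xmlNode *root_element = NULL, *node4 = NULL;
xmlDoc *doc = NULL;

doc = xmlNewDoc(BAD_CAST XML_DEFAULT_VERSION);
root_element = xmlNewDocNode(doc, NULL, BAD_CAST "root", NULL);
xmlDocSetRootElement(doc, root_element);

/* Convert the Latin-1 input to UTF-8, libxml2's internal encoding. */
char *input_str = getLatin1Data();
int inlen = (int) strlen(input_str);
int file_size = 2 * inlen;  /* in: capacity of utf8_str; out: bytes written */
unsigned char *utf8_str = malloc(file_size + 1);
isolat1ToUTF8(utf8_str, &file_size, (const unsigned char *) input_str, &inlen);
utf8_str[file_size] = '\0';

node4 = xmlNewCDataBlock(doc, utf8_str, xmlStrlen(utf8_str));
xmlAddChild(root_element, node4);

/* The third argument selects the encoding of the saved file. */
xmlSaveFormatFileEnc("test_file.xml", doc, "UTF-8", 1);
xmlFreeDoc(doc);
free(utf8_str);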

1 answer:

Answer 0 (score: 1)

Your assumption is correct. Wherever an xmlChar is expected, for example in xmlNewCDataBlock or xmlNewText, the data is always UTF-8:

From include/libxml/xmlstring.h (libxml 2.8.0):

/**
 * xmlChar:
 *
 * This is a basic byte in an UTF-8 encoded string.
 * It's unsigned allowing to pinpoint case where char * are assigned
 * to xmlChar * (possibly making serialization back impossible).
 */
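
To make the conversion step concrete, here is a minimal sketch of what turning ISO-8859-1 into UTF-8 involves at the byte level. The function name latin1_to_utf8 is hypothetical and not part of libxml2; libxml2 provides isolat1ToUTF8 for this job, as used in the question. The sketch only assumes that every ISO-8859-1 byte maps to the Unicode code point with the same value.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of libxml2: converts inlen bytes of
 * ISO-8859-1 text to UTF-8. The caller must provide an output buffer of
 * at least 2 * inlen + 1 bytes. Returns the number of UTF-8 bytes
 * written, not counting the terminating NUL.
 */
size_t latin1_to_utf8(const unsigned char *in, size_t inlen, unsigned char *out)
{
    size_t n = 0;

    for (size_t i = 0; i < inlen; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* ASCII range: one byte, unchanged. */
            out[n++] = c;
        } else {
            /* 0x80..0xFF: two bytes, 110000xx 10xxxxxx. */
            out[n++] = (unsigned char) (0xC0 | (c >> 6));
            out[n++] = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

For instance, the Latin-1 byte 0xE9 (é) becomes the two bytes 0xC3 0xA9. A buffer produced this way is the kind of data that xmlNewCDataBlock or xmlNewText expects when cast with BAD_CAST; the file is converted back to the requested encoding only when it is saved.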