I'm using libxml2 to parse HTML. The HTML might look like this:
<div>
Some very very long text here.
</div>
I want to insert a child node, for example a heading, before the text, like this:
<div>
<h3>
Some header here
</h3>
Some very very long text here.
</div>
Unfortunately, libxml2 always adds my heading after the text, like this:
<div>
Some very very long text here.
<h3>
Some header here
</h3>
</div>
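How can I solve this?
Answer 0: (score: 2)
The text content is itself a child node, so you can get a pointer to the text node and use the xmlAddPrevSibling function to add the new element in front of it. Here is an example, though without error handling or proper cleanup.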
#include <libxml/parser.h>
#include <libxml/tree.h>
#include <libxml/xpath.h>
#include <string>

int main()
{
    xmlInitParser();
    // Create an XML document
    std::string content( "<html><head/><body><div>Some long text here</div></body></html>" );
    // xmlReadMemory takes the buffer size as an int, so cast explicitly
    xmlDocPtr doc = xmlReadMemory( content.c_str(), static_cast<int>( content.size() ), "noname.xml", NULL, 0 );
    // Query the XML document with XPath. We could use the XPath text() function
    // to get the text node directly, but for the sake of the example we get the
    // parent 'div' node and iterate over its child nodes instead.
    std::string xpathExpr( "/html/body/div" );
    xmlXPathContextPtr xpathCtx = xmlXPathNewContext( doc );
    xmlXPathObjectPtr xpathObj = xmlXPathEvalExpression( BAD_CAST xpathExpr.c_str(), xpathCtx );
    // Get the div node (assumes the query matched at least one node)
    xmlNodeSetPtr nodes = xpathObj->nodesetval;
    xmlNodePtr divNode = nodes->nodeTab[ 0 ];
    // Iterate over the div's child nodes, though in this example we know
    // there will only be one node, the text node.
    xmlNodePtr divChildNode = divNode->xmlChildrenNode;
    while( divChildNode != NULL )
    {
        if( xmlNodeIsText( divChildNode ) )
        {
            // Create a new element containing a text node
            xmlNodePtr headingNode = xmlNewNode( NULL, BAD_CAST "h3" );
            xmlNodePtr headingChildNode = xmlNewText( BAD_CAST "Some heading here" );
            xmlAddChild( headingNode, headingChildNode );
            // Add the new element to the existing tree before the text content
            xmlAddPrevSibling( divChildNode, headingNode );
            break;
        }
        divChildNode = divChildNode->next;
    }
    // Display the result
    xmlDocDump( stdout, doc );
    xmlCleanupParser();
    return 0;
}