
Question

Given an XML file:

<chapter id="1">
  <text line="1"> <p>HTML content 1</p> </text>
  <text line="2"> <q>HTML<q> content 2 </text>
  <text line="3"> HTML <b>content 3<b> </text>
</chapter>

Using DOMDocument, what query can I use to get all the content of <chapter id="1">...</chapter>, including the HTML tags, so that the output looks like this:

<p>HTML content 1</p>
<q>HTML<q> content 2
HTML <b>content 3<b>

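PS: From the comments, I think the question is being read differently. I am only asking whether it is possible, and how, to get the content inside a node without the HTML tags in it (where present) being treated as XML, given that I can modify the original XML.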

Answer 1

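Your XML string is not valid (the <q> and <b> tags are opened but never closed). You first have to escape the HTML inside each text node so that it is stored as plain text, for example:

// $text holds the raw HTML for one <text> node. htmlspecialchars() escapes
// <, > and &; htmlentities() would also emit entities such as &eacute;,
// which XML does not define and would make the document fail to parse.
$textContent = htmlspecialchars($text, ENT_QUOTES | ENT_XML1);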

After that, we have:

$xmlText = '<chapter id="1">
  <text line="1"> &lt;p&gt;HTML content 1&lt;/p&gt; </text>
  <text line="2"> &lt;q&gt;HTML&lt;q&gt; content 2 </text>
  <text line="3"> HTML &lt;b&gt;content 3&lt;b&gt; </text>
</chapter>';
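Rather than escaping each string by hand, the escaped XML above can also be produced with DOMDocument itself, because text nodes are escaped automatically when serialised. This is a minimal sketch; the `$lines` array is a hypothetical stand-in for wherever the HTML fragments actually come from.

```php
<?php
// Hypothetical input: one raw HTML fragment per line number.
$lines = [
    1 => '<p>HTML content 1</p>',
    2 => '<q>HTML<q> content 2',
    3 => 'HTML <b>content 3<b>',
];

$doc = new DOMDocument('1.0', 'UTF-8');
$chapter = $doc->appendChild($doc->createElement('chapter'));
$chapter->setAttribute('id', '1');

foreach ($lines as $number => $html) {
    $text = $doc->createElement('text');
    $text->setAttribute('line', (string) $number);
    // createTextNode() stores the fragment as character data, so saveXML()
    // writes "<" as "&lt;" instead of treating it as markup.
    $text->appendChild($doc->createTextNode($html));
    $chapter->appendChild($text);
}

echo $doc->saveXML($chapter), PHP_EOL;
```

Building the document this way avoids forgetting to escape a line, and it works even for fragments that are not well-formed HTML.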

Now we just need to parse it with SimpleXMLElement:

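$xmlObject = new SimpleXMLElement($xmlText);
$items = $xmlObject->xpath("text");
foreach ($items as $item) {
    // The XML parser has already turned &lt; and &gt; back into < and >,
    // so casting the node to string yields the original HTML.
    echo trim((string) $item), PHP_EOL;
}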
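Since the question asks about DOMDocument, the same extraction can be sketched with DOMDocument and DOMXPath instead of SimpleXML. It assumes the escaped XML from the step above; `textContent` returns the decoded text, so the HTML tags come back as part of the content.

```php
<?php
$xmlText = '<chapter id="1">
  <text line="1"> &lt;p&gt;HTML content 1&lt;/p&gt; </text>
  <text line="2"> &lt;q&gt;HTML&lt;q&gt; content 2 </text>
  <text line="3"> HTML &lt;b&gt;content 3&lt;b&gt; </text>
</chapter>';

$doc = new DOMDocument();
$doc->loadXML($xmlText);

$xpath = new DOMXPath($doc);
$results = [];
// Select every <text> element that is a child of <chapter id="1">.
foreach ($xpath->query('/chapter[@id="1"]/text') as $node) {
    // textContent decodes &lt;p&gt; back into <p>.
    $results[] = trim($node->textContent);
}

echo implode(PHP_EOL, $results), PHP_EOL;
```

The `[@id="1"]` predicate restricts the query to the chapter from the question if the file contains several chapters.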

Update 1

If you cannot change the XML string, you have to use a regular expression instead of a DOM parser:

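function get_tag_contents( $tag, $xml ) {
    // Escape the tag name so it is matched literally inside the pattern.
    $tag = preg_quote( $tag, '#' );
    // Lazily match <tag ...>...</tag>; the "s" modifier lets the content
    // span line breaks.
    preg_match_all( "#<{$tag}\b[^>]*>(.*?)</{$tag}>#s", $xml, $matches );

    // $matches[1] holds the captured inner content of each element.
    return $matches[1];
}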

$invalidXml = '<chapter id="1">
  <text line="1"> <p>HTML content 1</p> </text>
  <text line="2"> <q>HTML<q> content 2 </text>
  <text line="3"> HTML <b>content 3<b> </text>
</chapter>';

$textContents = get_tag_contents( 'text', $invalidXml );

foreach ( $textContents as $content ) {
    // Print one element per line, without the surrounding spaces.
    echo trim( $content ), PHP_EOL;
}
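If the HTML inside each text node can be made well-formed (for example by closing the q and b tags, which is an assumption beyond the original XML), neither escaping nor a regex is needed: DOMDocument can serialise each child node of a text element to give its inner markup. This is a sketch; `inner_xml()` is a hypothetical helper, not a built-in.

```php
<?php
// Assumed well-formed variant of the question's XML (<q> and <b> closed).
$wellFormedXml = '<chapter id="1">
  <text line="1"> <p>HTML content 1</p> </text>
  <text line="2"> <q>HTML</q> content 2 </text>
  <text line="3"> HTML <b>content 3</b> </text>
</chapter>';

// Hypothetical helper: concatenates the serialised child nodes of $node.
function inner_xml(DOMNode $node): string {
    $xml = '';
    foreach ($node->childNodes as $child) {
        $xml .= $node->ownerDocument->saveXML($child);
    }
    return $xml;
}

$doc = new DOMDocument();
$doc->loadXML($wellFormedXml);

$results = [];
foreach ($doc->getElementsByTagName('text') as $text) {
    $results[] = trim(inner_xml($text));
}

echo implode(PHP_EOL, $results), PHP_EOL;
```

Unlike the regex, this keeps working when the content contains nested elements or line breaks, but it still requires the XML to parse.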

How to get the content of a node from an XML file that contains HTML tags, keeping the tags as part of the content
