Question

Sometimes the HTML I come across is malformed, and the results can be unexpected. Say I find HTML like this:

<p>
   Paragraph 1 
   <p>Child p with text</p> text of parent p
</p>
<p>Paragraph 2</p>
<a href='#'>Link</a>

I filter for paragraphs, and I expect to get this as the result:

Array(
   [0] = '<p>Paragraph 1 <p>Child p with text</p> text of parent p</p>'
   [1] = '<p>Paragraph 2</p>'
);

Here is my code:

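// $dom is a DOMDocument that already holds the parsed markup
$results = $dom->getElementsByTagName('p');
$tags = array();
foreach ($results as $tag) {
   // serialize each matched <p> node back to a string
   $tags[] = $dom->saveXML($tag);
}

return $tags;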

But this is what I get:

 Array(
   [0] = '<p>Paragraph 1</p>'
   [1] = '<p>Child p with text</p>'
   [2] = '<p>Paragraph 2</p>'
);

Note how the text of the parent p is dropped.

Answer 1

You can actually get what you want in XML mode:

$dom = new DOMDocument;
$str = <<<EOF
<p>
   Paragraph 1 
   <p>Child p with text</p> text of parent p
</p>
<p>Paragraph 2</p>
<a href='#'>Link</a>
EOF;

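// Wrap the fragment in a single root element so it forms a well-formed XML document
$dom->loadXML('<xml>' . $str . '</xml>');
$xpath = new DOMXPath($dom);
$tags = array();

// Without a context node the query is relative to the root element,
// so 'p' matches only its direct p children
foreach($xpath->query('p') as $tag){
  $tags[] = $dom->saveXML($tag);
}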

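HTML mode does not work because HTML does not allow nested p elements, so the HTML parser closes the outer p as soon as the inner one starts. XML does allow nested p elements.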
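To make the difference between the two modes concrete, here is a minimal sketch that parses the markup from the question both ways. The `<root>` wrapper name and the variable names are my own choices for illustration, and it assumes the fragment is well-formed XML once wrapped; `loadXML` returns false otherwise.

```php
<?php
// Sample markup copied from the question.
$fragment = <<<EOF
<p>
   Paragraph 1
   <p>Child p with text</p> text of parent p
</p>
<p>Paragraph 2</p>
<a href='#'>Link</a>
EOF;

// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// HTML mode: the parser ends the open <p> when it meets the nested <p>.
$html = new DOMDocument;
$html->loadHTML($fragment);
$htmlCount = $html->getElementsByTagName('p')->length;

// XML mode: nesting is preserved, so the inner <p> stays a child of the outer one.
// The wrapper element name "root" is arbitrary (an assumption of this sketch).
$xml = new DOMDocument;
$loaded = $xml->loadXML('<root>' . $fragment . '</root>');

$topLevel = array();
if ($loaded) {
    $xpath = new DOMXPath($xml);
    // Absolute path: only <p> elements directly under the wrapper match.
    foreach ($xpath->query('/root/p') as $p) {
        $topLevel[] = $xml->saveXML($p);
    }
}
libxml_clear_errors();

echo "p elements seen in HTML mode: $htmlCount\n";
echo "top-level p elements in XML mode: " . count($topLevel) . "\n";
```

In XML mode the first entry of `$topLevel` contains both the child p and the trailing text of the parent. The count reported for HTML mode may vary with the libxml version; the question reports three. Keep in mind that XML mode only helps when the markup is well-formed. Input with unclosed tags or HTML-only entities such as `&nbsp;` makes `loadXML` fail, which is why the sketch checks its return value.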

php - DOM get elements by tag name, including child nodes
