XPath: selecting text nodes before and after br tags

Time: 2010-11-22 22:37:50

Tags: html xpath

Consider the following (a mix of <br> and <br/>):

text1
<br>
text2
<br/>
text3
<br/>
text4
<br>
text5

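How can I find each text node?

I was thinking of a condition along the lines of "preceded OR followed by a br tag", but I'm not sure whether <br> and <br/> are handled differently in XPath.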

2 answers:

Answer 0: (score: 5)

DOMDocument's loadHTML() method tolerates invalid HTML fragments, so you can use DOMXPath like this:

<?php

$html = 'text1
<br>
text2
<br/>
text3
<br/>
text4
<br>
text5';

echo "<pre>" . htmlentities($html) . "</pre><br>\n";

$dom = new DOMDocument();
// loadHTML() needs mb_convert_encoding() to work well with UTF-8 input
// (note: the 'HTML-ENTITIES' target is deprecated as of PHP 8.2)
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"));

$xpath = new DOMXPath($dom);

echo "Text nodes preceding br:";
foreach($xpath->query('//text()[(following::br)]') as $node)
{
    var_dump($node->wholeText);
}

echo "Text nodes following br:";
foreach($xpath->query('//text()[(preceding::br)]') as $node)
{
    var_dump($node->wholeText);
}

echo "Text nodes following OR preceding br:";
foreach($xpath->query('//text()[(following::br) or (preceding::br)]') as $node)
{
    var_dump($node->wholeText);
}
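
On the question of whether <br> and <br/> behave differently: the following is a minimal sketch, assuming PHP's DOM extension is available, which suggests that loadHTML() turns both spellings into the same br element, so one XPath expression should cover both. The variable names are illustrative, and the normalize-space() filter and trim() call are additions to drop whitespace-only nodes and surrounding newlines.

```php
<?php
// Same fragment as above, mixing <br> and <br/>.
$html = "text1\n<br>\ntext2\n<br/>\ntext3\n<br/>\ntext4\n<br>\ntext5";

$dom = new DOMDocument();
// Suppress warnings libxml may raise for malformed markup.
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Both spellings are parsed as br elements, so //br counts all of them.
$brCount = $xpath->query('//br')->length;

// Non-empty text nodes that have a br somewhere before or after them.
$texts = [];
$query = '//text()[normalize-space() and (following::br or preceding::br)]';
foreach ($xpath->query($query) as $node) {
    $texts[] = trim($node->nodeValue);
}

var_dump($brCount);
print_r($texts);
```

If only the text directly adjacent to a br is wanted, the sibling axes (preceding-sibling::text() and following-sibling::text()) may be a closer fit than following:: and preceding::, which look across the whole document.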

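Answer 1: (score: 0)

Your example is not valid XML that an XPath query can be run against: the plain <br> elements are never closed.

In general, though, you select text with a node type test, e.g. //br/text()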
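
To illustrate the well-formedness point, here is a rough sketch that assumes the fragment is rewritten as well-formed XML (every br self-closed and wrapped in a made-up <root> element) and loaded with DOMDocument::loadXML(). Note that //br/text() selects text children of br, and an empty br has none, so the sketch uses the sibling axes to reach the text next to each br instead.

```php
<?php
// Assumption: the fragment is made well-formed by self-closing every br and
// wrapping it in a <root> element (the name "root" is invented for this sketch).
$xml = "<root>text1<br/>text2<br/>text3<br/>text4<br/>text5</root>";

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// br is empty, so it has no text children to select.
$insideBr = $xpath->query('//br/text()')->length;

// Nearest text node before each br.
$before = [];
foreach ($xpath->query('//br/preceding-sibling::text()[1]') as $node) {
    $before[] = $node->nodeValue;
}

// Nearest text node after each br.
$after = [];
foreach ($xpath->query('//br/following-sibling::text()[1]') as $node) {
    $after[] = $node->nodeValue;
}

var_dump($insideBr);
print_r($before);
print_r($after);
```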