XPath: how to get text content from child nodes, excluding certain tags

Time: 2014-09-18 08:23:14

Tags: php html xpath domdocument

I want to use XPath to get "text 1" and "text 2" from this HTML:

<div id="detailInfo" class="">
<h3 class=""><img src="/program/image/abc.gif" alt="ddd" width="92" height="23"></h3>

<p class=""><a href="http://link.html" target="_blank"><img alt="qvc_b.jpg" src="/image.jpg" width="300" height="50"></a></p>

<p class="">text 1<br>
text 2</p>

<p class=""><a href="http://link2.html">>text 3</a></p>

<p class=""> <span style="color:#00a7ac; font-size:12px"><br>
------------------------------------------------------------------<br>
text 4<br>
text 5
------------------------------------------------------------------</span>
<span><br>
------------------------------------------------------------------<br>
text 6
------------------------------------------------------------------</span></p>
<!-- /detailInfo -->
</div>

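The requirement is to get all the text that sits directly inside the p children of the div, without picking up any text from the "a" and "span" tags.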

1 answer:

Answer 0 (score: 2)

In this case you can use text() with a normalize-space() predicate, so that text nodes containing only whitespace are left out:

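// $html_string is assumed to hold the HTML from the question.
$dom = new DOMDocument();
$dom->loadHTML($html_string);
$xpath = new DOMXPath($dom);

// Select text nodes that are direct children of <p>; the predicate
// drops nodes that contain only whitespace.
$elements = $xpath->query("//div/p/text()[normalize-space()]");
foreach ($elements as $e) {
    echo $e->nodeValue . '<br/>';
}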
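
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It is not part of the original answer: the helper name directParagraphTexts, the trimmed-down sample markup, and the restriction to the div with id "detailInfo" are my own assumptions for illustration. It also trims each text node, since "text 2" otherwise comes back with the line break that follows the br.

```php
<?php
// Sample input, shortened from the question's markup (an assumption for this sketch).
$html = <<<'HTML'
<div id="detailInfo" class="">
<p class=""><a href="http://link.html" target="_blank"><img alt="qvc_b.jpg" src="/image.jpg" width="300" height="50"></a></p>
<p class="">text 1<br>
text 2</p>
<p class=""><a href="http://link2.html">text 3</a></p>
<p class=""> <span style="color:#00a7ac; font-size:12px"><br>
text 4<br>
text 5</span>
<span><br>
text 6</span></p>
</div>
HTML;

/**
 * Hypothetical helper: returns the trimmed, non-blank text nodes that are
 * direct children of <p> elements directly under the div with the given id.
 * Assumes $divId contains no double quote, since it is spliced into the query.
 */
function directParagraphTexts(string $html, string $divId): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // text() matches only direct text children of <p>, so text nested in
    // <a> or <span> is skipped; [normalize-space()] drops blank nodes.
    $nodes = $xpath->query('//div[@id="' . $divId . '"]/p/text()[normalize-space()]');

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

$texts = directParagraphTexts($html, 'detailInfo');
print_r($texts);
```

With the sample markup above, this should return just the two strings "text 1" and "text 2". Text 3 to text 6 are left out because they live inside a or span elements rather than directly under a p.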