I want to use XPath to get "text 1" and "text 2" from this HTML:
<div id="detailInfo" class="">
<h3 class=""><img src="/program/image/abc.gif" alt="ddd" width="92" height="23"></h3>
<p class=""><a href="http://link.html" target="_blank"><img alt="qvc_b.jpg" src="/image.jpg" width="300" height="50"></a></p>
<p class="">text 1<br>
text 2</p>
<p class=""><a href="http://link2.html">>text 3</a></p>
<p class=""> <span style="color:#00a7ac; font-size:12px"><br>
------------------------------------------------------------------<br>
text 4<br>
text 5
------------------------------------------------------------------</span>
<span><br>
------------------------------------------------------------------<br>
text 6
------------------------------------------------------------------</span></p>
<!-- /detailInfo -->
</div>
The requirement is to get all text that sits directly inside the div's p children, without picking up any text from the "a" and "span" tags.
Answer 0 (score: 2)
In this case you can use text() together with a normalize-space() predicate, so that whitespace-only text nodes are not returned:
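// $html_string holds the HTML from the question.
$dom = new DOMDocument();
$dom->loadHTML($html_string);
$xpath = new DOMXPath($dom);
// Select text nodes that are direct children of <p>, skipping whitespace-only ones.
$elements = $xpath->query("//div/p/text()[normalize-space()]");
foreach ($elements as $e) {
    echo $e->nodeValue . '<br/>';
}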
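
For anyone who wants to try this out, here is a minimal self-contained sketch of the same query. It makes a few assumptions that are not in the answer above: the markup is a simplified copy of the HTML from the question, the query is narrowed to the div with id "detailInfo", and each value is passed through trim() because the text node after the br starts with a newline. The names $nodes and $texts are only illustrative.

```php
<?php
// Simplified copy of the question's markup (assumption: the parts left
// out do not change which text nodes are direct children of <p>).
$html_string = <<<'EOT'
<div id="detailInfo" class="">
<p class=""><a href="http://link.html" target="_blank"><img alt="qvc_b.jpg" src="/image.jpg" width="300" height="50"></a></p>
<p class="">text 1<br>
text 2</p>
<p class=""><a href="http://link2.html">text 3</a></p>
<p class=""> <span>text 4</span>
<span>text 6</span></p>
</div>
EOT;

$dom = new DOMDocument();
$dom->loadHTML($html_string);
$xpath = new DOMXPath($dom);

// Text nodes that are direct children of a <p> inside #detailInfo.
// The [normalize-space()] predicate drops whitespace-only nodes, and
// text inside <a> or <span> is never a direct child of <p>.
$nodes = $xpath->query("//div[@id='detailInfo']/p/text()[normalize-space()]");

$texts = [];
foreach ($nodes as $node) {
    // trim() removes the newline that follows <br> in the source.
    $texts[] = trim($node->nodeValue);
}

print_r($texts);
```

With this sample markup the array should contain only "text 1" and "text 2". If the page has a single matching div, the shorter //div/p/... form from the answer behaves the same way; the id filter just guards against other div elements that have p children.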