How to use XPath to select only the immediate parent node of a text string for each match

Date: 2016-08-19 16:33:08

Tags: php xpath domdocument

Note: this differs from the question below in that the value occurs both inside a node and inside a child node of that same node:

XPath contains(text(),'some string') doesn't work when used with node with more than one Text subnode

Given the following HTML:

$content = 
'<html>
 <body>
  <div>
   <p>During the interim there shall be nourishment supplied</p>
  </div>
  <div>
   <p>During the <a href="#">interim</a> there shall be interim nourishment supplied</p>
  </div>
  <div>
   <ul><li>During the interim there shall be nourishment supplied</li></ul>
  </div>
 </body>
</html>';

The following XPath:

//*[contains(text(),'interim')]

...only gives 3 matches, whereas I want four. As per the comments, the four elements I expect are P P A LI.
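To see why only three elements match, here is a minimal sketch, assuming PHP's DOMXPath (which implements XPath 1.0), that runs the question's expression and collects the tag names. In XPath 1.0, `contains()` takes string arguments, and a node-set passed to it is converted to the string value of its first node in document order. So `contains(text(), 'interim')` only ever looks at an element's first text child, and the second `<p>`, whose first text child is "During the ", is skipped. The variable name `$firstOnly` is just for illustration.

```php
<?php
// Same markup as in the question (whitespace trimmed for brevity).
$content = <<<HTML
<html><body>
<div><p>During the interim there shall be nourishment supplied</p></div>
<div><p>During the <a href="#">interim</a> there shall be interim nourishment supplied</p></div>
<div><ul><li>During the interim there shall be nourishment supplied</li></ul></div>
</body></html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($content);
$xpath = new DOMXPath($dom);

// contains() converts the text() node-set to the string value of its FIRST
// text node only. For the second <p> that is "During the ", so its later
// " there shall be interim ..." text node is never examined.
$firstOnly = [];
foreach ($xpath->query("//*[contains(text(),'interim')]") as $el) {
    $firstOnly[] = $el->nodeName;
}

echo implode(' ', $firstOnly), "\n"; // p a li
```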

1 answer:

Answer 0 (score: 0)

This works exactly as expected. See this glot.io link.

<?php

$html = <<<HTML
<html>
 <body>
  <div>
   <p>During the interim there shall be nourishment supplied</p>
  </div>
  <div>
   <p>During the <a href="#">interim</a> there shall be interim nourishment supplied</p>
  </div>
  <div>
   <ul><li>During the interim there shall be nourishment supplied</li></ul>
  </div>
 </body>
</html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select the matching text nodes themselves, so every text node is tested separately
foreach ($xpath->query('//*/text()[contains(.,"interim")]') as $n) {
    var_dump($n->getNodePath());
}

You will get four matches:

  • /html/body/div[1]/p/text()
  • /html/body/div[2]/p/a/text()
  • /html/body/div[2]/p/text()[2]
  • /html/body/div[3]/ul/li/text()
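The question asks for the elements (P P A LI) rather than their text nodes. A minimal sketch, assuming the same DOMXPath setup as the answer: stepping from each matching text node to its parent with `/..` returns the immediate parent of every match. An XPath node-set holds each node only once, so the second `<p>` appears a single time even though it is reached from its second text node. The predicate form `//*[text()[contains(., "interim")]]` selects the same elements; the array names below are only illustrative.

```php
<?php
$html = <<<HTML
<html><body>
<div><p>During the interim there shall be nourishment supplied</p></div>
<div><p>During the <a href="#">interim</a> there shall be interim nourishment supplied</p></div>
<div><ul><li>During the interim there shall be nourishment supplied</li></ul></div>
</body></html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Test each text node separately, then step up to its parent element.
// The resulting node-set has no duplicates and is in document order.
$viaParent = [];
foreach ($xpath->query('//text()[contains(., "interim")]/..') as $el) {
    $viaParent[] = $el->nodeName;
}

// Equivalent form: elements with at least one matching text child.
$viaPredicate = [];
foreach ($xpath->query('//*[text()[contains(., "interim")]]') as $el) {
    $viaPredicate[] = $el->nodeName;
}

echo implode(' ', $viaParent), "\n";    // p p a li
echo implode(' ', $viaPredicate), "\n"; // p p a li
```

The `/..` variant keeps the answer's text-node test unchanged and only adds a final parent step, which makes it easy to switch between getting the text nodes and getting their parents.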