DOMDocument: get the text of the element that follows a matched element

Date: 2015-03-31 22:37:46

Tags: php domdocument

I want to get the text of the ul > li that immediately follows the heading with the text ABC. In this example, that text is 123.

<h2>CDE</h2>
<ul>...</ul>

<h2>ABC</h2>
<ul>
  <li>
    <span>123</span>
  </li>
</ul>

This is what I have, but it doesn't work:

$dom = new DOMDocument();
$dom->loadHTML($html); // $html is the code above

$h2_all = $dom->getElementsByTagName('h2');

foreach($h2_all as $h2) {
  $h2_text = $h2->textContent;

  if (trim(strtolower($h2_text)) == 'abc') {
    var_dump($h2->nextSibling);
  }
}

I think this is because $h2 doesn't contain the ul data I need, but I'm not sure how to get at it.

2 answers:

Answer 0: (score: 1)

Walk the following siblings of the h2 and stop at the first ul:

$ul = null;
foreach($dom->getElementsByTagName('h2') as $h2) {
    if(trim(strtolower($h2->textContent)) == "abc") {       
        $obj = $h2->nextSibling;
        while($obj != null) {
            if($obj->nodeName == "ul") {
                $ul = $obj;
                break 2;
            }
            $obj = $obj->nextSibling;
        }
    }
}
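// make sure the ul has at least one li; firstChild may be a whitespace text node, so look the li up explicitly
$li = ($ul !== null) ? $ul->getElementsByTagName('li')->item(0) : null;
if ($li !== null) {
    echo trim($li->textContent);
}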
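
The sibling walk can also be packaged as a small helper. The following is a minimal sketch and not part of the original answer: `nextElement()` is a hypothetical name, the sample markup uses a placeholder list where the question shows `<ul>...</ul>`, and the nullable return type assumes PHP 7.1 or later. It skips any non-element siblings (whether whitespace text nodes appear between the h2 and the ul depends on the parser) and accepts the ul only if it is the very next element after the heading.

```php
<?php
// Hypothetical helper: first following sibling that is an element, or null if there is none.
function nextElement(DOMNode $node): ?DOMElement
{
    for ($n = $node->nextSibling; $n !== null; $n = $n->nextSibling) {
        if ($n->nodeType === XML_ELEMENT_NODE) {
            return $n;
        }
    }
    return null;
}

// Sample markup modelled on the question; the first list is a placeholder.
$html = <<<'HTML'
<h2>CDE</h2>
<ul><li>placeholder</li></ul>

<h2>ABC</h2>
<ul>
  <li>
    <span>123</span>
  </li>
</ul>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

$text = null;
foreach ($dom->getElementsByTagName('h2') as $h2) {
    if (strtolower(trim($h2->textContent)) !== 'abc') {
        continue;
    }
    // Only accept a ul that directly follows the matched h2.
    $next = nextElement($h2);
    if ($next !== null && $next->nodeName === 'ul') {
        $li = $next->getElementsByTagName('li')->item(0);
        if ($li !== null) {
            $text = trim($li->textContent);
        }
    }
    break;
}

var_dump($text);
```

Unlike the loop in the answer, this variant leaves `$text` as null when something other than a ul follows the heading, which may or may not be the behaviour you want.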

Answer 1: (score: 1)

You can use an XPath query:

$dom = new DOMDocument;
$dom->loadHTML($html);

$xp = new DOMXPath($dom);

$qry = '//ul[preceding::h2[1] = "ABC"]/li/span';

$result = $xp->query($qry)->item(0)->nodeValue;

Query details:

//                         # the path can start from anywhere in the DOM tree
ul
[preceding::h2[1] = "ABC"] # condition: the nearest preceding h2 has the string value "ABC"
/li/span                   # continue the path down to the span node
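
The same lookup can also be anchored on the h2 instead of the ul. This is a hedged sketch rather than the answerer's query: it assumes the ul must be the very next element sibling of the heading, uses `normalize-space()` so surrounding whitespace in the heading is ignored, and guards against an empty result. The sample markup again uses a placeholder for the first list.

```php
<?php
$html = <<<'HTML'
<h2>CDE</h2>
<ul><li>placeholder</li></ul>

<h2>ABC</h2>
<ul>
  <li>
    <span>123</span>
  </li>
</ul>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// The h2 whose trimmed text is "ABC", then its first following element sibling,
// kept only if that sibling is a ul, then the li children of that ul.
$qry = '//h2[normalize-space(.) = "ABC"]/following-sibling::*[1][self::ul]/li';

$nodes = $xp->query($qry);
$li = ($nodes !== false) ? $nodes->item(0) : null;

// $text stays null when no matching li exists.
$text = ($li !== null) ? trim($li->textContent) : null;

var_dump($text);
```

Note that the comparison is case-sensitive, so a heading written as "abc" would not match; the loop-based approach lowercases the text first and does not have that limitation.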