Question

I want to get the text of the ul > li that comes right after the heading with the text ABC. In this case the text is 123.

<h2>CDE</h2>
<ul>...</ul>

<h2>ABC</h2>
<ul>
  <li>
    <span>123</span>
  </li>
</ul>

This is what I have, but it doesn't work:

$dom = new DOMDocument();
$dom->loadHTML($html); // $html is the code above

$h2_all = $dom->getElementsByTagName('h2');

foreach($h2_all as $h2) {
  $h2_text = $h2->textContent;

  if (trim(strtolower($h2_text)) == 'abc') {
    var_dump($h2->nextSibling);
  }
}

I think this is because $h2 doesn't contain the ul data I need, but I'm not sure how to get at it.
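The ul is not inside $h2 at all; it is a later sibling. Also, $h2->nextSibling is not that ul but the whitespace text node (the line break) between </h2> and <ul>, because DOMDocument keeps whitespace-only text nodes by default (preserveWhiteSpace is true). A minimal sketch to confirm this, assuming the sample markup from the question is pasted into $html:

```php
<?php
// Sample markup from the question
$html = <<<'HTML'
<h2>CDE</h2>
<ul>...</ul>

<h2>ABC</h2>
<ul>
  <li>
    <span>123</span>
  </li>
</ul>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

foreach ($dom->getElementsByTagName('h2') as $h2) {
    if (trim(strtolower($h2->textContent)) === 'abc') {
        $next = $h2->nextSibling;
        // Prints "DOMText": the node right after </h2> is the line break, not the <ul>
        echo get_class($next), "\n";
        // Prints "bool(true)": that text node contains only whitespace
        var_dump(trim($next->textContent) === '');
    }
}
```

So the fix is to keep moving along nextSibling until an element named ul is reached, which is what the first answer does.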

Answer 1

Walk the following siblings and take the first ul:

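$ul = null;
foreach ($dom->getElementsByTagName('h2') as $h2) {
    if (trim(strtolower($h2->textContent)) === 'abc') {
        // nextSibling may be a whitespace text node, so walk forward until the first <ul>
        $obj = $h2->nextSibling;
        while ($obj !== null) {
            if ($obj->nodeName === 'ul') {
                $ul = $obj;
                break 2; // leave both the while and the foreach
            }
            $obj = $obj->nextSibling;
        }
    }
}
// make sure the ul has at least one li; its firstChild can be whitespace,
// so look up the li element itself and trim the surrounding whitespace
if ($ul !== null) {
    $li = $ul->getElementsByTagName('li')->item(0);
    if ($li !== null) {
        echo trim($li->textContent); // 123
    }
}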
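The loop above keeps walking until it finds any ul, so if the ABC heading had no list of its own it would return the list of a later section. A minimal sketch of a variant that stops at the next h2 and returns the trimmed text of the first li. The function name firstLiAfterHeading and the extra EMPTY and DEF sections in the markup are hypothetical, and it assumes headings and lists are siblings, as in the question:

```php
<?php
/**
 * Hypothetical helper: returns the trimmed text of the first <li> in the
 * <ul> that follows the <h2> whose text is $heading, or null if that
 * section has no list. Assumes <h2> and <ul> are siblings.
 */
function firstLiAfterHeading(DOMDocument $dom, string $heading): ?string
{
    foreach ($dom->getElementsByTagName('h2') as $h2) {
        if (strtolower(trim($h2->textContent)) !== strtolower($heading)) {
            continue;
        }
        // Walk forward through siblings; whitespace text nodes and other tags are skipped
        for ($node = $h2->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeName === 'h2') {
                return null; // reached the next section without finding a <ul>
            }
            if ($node->nodeName === 'ul') {
                $li = $node->getElementsByTagName('li')->item(0);
                return $li === null ? null : trim($li->textContent);
            }
        }
        return null;
    }
    return null;
}

// Hypothetical markup: EMPTY has no list of its own
$html = <<<'HTML'
<h2>ABC</h2>
<ul>
  <li>
    <span>123</span>
  </li>
</ul>
<h2>EMPTY</h2>
<p>No list in this section</p>
<h2>DEF</h2>
<ul><li>456</li></ul>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

echo firstLiAfterHeading($dom, 'ABC'), "\n";   // 123
var_dump(firstLiAfterHeading($dom, 'EMPTY'));  // NULL: stops at the DEF heading
echo firstLiAfterHeading($dom, 'DEF'), "\n";   // 456
```

Stopping at the next h2 treats each heading as the start of a section, which matches how the question's markup is laid out.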

Answer 2

You can use an XPath query:

$dom = new DOMDocument;
$dom->loadHTML($html);

$xp = new DOMXPath($dom);

$qry = '//ul[preceding::h2[1] = "ABC"]/li/span';

$result = $xp->query($qry)->item(0)->nodeValue;

Query details:

//         # the path can start anywhere in the DOM tree
ul
[preceding::h2[1] = "ABC"] # condition: the nearest preceding h2 has the string value "ABC"
/li/span   # let's continue the path down to the span node
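If no list matches, item(0) returns null, and reading ->nodeValue from it raises a notice or warning (depending on the PHP version) instead of giving a usable result. A minimal guarded sketch, assuming the question's markup. It keeps the preceding::h2[1] condition but wraps it in normalize-space() so stray whitespace inside the h2 does not break the match. It also takes the trimmed text of the first li instead of requiring a span; both changes are adjustments to the query above, not part of the original answer:

```php
<?php
$html = <<<'HTML'
<h2>CDE</h2>
<ul>...</ul>

<h2>ABC</h2>
<ul>
  <li>
    <span>123</span>
  </li>
</ul>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Nearest preceding h2, compared after whitespace normalization; li[1] is the first list item
$qry = '//ul[normalize-space(preceding::h2[1]) = "ABC"]/li[1]';

$nodes = $xp->query($qry);
// query() returns false for a malformed expression; item(0) is null when nothing matches
$first = ($nodes !== false) ? $nodes->item(0) : null;
$result = ($first !== null) ? trim($first->textContent) : null;

var_dump($result); // string(3) "123"
```

Because preceding::h2[1] always picks the closest heading before each ul, this query will not return a later section's list for a heading that has none.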

DOMDocument: get the text from the element that follows a matched element
