Question

Here is the HTML source of the page:

<h3>Background</h3>
<p>Example 1<br>Example 2<br> </br> <ul></li>ABC<li></ul>
</p>
<h3>Job Description</h3>
<p>content of job description</p>

Here is the XPath query:

//node()[preceding::h3[text()="Background"] and following-sibling::h3[text()="Job Description"]]

I need this output:

<p>Example 1<br>Example 2<br> </br> <ul></li>ABC<li></ul>
    </p>

Answer 1

Put simply, you need to do the following (using Simple HTML DOM):

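// Assumes simple_html_dom.php is on the include path
// and that $str holds the HTML from the question.
include 'simple_html_dom.php';

$html = str_get_html($str);

// Find the <h3> whose text is "Background" and print the element that follows it.
foreach($html->find('h3') as $h3){
  if($h3->text() == 'Background'){
    echo $h3->next_sibling();
  }
}
// <p>Example 1<br>Example 2<br> </br> <ul></li>ABC<li></ul>  </p>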

You can't get there with DOM or XPath, because the HTML is too invalid (a <ul> inside a <p>).

Answer 2

This line fixes the code. It now keeps the line-break tags and the <li> tags.

//node()[preceding::h3[text()="Background"] and following-sibling::h3[text()="Job Description"]]/node()

I added /node() to the end of the string.
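
The effect of the appended /node() step can be sketched in PHP with the built-in DOMDocument and DOMXPath classes. This is only an illustration under assumptions: the sample markup in $str is a simplified, well-formed stand-in for the page in the question (libxml repairs the original invalid markup while parsing, so results on it may differ), and render() is a hypothetical helper, not part of either answer.

```php
<?php
// Simplified, well-formed stand-in for the page in the question (an assumption).
$str = '<h3>Background</h3>'
     . '<p>Example 1<br>Example 2<br><b>ABC</b></p>'
     . '<h3>Job Description</h3>'
     . '<p>content of job description</p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($str);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The query from the question, split over two lines for readability.
$base = '//node()[preceding::h3[text()="Background"] '
      . 'and following-sibling::h3[text()="Job Description"]]';

// Hypothetical helper: serialize every node matched by a query back to HTML.
function render(DOMXPath $xpath, DOMDocument $doc, string $query): string {
    $out = '';
    foreach ($xpath->query($query) as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}

$whole = render($xpath, $doc, $base);              // the matched nodes themselves
$inner = render($xpath, $doc, $base . '/node()');  // only their child nodes

echo $whole, "\n", $inner, "\n";
```

Here $whole holds the matched <p> element including its own tags, while $inner holds only its child nodes (the text and the <br> tags), which is the difference the added /node() makes.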

XPath: keep line breaks and other HTML tags
