Extract all heading tags (h1, h2, h3, ...) and their contents. For example:
<h1 id="title">This is the title</h1>
<h2 id="subtitle">This is the subtitle</h2>
<p>And this is the paragraph</p>
would be extracted as:
<h1 id="title">This is the title</h1>
and <h2 id="subtitle">This is the subtitle</h2>
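I am using PHP and, as the title says, regular expressions.
Answer 0 (score: 2)
I suggest using the right tool for the job: an HTML parser such as PHP's DOM extension, rather than a regular expression.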
// loadHTML() is an instance method; calling it statically fails on PHP 8.
$doc = new DOMDocument();
$doc->loadHTML('
<h1 id="title">This is the title</h1>
<h2 id="subtitle">This is the subtitle</h2>
<p>And this is the paragraph</p>
<p>another tag</p>
');
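$xpath = new DOMXPath($doc);
// Select every heading element (h1 through h6) in document order.
$heads = $xpath->query('//h1|//h2|//h3|//h4|//h5|//h6');
foreach ($heads as $tag) {
    // Serialize each matched node, including its own tag and attributes.
    echo $doc->saveHTML($tag), "\n";
}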
Output:
<h1 id="title">This is the title</h1>
<h2 id="subtitle">This is the subtitle</h2>
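If this is needed in more than one place, the same approach can be wrapped in a function. The following is only a sketch: the name extractHeadings is hypothetical and not part of the original answer, and silencing libxml warnings with libxml_use_internal_errors() is an assumption about how you want to treat imperfect real-world markup.

```php
<?php
// Hypothetical helper (not from the original answer): returns the outer HTML
// of every h1-h6 element in $html, in document order.
function extractHeadings(string $html): array
{
    $doc = new DOMDocument();

    // Assumption: parse warnings for sloppy markup should be collected
    // internally instead of being emitted as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $headings = [];
    foreach ($xpath->query('//h1|//h2|//h3|//h4|//h5|//h6') as $node) {
        // saveHTML($node) serializes just this element, tag included.
        $headings[] = $doc->saveHTML($node);
    }
    return $headings;
}

// Usage: prints the two heading elements and skips the paragraph.
print_r(extractHeadings(
    '<h1 id="title">This is the title</h1>' .
    '<h2 id="subtitle">This is the subtitle</h2>' .
    '<p>And this is the paragraph</p>'
));
```

Returning an array rather than echoing keeps the extraction separate from the output, so the caller can decide whether to print, count, or further process the headings.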