I am using PHP Simple HTML DOM Parser with the following code to capture all paragraph tags:
// Product Description
$html = file_get_html('http://domain.local/index.html');
$contents = strip_tags($html->find('div[class=product-details] p'));
How can I tell it to grab however many paragraphs there are until it hits the first ul?
<p>
Paragraph 1
</p>
<p>
Paragraph 2
</p>
<p>
Paragraph 3
</p>
<ul>
<li>
List item 1
</li>
<li>
List item 2
</li>
</ul>
<blockquote>
Quote 1
</blockquote>
<blockquote>
Quote 2
</blockquote>
<blockquote>
Quote 3
</blockquote>
<p>
Paragraph 4
</p>
<p>
Paragraph 5
</p>
Answer 0 (score: 1)
You can use the following code for the requirement you describe:
<?php
// Load the Simple HTML DOM library; adjust the path to wherever your copy lives.
require_once 'simple_html_dom.php';

$html = file_get_html('http://domain.local/index.html');
// Select every element inside the product-details div, in document order.
$detailTags = $html->find('div[class=product-details] *');
$contents = "";
foreach ($detailTags as $detailTag) {
    // Stop as soon as the first <ul> is reached.
    if ($detailTag->tag === 'ul') {
        break;
    }
    $contents .= strip_tags($detailTag);
}
// $contents now holds the text of everything before the first <ul>.
echo $contents;
?>
Output:
Paragraph 1 Paragraph 2 Paragraph 3
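Answer 1 (score: 0)
Edit: Nandal's code will work for you, since it does not force you to change libraries.
If you would rather not depend on a third-party library, you can use PHP's DOMDocument class. The DOM extension must be enabled for this.
The following code prints paragraphs until it reaches any other tag: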
<?php
$html = new DOMDocument();
$html->loadHTML("<html><body><p>Paragraph 1</p><p> Paragraph 2</p><p> Paragraph 3</p><ul> <li> List item 1 </li> <li> List item 2 </li> </ul><blockquote> Quote 1</blockquote><blockquote> Quote 2</blockquote><blockquote> Quote 3</blockquote><p> Paragraph 4</p><p> Paragraph 5</p></body></html>");
$xpath = new DOMXPath($html);
// Select only the direct children of <body>, in document order, so that
// inline tags nested inside a paragraph cannot end the loop early.
$nodes = $xpath->query('/html/body/*');
foreach ($nodes as $node) {
    // Stop at the first element that is not a <p>.
    if ($node->nodeName !== "p") {
        break;
    }
    print trim($node->nodeValue) . "\n";
}
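The loop can also be folded into the XPath query itself. The sketch below is one possible variant, not taken from either answer. It assumes the paragraphs are direct children of a div whose class attribute is exactly product-details, as in the question's selector. The function name paragraphsBeforeFirstList and the $sample markup are made up for illustration.

```php
<?php
// Hypothetical helper (not from the original answers): return the text of
// every <p> inside div.product-details that appears before the first <ul>.
function paragraphsBeforeFirstList(string $markup): array
{
    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly valid.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($markup);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Keep only direct <p> children that have no <ul> sibling before them.
    // Assumption: the class attribute is exactly "product-details".
    $query = '//div[@class="product-details"]/p[not(preceding-sibling::ul)]';

    $paragraphs = [];
    foreach ($xpath->query($query) as $p) {
        $paragraphs[] = trim($p->textContent);
    }
    return $paragraphs;
}

// Illustrative markup modelled on the question's sample.
$sample = '<html><body><div class="product-details">'
    . '<p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>'
    . '<ul><li>List item 1</li><li>List item 2</li></ul>'
    . '<blockquote>Quote 1</blockquote>'
    . '<p>Paragraph 4</p><p>Paragraph 5</p>'
    . '</div></body></html>';

echo implode("\n", paragraphsBeforeFirstList($sample)), "\n";
```

Unlike the loop above, this query stops only at a ul, so a blockquote placed before the list would be skipped rather than ending the match. To stop at any non-paragraph element, the predicate could be changed to not(preceding-sibling::*[not(self::p)]).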