Question

Using PHP, how can I extract every <div class="this"> from an HTML document, even when they appear at different levels of the hierarchy?

<h3>Hello</h3>
<p>World</p>
<div class="this">
    (lots of random markup, including other divs)
</div>
<div class="this">
    (more random markup, including other divs)
</div>
<div class="inside">
    <div class="this">
        (even more random markup, including other divs)
    </div>
</div>
<p>Bye.</p>

If this can't be done with regular expressions, does PHP have a built-in library that makes something like this easy (pseudocode)?

$result = find_all($html, "div", "this");

Expected result:

$result = array(
'<div class="this">
    (lots of random markup, including other divs)
</div>',
'<div class="this">
    (more random markup, including other divs)
</div>',
'<div class="this">
    (even more random markup, including other divs)
</div>',
);

Answer 1

You can use PHP Simple HTML DOM Parser for this. Your code would look like this:

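    <?php
    // Third-party library: PHP Simple HTML DOM Parser (not bundled with PHP).
    include_once "simple_html_dom.php";

    // Parse the markup from a string.
    $html = str_get_html('<h3>Hello</h3><p>World</p><div class="this"> (lots of random markup, including other divs)</div><div class="this"> (more random markup, including other divs)</div><div class="inside"> <div class="this"> (even more random markup, including other divs) </div></div><p>Bye.</p>');

    // The CSS-style selector matches div.this at any nesting depth.
    $divs = $html->find('div.this');

    // Collect each match as a string, including its own opening and closing tags.
    $ans = array();
    foreach ($divs as $div) {
        $ans[] = $div->outertext;
    }

    print_r($ans);
    ?>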

Answer 2

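PHP is primarily an HTML preprocessor. So to do what you are asking, you would have to fetch the file with file_get_contents() or some AJAX in order to send the data to your PHP. The latter seems a bit extreme for what you are asking.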

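Depending on what you are trying to achieve, I would personally suggest storing those divs somewhere else, such as a database, before processing them with PHP. You can then build the elements dynamically from the data in the database.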

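Use JavaScript for any client-side manipulation, in other words anything that happens after the page has been generated, such as fetching more data.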

Answer 3

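You need to use the DOMDocument method loadHTMLFile or loadHTML to read your file into a DOMDocument instance. After that, you can call $instance->getElementsByTagName("div"), which gives you a DOMNodeList. Then loop over it with foreach and filter the elements by the value of getAttribute("class").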
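The steps described in this answer could be sketched roughly as follows. This is only one possible reading of the approach, not the answerer's own code: the function name find_all is borrowed from the question's pseudocode, the sample markup is shortened, and the choice to split the class attribute on whitespace (so that class="x this" also matches) is an assumption about what the asker wants.

```php
<?php
// Hypothetical helper; the name comes from the question's pseudocode.
function find_all(string $html, string $tag, string $class): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them,
    // since real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    // getElementsByTagName returns matching elements at every nesting depth.
    foreach ($doc->getElementsByTagName($tag) as $node) {
        // Assumption: treat the class attribute as a space-separated list.
        $classes = preg_split('/\s+/', trim($node->getAttribute('class')));
        if (in_array($class, $classes, true)) {
            // saveHTML($node) serializes the element together with its own tags.
            $result[] = $doc->saveHTML($node);
        }
    }
    return $result;
}

$html = '<h3>Hello</h3><div class="this">one</div>'
      . '<div class="inside"><div class="this">two</div></div><p>Bye.</p>';
print_r(find_all($html, 'div', 'this'));
```

If a matching div is nested inside another matching div, both are returned, and the outer one's string contains the inner one. DOMXPath with an expression such as //div[contains(concat(' ', normalize-space(@class), ' '), ' this ')] would be an alternative way to do the filtering in a single query.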
