Question

The HTML document looks like this:

<li><h2><a href="http://beezfeed.cu.ma">Beezfeed</h2></a></li>
<li><a href="http://beezfeed.cu.ma/kuto">Beezfeed kuto</a></li>
<li><a href="http://beezfeed.cu.ma/movies">Beezfeed movies</a></li>

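I only want the href values of the last two links. Below is my code, which uses Simple HTML DOM. Please answer with that in mind, or show me how to do it with a regex.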

$bb->load($str);
$link = $bb->find('div[class=azindex] li');

foreach ($link as $s) {
    $lin = $s->find("a");
    foreach ($lin as $li) {
        echo $li->href . "<br/>";
    }
}

I get every link inside the li tags, but I do not want the link that has the h2 tag. Thanks in advance.

Answer 1

If I had to do this in a simple way, I would do it like this:

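$bb->load($str);
$link = $bb->find('div[class=azindex] li');

foreach ($link as $s) {
    $lin = $s->find("a");
    foreach ($lin as $li) {
        // find() with an index returns null when nothing matches;
        // without an index it returns an array, which is never null.
        $hasH2Inside = !is_null($li->find("h2", 0));
        // In the sample markup the <h2> wraps the link, so check the parent too.
        $insideH2 = $li->parent() && $li->parent()->tag === "h2";
        if (!$hasH2Inside && !$insideH2) {
            echo $li->href . "<br>";
        }
        /* Do nothing if an h2 was found */
    }
}

I call the find method on $li with index 0, which returns null when no h2 is found, and I also check whether the link's parent is an h2, since the h2 wraps the link in the sample markup. If either check finds an h2 I do nothing; otherwise I print the href. I could not test this, but I hope it helps.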
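
For comparison, the same filtering can be sketched with PHP's built-in DOM extension instead of Simple HTML DOM. This is a different technique from the one above, not a drop-in replacement, and it assumes the dom extension is enabled. The wrapping div with class azindex is an assumption taken from the selector in the question, since the sample markup does not show it.

```php
<?php
// Sample markup from the question, wrapped in the div the selector expects (assumed).
$str = <<<'HTML'
<div class="azindex"><ul>
<li><h2><a href="http://beezfeed.cu.ma">Beezfeed</h2></a></li>
<li><a href="http://beezfeed.cu.ma/kuto">Beezfeed kuto</a></li>
<li><a href="http://beezfeed.cu.ma/movies">Beezfeed movies</a></li>
</ul></div>
HTML;

// The mis-nested </h2></a> triggers parser warnings; collect them silently.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($str);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$hrefs = [];
// "li/a" selects only anchors that are direct children of an <li>,
// so an anchor nested inside a heading is skipped.
foreach ($xpath->query('//div[@class="azindex"]//li/a') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

echo implode("<br/>", $hrefs), "\n";
```

The direct-child step in the XPath expression does the filtering, so no explicit h2 check is needed. It would also skip links wrapped in any other element, such as a p tag, which may or may not be what you want.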

Answer 2

A regex that matches the correct links:

$items = '
<li><h2><a href="http://beezfeed1.cu.ma">Beezfeed1</h2></a></li>
<li><p><a href="http://beezfeed2.cu.ma/">Beezfeed2</a></p></li>
<li><h4><a href="http://beezfeed3.cu.ma">Beezfeed3</h4></a></li>
<li><a href="http://beezfeed4.cu.ma/">Beezfeed4</a></li>
';

preg_match_all('(<li>(?!<h[1-9]>).*<a href="(.*)")',$items,$matches);

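Matches: http://beezfeed2.cu.ma/ and http://beezfeed4.cu.ma/

The negative lookahead (?!<h[1-9]>) rejects any li that starts with a heading tag. The class [1-9] covers h1 through h9, although HTML only defines h1 to h6.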
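
As a rough sketch, the lookahead idea can be applied to the markup from the question. The pattern below is a variant of the one above, not the original: it uses ~ as the delimiter, limits headings to h1 through h6, and captures with [^"]* so the match stops at the closing quote. It assumes each li sits on its own line and that href values are double-quoted.

```php
<?php
// Markup copied from the question.
$html = <<<'HTML'
<li><h2><a href="http://beezfeed.cu.ma">Beezfeed</h2></a></li>
<li><a href="http://beezfeed.cu.ma/kuto">Beezfeed kuto</a></li>
<li><a href="http://beezfeed.cu.ma/movies">Beezfeed movies</a></li>
HTML;

// (?!\s*<h[1-6]\b) : skip any <li> whose first child is a heading
// .*?              : allow other wrappers (e.g. <p>) before the link, same line only
// ([^"]*)          : capture the href up to its closing quote
$pattern = '~<li>(?!\s*<h[1-6]\b).*?<a\s[^>]*?href="([^"]*)"~i';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the captured href values.
foreach ($matches[1] as $href) {
    echo $href . "<br/>\n";
}
```

Because . does not match newlines by default, the search for the link never runs past the end of the line. If several li elements share one line, the pattern can pair an li with a link from a later item, so an HTML parser is the safer choice for arbitrary markup.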

A stricter match:

preg_match_all('(<li>\s?<a href="(.*)")',$items,$matches);

This returns only:

http://beezfeed4.cu.ma/

This regex allows no characters between <li> and <a> other than whitespace (\s? is one optional whitespace character).
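
The stricter rule can be wrapped in a small helper, sketched here. The function name directLinkHrefs is hypothetical, and the pattern is a variant of the one above: it uses \s* instead of \s? so that any amount of whitespace (including a line break) may sit between the two tags, and it assumes double-quoted href values.

```php
<?php
/**
 * Return the href of every link that directly follows an opening <li>,
 * with nothing but whitespace in between. (Hypothetical helper name.)
 */
function directLinkHrefs(string $html): array
{
    preg_match_all('~<li>\s*<a\s[^>]*?href="([^"]*)"~i', $html, $matches);
    return $matches[1];
}

// Sample items from the answer above.
$items = '
<li><h2><a href="http://beezfeed1.cu.ma">Beezfeed1</h2></a></li>
<li><p><a href="http://beezfeed2.cu.ma/">Beezfeed2</a></p></li>
<li><h4><a href="http://beezfeed3.cu.ma">Beezfeed3</h4></a></li>
<li><a href="http://beezfeed4.cu.ma/">Beezfeed4</a></li>
';

foreach (directLinkHrefs($items) as $href) {
    echo $href . "<br/>\n";
}
```

Only the last item qualifies, because every other li has a tag between it and the link.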
