<li class="page_item page-item-6"><a href="http://localhost/wordpress1006/?page_id=6">About us</a></li>
<li class="page_item page-item-12"><a href="http://localhost/wordpress1006/?page_id=12">Contact</a></li>
<li class="page_item page-item-10"><a href="http://localhost/wordpress1006/?page_id=10">Portfolio</a></li>
<li class="page_item page-item-8"><a href="http://localhost/wordpress1006/?page_id=8">Services</a></li>
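I only need to extract the links from this list. I would use a regular expression, but I am too scared of them. Note that the number at the end of `page_item page-item-number` changes from item to item.
What would you suggest I do here?
Thanks in advance.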
Answer 0 (score: 2)
Try this:
$matches = array();
$string = '
<li class="page_item page-item-6"><a href="http://localhost/wordpress1006/?page_id=6">About us</a></li>
<li class="page_item page-item-12"><a href="http://localhost/wordpress1006/?page_id=12">Contact</a></li>
<li class="page_item page-item-10"><a href="http://localhost/wordpress1006/?page_id=10">Portfolio</a></li>
<li class="page_item page-item-8"><a href="http://localhost/wordpress1006/?page_id=8">Services</a></li>
';
preg_match_all('/href="(.*?)"/i', $string, $matches);
var_dump($matches[1]);
Output:
array
0 => string 'http://localhost/wordpress1006/?page_id=6' (length=41)
1 => string 'http://localhost/wordpress1006/?page_id=12' (length=42)
2 => string 'http://localhost/wordpress1006/?page_id=10' (length=42)
3 => string 'http://localhost/wordpress1006/?page_id=8' (length=41)
(Note that this will fail on more complex HTML. In that case I would stop using regex and switch to something like Simple HTML DOM.)
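Regarding the changing number in the class name: the pattern above ignores the `<li>` entirely, so the number does not matter. If you want to match only the `page_item` entries, here is a minimal sketch, assuming the markup looks exactly like the sample in the question (double-quoted attributes, `class` as the only attribute on the `<li>`). The names `$pattern` and `$pageLinks` are just illustrative.

```php
<?php
// Sample markup copied from the question (shortened to two items).
$string = '
<li class="page_item page-item-6"><a href="http://localhost/wordpress1006/?page_id=6">About us</a></li>
<li class="page_item page-item-12"><a href="http://localhost/wordpress1006/?page_id=12">Contact</a></li>
';

// \d+ matches whatever number follows "page-item-".
// [^"]* captures the href value up to its closing double quote.
$pattern = '/<li class="page_item page-item-\d+">\s*<a href="([^"]*)"/i';

preg_match_all($pattern, $string, $matches);

// $matches[1] holds the contents of the first capture group for every match.
$pageLinks = $matches[1];
print_r($pageLinks);
```

This is still tied to the exact attribute order and quoting, so the caveat about more complex HTML applies here as well.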
Answer 1 (score: 2)
I found this interesting, so here is a solution for getting the URLs out of the HTML without using regular expressions.
$html = '
<li class="page_item page-item-6"><a href="http://localhost/wordpress1006/?page_id=6">About us</a></li>
<li class="page_item page-item-12"><a href="http://localhost/wordpress1006/?page_id=12">Contact</a></li>
<li class="page_item page-item-10"><a href="http://localhost/wordpress1006/?page_id=10">Portfolio</a></li>
<li class="page_item page-item-8"><a href="http://localhost/wordpress1006/?page_id=8">Services</a></li>
';
$tidy = new tidy();
$tidy->parseString($html);
$dom = new DOMDocument();
$dom->loadHTML($tidy->html());
$links = $dom->getElementsByTagName('a');
$matches = array();
foreach ($links as $link) {
    // getAttribute() returns an empty string if an <a> has no href
    $matches[] = $link->getAttribute('href');
}
var_dump($matches);
Output:
array(4) {
[0]=>
string(41) "http://localhost/wordpress1006/?page_id=6"
[1]=>
string(42) "http://localhost/wordpress1006/?page_id=12"
[2]=>
string(42) "http://localhost/wordpress1006/?page_id=10"
[3]=>
string(41) "http://localhost/wordpress1006/?page_id=8"
}
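If the tidy extension is not installed, a similar result can probably be had with the DOM extension alone. The following is a sketch under that assumption, not a drop-in replacement: it uses `DOMXPath` to select only links inside `<li>` elements whose class list contains `page_item`, so the changing `page-item-N` number is irrelevant. The variable names `$query` and `$urls` are illustrative.

```php
<?php
// Sample markup copied from the question (shortened to two items).
$html = '
<li class="page_item page-item-6"><a href="http://localhost/wordpress1006/?page_id=6">About us</a></li>
<li class="page_item page-item-12"><a href="http://localhost/wordpress1006/?page_id=12">Contact</a></li>
';

$dom = new DOMDocument();

// Collect libxml parse warnings internally instead of emitting PHP warnings,
// since real-world HTML fragments are rarely perfectly valid.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// Select the href attribute of each <a> directly inside an <li>
// whose space-separated class list contains the token "page_item".
$query = '//li[contains(concat(" ", normalize-space(@class), " "), " page_item ")]/a/@href';

$urls = array();
foreach ($xpath->query($query) as $href) {
    $urls[] = $href->nodeValue;
}

var_dump($urls);
```

The `concat`/`normalize-space` idiom matches `page_item` as a whole class token, so a class such as `page_item_extra` would not be picked up by accident.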