Pulling links out of a list in PHP

Time: 2011-12-13 23:50:53

Tags: php html string

<li class="page_item page-item-6"><a href="http://localhost/wordpress1006/?page_id=6">About us</a></li> 
<li class="page_item page-item-12"><a href="http://localhost/wordpress1006/?page_id=12">Contact</a></li> 
<li class="page_item page-item-10"><a href="http://localhost/wordpress1006/?page_id=10">Portfolio</a></li> 
<li class="page_item page-item-8"><a href="http://localhost/wordpress1006/?page_id=8">Services</a></li> 

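I only need to extract the links from this list. I would use a regular expression, but regexes scare me too much. Note that the number at the end of page_item page-item-number changes from item to item.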

What would you suggest I do here?

Thanks in advance.

2 answers:

Answer 0: (score: 2)

Try this:

$matches = array();
$string = '
    <li class="page_item page-item-6"><a href="http://localhost/wordpress1006/?page_id=6">About us</a></li> 
    <li class="page_item page-item-12"><a href="http://localhost/wordpress1006/?page_id=12">Contact</a></li> 
    <li class="page_item page-item-10"><a href="http://localhost/wordpress1006/?page_id=10">Portfolio</a></li> 
    <li class="page_item page-item-8"><a href="http://localhost/wordpress1006/?page_id=8">Services</a></li> 
';
preg_match_all('/href="(.*?)"/i', $string, $matches);
var_dump($matches[1]);

Output:

array
  0 => string 'http://localhost/wordpress1006/?page_id=6' (length=41)
  1 => string 'http://localhost/wordpress1006/?page_id=12' (length=42)
  2 => string 'http://localhost/wordpress1006/?page_id=10' (length=42)
  3 => string 'http://localhost/wordpress1006/?page_id=8' (length=41)

(Note that this will fail on more complex HTML. In that case I would stop using a regex and switch to something like Simple HTML DOM.)
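To make that caveat concrete, here is a minimal sketch of two ways the simple pattern can go wrong. The markup is made up for illustration (the `<link>` tag and the single-quoted `href` are not in the question), and the stricter pattern is only one possible tightening, not a general HTML parser.

```php
<?php
// Hypothetical markup: a <link> tag and a single-quoted href were added
// to the question's list items to show where the simple pattern breaks.
$html = <<<'HTML'
<link rel="stylesheet" href="style.css">
<li class="page_item page-item-6"><a href="http://localhost/wordpress1006/?page_id=6">About us</a></li>
<li class="page_item page-item-12"><a href='http://localhost/wordpress1006/?page_id=12'>Contact</a></li>
HTML;

// The pattern from the answer: matches any double-quoted href on any tag.
// $loose[1] picks up style.css and misses the single-quoted link.
preg_match_all('/href="(.*?)"/i', $html, $loose);

// A somewhat stricter variant: only <a> tags, either quote style.
// Group 1 is the quote character, group 2 is the URL.
preg_match_all('/<a\s[^>]*?href=(["\'])(.*?)\1/i', $html, $strict);

var_dump($loose[1], $strict[2]);
```

Even the stricter pattern can still be fooled by unquoted attributes, commented-out markup, or anchors inside scripts, which is why a DOM parser is the safer choice once the input is not fully under your control.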

Answer 1: (score: 2)

I found this interesting. So here is a solution for getting the URLs out of the HTML without using a regular expression.

$html = '
    <li class="page_item page-item-6"><a href="http://localhost/wordpress1006/?page_id=6">About us</a></li> 
    <li class="page_item page-item-12"><a href="http://localhost/wordpress1006/?page_id=12">Contact</a></li> 
    <li class="page_item page-item-10"><a href="http://localhost/wordpress1006/?page_id=10">Portfolio</a></li> 
    <li class="page_item page-item-8"><a href="http://localhost/wordpress1006/?page_id=8">Services</a></li>
';
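// Clean up the fragment first (requires the tidy extension).
$tidy = new tidy();
$tidy->parseString($html);
// Parse the repaired markup and collect every <a> element.
$dom = new DOMDocument();
$dom->loadHTML($tidy->html());
$links  = $dom->getElementsByTagName('a');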

$matches = array();
foreach ($links as $link) {
    // getAttribute() returns '' rather than failing when an <a> has no href.
    $matches[] = $link->getAttribute('href');
}

var_dump($matches);

Output:

array(4) {
  [0]=>
  string(41) "http://localhost/wordpress1006/?page_id=6"
  [1]=>
  string(42) "http://localhost/wordpress1006/?page_id=12"
  [2]=>
  string(42) "http://localhost/wordpress1006/?page_id=10"
  [3]=>
  string(41) "http://localhost/wordpress1006/?page_id=8"
}
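
On installations where the tidy extension is not available, the same idea can be sketched with DOMDocument alone. This is a variant under assumptions, not the answerer's code: it relies on libxml tolerating the bare list fragment, and the XPath query (my own addition) limits the result to anchors directly inside `li` elements that carry the `page_item` class.

```php
<?php
$html = '
    <li class="page_item page-item-6"><a href="http://localhost/wordpress1006/?page_id=6">About us</a></li>
    <li class="page_item page-item-12"><a href="http://localhost/wordpress1006/?page_id=12">Contact</a></li>
    <li class="page_item page-item-10"><a href="http://localhost/wordpress1006/?page_id=10">Portfolio</a></li>
    <li class="page_item page-item-8"><a href="http://localhost/wordpress1006/?page_id=8">Services</a></li>
';

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Select <a href> elements that are children of <li class="... page_item ...">.
$xpath = new DOMXPath($dom);
$query = '//li[contains(concat(" ", normalize-space(@class), " "), " page_item ")]/a[@href]';

$urls = array();
foreach ($xpath->query($query) as $a) {
    $urls[] = $a->getAttribute('href');
}

var_dump($urls);
```

Matching on the `page_item` class rather than on `page-item-6`, `page-item-12`, and so on sidesteps the changing number mentioned in the question.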