Extracting specific links from an HTML page using PHP

Time: 2012-04-05 10:31:00

Tags: php regex curl preg-match domdocument

I have an HTML page that contains the following links:

<a class="out" href="www.a.com/hgfgtsdfdffsdfsdf">sdfsssdfddf</a>
<a href="www.a.com/hgfgt">dsfdsf</a>
<a class="menu" href="www.a.com/hgfgt">menu1</a>
<a class="menu" href="www.a.com/hgfgdfg">menu2</a>
<a class="menu" href="www.a.com/hgfgdfg">menu3</a>
<a href="www.a.com/hgfgtssdfdfsdf">sdfsdfddf</a>
<a href="www.a.com/hgfgtsdfsfsdfdf">sdfsdfsddf</a>
<a href="www.a.com/hgfgtsdfsdfsdf">sdfsdfddf</a>
<a class="out" href="www.a.com/hgfgtsdfsdfsdf">sdfsdfddf</a>

I want to use PHP to extract the links with class "menu", along with their titles, into an array. Please help.

3 answers:

Answer 0: (score: 0)

preg_match_all('#<a class="menu" href="([^"]+)">([^<]+)</a>#', $content, $matches);
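With the default flags, the call above groups results by capture group: $matches[1] holds the href values and $matches[2] the link texts. Below is a minimal sketch of collecting them into one array of href/title pairs instead. It assumes the markup always uses exactly the attribute order and double quoting shown in the question. The PREG_SET_ORDER flag, the shortened sample input, and the $links variable name are illustrative additions, not part of the original answer.

```php
<?php
// Sample input copied from the question (shortened for brevity).
$content = '<a class="out" href="www.a.com/hgfgtsdfdffsdfsdf">sdfsssdfddf</a>
<a href="www.a.com/hgfgt">dsfdsf</a>
<a class="menu" href="www.a.com/hgfgt">menu1</a>
<a class="menu" href="www.a.com/hgfgdfg">menu2</a>
<a class="menu" href="www.a.com/hgfgdfg">menu3</a>';

// PREG_SET_ORDER returns one entry per match rather than one per capture group.
preg_match_all(
    '#<a class="menu" href="([^"]+)">([^<]+)</a>#',
    $content,
    $matches,
    PREG_SET_ORDER
);

// $links is a hypothetical name for the resulting array.
$links = [];
foreach ($matches as $match) {
    // $match[1] is the href, $match[2] is the link text.
    $links[] = ['href' => $match[1], 'title' => $match[2]];
}

print_r($links);
```

A regex like this stops matching as soon as the attributes are reordered or another attribute is added, so it is only reasonable when the HTML layout is fixed and known.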

Answer 1: (score: 0)

$str = '<a class="out" href="www.a.com/hgfgtsdfdffsdfsdf">sdfsssdfddf</a>
<a href="www.a.com/hgfgt">dsfdsf</a>
<a class="menu" href="www.a.com/hgfgt">menu1</a>
<a class="menu" href="www.a.com/hgfgdfg">menu2</a>
<a class="menu" href="www.a.com/hgfgdfg">menu3</a>
<a href="www.a.com/hgfgtssdfdfsdf">sdfsdfddf</a>
<a href="www.a.com/hgfgtsdfsfsdfdf">sdfsdfsddf</a>
<a href="www.a.com/hgfgtsdfsdfsdf">sdfsdfddf</a>
<a class="out" href="www.a.com/hgfgtsdfsdfsdf">sdfsdfddf</a>';

preg_match_all('#<a class="menu" href="([^"]+)">([^<]+)#', $str, $m);

var_dump($m[1], $m[2]);

Answer 2: (score: 0)

Here is how to do it using DOMDocument and XPath:

$html = '

<a class="out" href="www.a.com/hgfgtsdfdffsdfsdf">sdfsssdfddf</a>
<a href="www.a.com/hgfgt">dsfdsf</a>
<a class="menu" href="www.a.com/hgfgt">menu1</a>
<a class="menu" href="www.a.com/hgfgdfg">menu2</a>
<a class="menu" href="www.a.com/hgfgdfg">menu3</a>
<a href="www.a.com/hgfgtssdfdfsdf">sdfsdfddf</a>
<a href="www.a.com/hgfgtsdfsfsdfdf">sdfsdfsddf</a>
<a href="www.a.com/hgfgtsdfsdfsdf">sdfsdfddf</a>
<a class="out" href="www.a.com/hgfgtsdfsdfsdf">sdfsdfddf</a>

';

$classname = 'menu'; // class to find

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

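// Match only <a> elements whose class list contains $classname as a whole
// token, so that e.g. class="menubar" is not picked up by accident.
$result = $xpath->query("//a[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");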

foreach($result as $elem)
{
    echo "title: " . $elem->nodeValue . "<br />";
    echo "link: " . $elem->getAttribute('href') . "<br />";
}
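
The loop above prints the results, while the question asks for them in an array. Below is a minimal sketch of collecting the same data into an array. The $links name, the array keys, and the extra class="menubar" sample link are hypothetical additions used only for illustration. The query is restricted to <a> elements and tests the class as a whole token, which is an assumption about what "class menu" should mean here.

```php
<?php
// Sample input based on the question; the "menubar" link is a hypothetical
// extra entry to show that partial class names are not matched.
$html = '<a class="out" href="www.a.com/hgfgtsdfdffsdfsdf">sdfsssdfddf</a>
<a class="menu" href="www.a.com/hgfgt">menu1</a>
<a class="menu" href="www.a.com/hgfgdfg">menu2</a>
<a class="menubar" href="www.a.com/other">not a menu link</a>';

$classname = 'menu'; // class to find

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// Pad the class attribute with spaces so only whole class tokens match.
$query = "//a[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";

// $links is a hypothetical name for the resulting array.
$links = [];
foreach ($xpath->query($query) as $elem) {
    $links[] = [
        'title' => $elem->nodeValue,
        'href'  => $elem->getAttribute('href'),
    ];
}

print_r($links);
```

Unlike the regex answers, this approach keeps working if the attributes appear in a different order or the element carries several classes.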