PHP DOMDocument - Match and remove URLs

Date: 2013-09-25 01:56:08

Tags: dom

I'm trying to extract the links from an HTML page using DOM:

$html = file_get_contents('links.html');
$DOM = new DOMDocument();
$DOM->loadHTML($html);
$a = $DOM->getElementsByTagName('a');
foreach($a as $link){
    //echo out the href attribute of the <A> tag.
    echo $link->getAttribute('href').'<br/>';
}
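A variant of the loop above that collects the hrefs into an array instead of echoing them makes the later filtering step easier. This is a minimal sketch; the helper name extractHrefs() is hypothetical. It uses only standard DOM/libxml calls: libxml_use_internal_errors() silences the warnings loadHTML() tends to emit for real-world markup, and hasAttribute() skips <a> tags without an href.

```php
<?php
// Hypothetical helper: returns every href attribute found on <a> tags in an HTML string.
function extractHrefs(string $html): array
{
    $dom = new DOMDocument();

    // loadHTML() warns on non-well-formed HTML; collect those errors internally instead.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        // Anchors used only as named targets (<a name="...">) have no href; skip them.
        if ($link->hasAttribute('href')) {
            $hrefs[] = $link->getAttribute('href');
        }
    }
    return $hrefs;
}

$html = '<p><a href="http://domain1.com/a.html">A</a> <a name="top">no href</a></p>';
print_r(extractHrefs($html));
```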

Output:

http://dontwantthisdomain.com/dont-want-this-domain-name/
http://dontwantthisdomain2.com/also-dont-want-any-pages-from-this-domain/
http://dontwantthisdomain3.com/dont-want-any-pages-from-this-domain/
http://domain1.com/page-X-on-domain-com.html

http://dontwantthisdomain.com/dont-want-link-from-this-domain-name.html
http://dontwantthisdomain2.com/dont-want-any-pages-from-this-domain/
http://domain.com/page-XZ-on-domain-com.html

http://dontwantthisdomain.com/another-page-from-same-domain-that-i-dont-want-to-be-included/
http://dontwantthisdomain2.com/same-as-above/
http://domain3.com/page-XYZ-on-domain3-com.html

I want to remove every result that matches dontwantthisdomain.com, dontwantthisdomain2.com or dontwantthisdomain3.com, so that the output looks like this:

http://domain1.com/page-X-on-domain-com.html
http://domain.com/page-XZ-on-domain-com.html
http://domain3.com/page-XYZ-on-domain3-com.html

Any ideas? :)

1 answer:

Answer 0 (score: 0)

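I think you should use a regular expression to match the hrefs on the unwanted domains and skip them. Google it and have fun.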
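A minimal sketch of the regular-expression approach, assuming "matches" means the URL's host is exactly one of the three unwanted domains (optionally prefixed with www.). The $blockedDomains list, the helper name filterUrls(), and the sample URLs are illustrative, not from the original post. preg_quote() escapes the dots in the domain names, and the trailing group stops a host such as dontwantthisdomain.com.example.org from being treated as blocked.

```php
<?php
// Keep only the URLs that do NOT match $pattern; array_values() reindexes the result.
function filterUrls(array $urls, string $pattern): array
{
    return array_values(array_filter(
        $urls,
        fn(string $url): bool => preg_match($pattern, $url) !== 1
    ));
}

// Assumed blocklist: the domains the question wants excluded.
$blockedDomains = ['dontwantthisdomain.com', 'dontwantthisdomain2.com', 'dontwantthisdomain3.com'];

// One case-insensitive pattern: http(s)://[www.]<blocked domain> followed by / : ? # or end of string.
$quoted  = array_map(fn(string $d): string => preg_quote($d, '~'), $blockedDomains);
$pattern = '~^https?://(?:www\.)?(?:' . implode('|', $quoted) . ')(?:[/:?#]|$)~i';

// Sample input taken from the question's output.
$urls = [
    'http://dontwantthisdomain.com/dont-want-this-domain-name/',
    'http://dontwantthisdomain2.com/also-dont-want-any-pages-from-this-domain/',
    'http://dontwantthisdomain3.com/dont-want-any-pages-from-this-domain/',
    'http://domain1.com/page-X-on-domain-com.html',
    'http://dontwantthisdomain.com/dont-want-link-from-this-domain-name.html',
    'http://domain.com/page-XZ-on-domain-com.html',
    'http://dontwantthisdomain2.com/same-as-above/',
    'http://domain3.com/page-XYZ-on-domain3-com.html',
];

foreach (filterUrls($urls, $pattern) as $url) {
    echo $url, "<br/>\n";
}
```

Inside the original DOM loop, the same check can be applied per link with `if (preg_match($pattern, $link->getAttribute('href'))) continue;`. A regex is not the only option: comparing parse_url($url, PHP_URL_HOST) against the blocklist with in_array() avoids writing a pattern by hand.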