Question

Possible duplicate:
How to parse and process HTML with PHP?

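I fetched a page with file_get_contents and want to extract all the links in it. Is there any way to do that? Or can I use a start phrase and an end phrase to pull out the target string, like this: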

$str = 'fdgdfbfbmnlmnjkl njnkhvnbn j<a href="http://www.google.com">google</a>';
$link = str($str, "start", "END"); // ??????????
// e.g. $link = str($str, "http://www", "com"); // result => http://www.google.com or google?

Or

$str = file_get_contents("http://www.google.com");
$link = str($str, "start", "END"); // ??????????
// e.g. $link = str($str, "http://www", "com"); // result => http://www.google.com or google?

Answer 1

I ran into the same problem a while ago. This solution worked very well for me.

 $string = "Hello World, <a href='http://www.google.com'>Google</a> ! Search also on <a href='http://www.bing.com'>Bing</a>";

 preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $string, $match);

 $matches = $match[0];

 foreach($matches as $var)
 {    
     print($var."<br>"); 
 }
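
If you really do want the start/end approach from the question, a minimal sketch could look like the one below. Note that str_between() is a hypothetical helper name, not a PHP built-in, and it only returns the first match, so it breaks easily when the markup changes.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the text between the first
// occurrence of $start and the next occurrence of $end, or null if either is missing.
function str_between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);              // skip past the start marker itself
    $to = strpos($haystack, $end, $from); // look for the end marker after it
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to - $from);
}

$str = 'fdgdfbfbmnlmnjkl njnkhvnbn j<a href="http://www.google.com">google</a>';
echo str_between($str, 'href="', '"'), "\n"; // the URL
echo str_between($str, '">', '</a>'), "\n";  // the link text
```

Picking markers such as 'href="' and '"' answers the "URL or link text?" part of the question: what you get back depends entirely on which delimiters you pass in.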

Answer 2

You should use DOM methods to extract content from HTML; using regular expressions will result in madness:

<?php
    $dom = new DOMDocument;
    $dom->loadHTMLFile('http://www.google.com/');

    $a = $dom->getElementsByTagName('a');
    foreach ($a as $e) {
        echo $e->getAttribute("href") . "\n";
    }
?>
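
If the page is already in a string (for example from file_get_contents), roughly the same idea should work with loadHTML(). This is only a sketch: extract_links() is a made-up helper name, and the libxml_use_internal_errors() call is my assumption about how you would want to silence the warnings that real-world markup usually triggers.

```php
<?php
// Collects the href of every <a> element found in an HTML string.
// extract_links() is a hypothetical helper name, not part of PHP's DOM extension.
function extract_links(string $html): array
{
    // Buffer libxml parse warnings instead of printing them (assumed preference).
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {    // skip anchors without an href
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}

$html = "Hello World, <a href='http://www.google.com'>Google</a> ! Search also on <a href='http://www.bing.com'>Bing</a>";
print_r(extract_links($html));
```

Returning an array instead of echoing makes it easier to filter or de-duplicate the links afterwards.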
