Question

I want to find the URLs and the link texts in a paragraph.

For example:

$content = '<a href="http://google.com" target="_blank">Google</a> The biggest
search engine is google. A lot of people use google
<a href="http://google.com" target="_blank">Google</a>The google video
service is youtube. <a href="http://youtube.com/ncvh/">Youtube</a>.
Google also provides <a href="http://gmail.com">Gmail</a>.';

Output:

Text        Url                          Count
Google      http://google.com            2
Youtube     http://youtube.com/ncvh/     1
Gmail       http://gmail.com             1

Can anyone help me?

Answer 1

preg_match_all("/<a\shref=\"(.*?)\"/", $content, $matches);

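$matches is an array holding the matches of the regular expression that finds the links. Each capture group is an index in $matches, so the URLs are in $matches[1].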
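A minimal sketch of that call on a shortened sample, assuming the HTML is held in a single-quoted string (the variable names $html and $count are illustrative). With preg_match_all's default PREG_PATTERN_ORDER, $matches[0] holds the full matched strings and $matches[1] holds capture group 1 for every link; the non-greedy (.*?) stops at the first closing quote rather than the last one on the line:

```php
<?php
// Single quotes keep the double quotes inside the HTML literal.
$html = '<a href="http://google.com" target="_blank">Google</a> text '
      . '<a href="http://youtube.com/ncvh/">Youtube</a>';

// preg_match_all returns the number of matches and fills $matches.
$count = preg_match_all('/<a\shref="(.*?)"/', $html, $matches);

// $matches[0]: whole matches; $matches[1]: the captured href values.
print_r($matches[1]);
echo $count, "\n";
```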

If your <a> tags are closed, you can also extract the link text:

preg_match_all("/<a\shref=\"(.*?)\"[^>]*>(.*?)<\/a>/", $content, $matches);

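The regular expression I used is not watertight: it relies on double quotes and expects href to come right after "<a". You can find more robust regular expressions in online libraries such as http://regexlib.com.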
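Neither answer produces the Count column from the question. One way to get it, sketched under the same assumptions (double-quoted href immediately after "<a", closed tags), is to capture URL and text with PREG_SET_ORDER and tally each (text, URL) pair. The helper name countLinks is hypothetical, and [^>]* is added here to skip attributes such as target that follow href:

```php
<?php
/**
 * Hypothetical helper: returns rows of [text, url, count] in first-seen order.
 */
function countLinks(string $html): array
{
    // Group 1: href value; group 2: link text.
    preg_match_all('/<a\shref="(.*?)"[^>]*>(.*?)<\/a>/i', $html, $matches, PREG_SET_ORDER);

    $rows = [];
    foreach ($matches as $m) {
        // Same text and same URL map to the same row.
        $key = $m[2] . "\0" . $m[1];
        if (!isset($rows[$key])) {
            $rows[$key] = [$m[2], $m[1], 0];
        }
        $rows[$key][2]++;
    }
    return array_values($rows);
}

// Shortened version of the question's sample.
$content = '<a href="http://google.com" target="_blank">Google</a> The biggest '
         . 'search engine is google. <a href="http://google.com" target="_blank">Google</a> '
         . 'The google video service is youtube. <a href="http://youtube.com/ncvh/">Youtube</a>. '
         . 'Google also provides <a href="http://gmail.com">Gmail</a>.';

printf("%-10s %-28s %s\n", 'Text', 'Url', 'Count');
foreach (countLinks($content) as [$text, $url, $count]) {
    printf("%-10s %-28s %d\n", $text, $url, $count);
}
```

Keying on both text and URL means two links with the same URL but different text are counted separately; key on $m[1] alone if only the URL matters.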

Answer 2

I have written a lot of HTML parsers. What works best for me:

Get the attributes ($matches[1]) and the anchor text ($matches[2]) of every link:

preg_match_all('_<a(.*?)>(.*?)</a_i', $html, $matches);

Get the href from each attribute string $attrs in $matches[1] (the value ends up in $href[1]):

preg_match('_href[\s]*=[\s]*[\'"](.*?)[\'"]_', $attrs, $href);
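Put together, the two calls might look like the sketch below. The loop over $matches[1] and the names $hrefs and $texts are assumptions about how the steps connect; the second sample link uses single quotes and puts target before href to show that this attribute-based approach handles both:

```php
<?php
$html = '<a href="http://google.com" target="_blank">Google</a> '
      . "<a target='_blank' href='http://youtube.com/ncvh/'>Youtube</a>";

// Step 1: capture the attribute string and the anchor text of each link.
preg_match_all('_<a(.*?)>(.*?)</a_i', $html, $matches);

$hrefs = [];
$texts = [];
foreach ($matches[1] as $i => $attrs) {
    // Step 2: pull the href value out of the attributes (single or double quotes).
    if (preg_match('_href[\s]*=[\s]*[\'"](.*?)[\'"]_', $attrs, $href)) {
        $hrefs[] = $href[1];
        $texts[] = $matches[2][$i];
    }
}
print_r(array_combine($texts, $hrefs));
```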

Normalize the href into a proper URL:

$url = str_replace(array(" ", "\n", "\r", "\t"), '', $url);
$url_components = parse_url(trim($url));
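A small sketch of this normalization step, assuming $url starts as a captured href with stray whitespace inside it (the sample value is made up). parse_url() returns an associative array of the components it finds:

```php
<?php
// Hypothetical raw href containing spaces, a line break and a tab.
$url = " http://youtube.com/\n\tncvh/ ";

// Remove spaces, newlines, carriage returns and tabs anywhere in the string.
$url = str_replace(array(" ", "\n", "\r", "\t"), '', $url);
$url_components = parse_url(trim($url));

print_r($url_components);
```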

Find URLs and link text in a string using PHP
