Question

I have a string containing an HTML document, and I want to extract all the URLs from it. I tried this:

preg_match_all('/(http:\/\/){1}.{1,}\..{1,}/', $html_document /* a valid document, containing a lot of links*/, $matches);
print_r($matches);

But instead of an array containing all the links, I get chunks of the HTML code. What is wrong with my code?

Answer 1

If you are interested in extracting URLs rather than validating them, try the following regex:

\bhttps?:\/\/[^\s]*

Sample code:

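<?php
// Match "http://" or "https://" followed by any run of non-whitespace characters
$re = "/\\bhttps?:\\/\\/[^\\s]*/im";
$str = "http://www.regex101.com https://www.stachoverflow.com";

preg_match_all($re, $str, $matches);
print_r($matches[0]); // $matches[0] holds the full matches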
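Why the original pattern returns chunks of HTML: in `(http:\/\/){1}.{1,}\..{1,}` the `.{1,}` parts are greedy and `.` matches any character except a newline, so each match runs from `http://` to the end of its line (backtracking only as far as the last `.` on that line). Inside real HTML the `[^\s]*` pattern above has a similar, smaller problem: in `href="http://example.com/a">A</a>` it keeps going past the closing quote and `>` until it hits whitespace. The sketch below is one way to tighten it, assuming URLs appear either as plain text or inside quoted attributes; the sample `$html` string is hypothetical. It excludes quotes and angle brackets from the character class and uses `~` as the delimiter so the slashes need no escaping.

```php
<?php
// Hypothetical sample HTML; in practice this would be $html_document.
$html = '<p><a href="http://example.com/a">A</a> and '
      . '<a href=\'https://example.org/b?x=1\'>B</a></p>';

// Stop a match at whitespace, quotes or angle brackets, so a URL inside
// an attribute value ends at its closing quote instead of eating the tag.
$re = '~\bhttps?://[^\s"\'<>]+~i';

preg_match_all($re, $html, $matches);
$urls = $matches[0]; // full matches only

print_r($urls);
```

This is still a heuristic for extraction, not URL validation: a URL that legitimately contains one of the excluded characters would be cut short.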