Question

I've searched through many Stack Overflow regex posts but couldn't find my answer. I'm using the following to find all URLs in a given $text string:

$pattern = "#((http|https|ftp|ftps)://)?([a-zA-Z0-9\-]*\.)+[a-zA-Z0-9]{2,4}(/[a-zA-Z0-9=.?&-]*)?#";

(Agreed, it could be more precise / more efficient / ..., but that is not the question.)

Now take this text as input:

$text = "Website: www.example.com, ";
$text .= "Contact us: http://www.example.com/cu?t=contactus#anchor, ";
$text .= "Email: contact@example.com";

Then running

preg_match_all($pattern, $text, $matches);

returns these:

www.example.com
http://www.example.com/cu?t=contactus
example.com

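The last one, example.com, comes from the e-mail address, and I would like to exclude it. I tried many combinations of [^@], (?!@), ... but to no avail: I still get the e-mail result.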

The best I could do is add an optional @ at the beginning, so that it returns @example.com, and then loop over my results to exclude the ones starting with @.
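A minimal sketch of that workaround, assuming the pattern above is used unchanged apart from an optional @ prepended to it; array_filter then drops every match that starts with @ (the $urls variable is just an illustrative name):

```php
<?php
// Workaround sketch: allow an optional leading "@" so the e-mail domain is
// returned as "@example.com", then filter those matches out afterwards.
$text  = "Website: www.example.com, ";
$text .= "Contact us: http://www.example.com/cu?t=contactus#anchor, ";
$text .= "Email: contact@example.com";

// Same pattern as in the question, with "@?" added at the front.
$pattern = '#@?((http|https|ftp|ftps)://)?([a-zA-Z0-9\-]*\.)+[a-zA-Z0-9]{2,4}(/[a-zA-Z0-9=.?&-]*)?#';

preg_match_all($pattern, $text, $matches);

// Keep only matches whose first character is not "@", then reindex 0..n.
$urls = array_values(array_filter($matches[0], function ($m) {
    return $m[0] !== '@';
}));

print_r($urls);
```

This works, but it needs the extra pass over the results, which is exactly what the question is trying to avoid.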

Is there a better solution? A single pattern that does not match substrings of e-mail addresses?

Answer 1

An example solution that doesn't use overly advanced features (such as assertions):

<?php

$text = 'ftp://web.com, ';
$text .= "Website: www.example.com, ";
$text .= "Contact us: http://www.example.com/cu?t=contactus#anchor, ";
$text .= "Email: contact@example.com";

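$base = "((http|https|ftp|ftps)://)?([a-zA-Z0-9\-]*\.)+[a-zA-Z0-9]{2,4}(/[a-zA-Z0-9=.?&-]*)?";

// Original pattern: also matches the domain part of the e-mail address.
$matches = array();
preg_match_all("#$base#", $text, $matches);
var_dump($matches[0]);

// Require whitespace before the URL. A space is prepended so that a URL at
// the very start of $text still matches; group 1 holds the URL without it.
$matches = array();
preg_match_all("#\s($base)#", " $text", $matches);
var_dump($matches[1]);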

?>

Output:

array(4) {
  [0]=>
  string(13) "ftp://web.com"
  [1]=>
  string(15) "www.example.com"
  [2]=>
  string(37) "http://www.example.com/cu?t=contactus"
  [3]=>
  string(11) "example.com"
}
array(3) {
  [0]=>
  string(13) "ftp://web.com"
  [1]=>
  string(15) "www.example.com"
  [2]=>
  string(37) "http://www.example.com/cu?t=contactus"
}

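Just check for whitespace before the URL, but don't include it in the subpattern; the space prepended to $text lets a URL at the very start of the string match as well. Using [^@] won't work, because the regex will simply match the e against [^@] and xample.com as the rest of the match, and together they still form a single match, example.com.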
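If assertions are acceptable after all, a negative lookbehind can do the same check without consuming any character. The sketch below is an alternative to the approach above, not part of it; it assumes a URL in the text is never glued directly to a preceding letter, digit, underscore, "@", "." or "-":

```php
<?php
// Alternative sketch using a negative lookbehind (PCRE, fixed length).
$text  = 'ftp://web.com, ';
$text .= "Website: www.example.com, ";
$text .= "Contact us: http://www.example.com/cu?t=contactus#anchor, ";
$text .= "Email: contact@example.com";

$base = '((http|https|ftp|ftps)://)?([a-zA-Z0-9\-]*\.)+[a-zA-Z0-9]{2,4}(/[a-zA-Z0-9=.?&-]*)?';

// (?<![\w@.-]) rejects a match that starts right after a word character,
// "@", "." or "-". The "." and "-" cover addresses such as
// contact@mail.example.com or contact@my-site.example.com.
// It is zero-width, so no leading space has to be added to $text.
preg_match_all('#(?<![\w@.-])' . $base . '#', $text, $matches);

print_r($matches[0]);
```

Unlike the \s version, this needs no prepended space and $matches[0] already holds the clean URLs; the trade-off is that it relies on a lookbehind assertion, which the answer above deliberately avoids.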

preg_match_all: find all URLs but exclude e-mail addresses
