Question

I wrote this function to convert every URL for one specific site (mywebsite.com) into a link and to replace all other URLs with @@@spam@@@.

function get_global_convert_all_urls($content) {
  $content = strtolower($content);
  $replace = "/(?:http|https)?(?:\:\/\/)?(?:www.)?(([A-Za-z0-9-]+\.)*[A-Za-z0-9-]+\.[A-Za-z]+)(?:\/.*)?/im";
  preg_match_all($replace, $content, $search);
  $total = count($search[0]);
  for($i=0; $i < $total; $i++) {
    $url = $search[0][$i];
    if(preg_match('/mywebsite.com/i', $url)) {
      $content = str_replace($url, '<a href="'.$url.'">'.$url.'</a>', $content);
    } else {
      $content = str_replace($url, '@@@spam@@@', $content);
    }
  }

  return $content;
}

The only problem I can't solve is that when there are two URLs on one line, the regex match doesn't stop at the space.

$content = "http://www.mywebsite.com/index.html http://www.others.com/index.html";

Result:

<a href="http://www.mywebsite.com/index.html http://www.others.com/index.html">http://www.mywebsite.com/index.html http://www.others.com/index.html</a>

How can I get the following result instead:

<a href="http://www.mywebsite.com/index.html">http://www.mywebsite.com/index.html</a> @@@spam@@@

I tried adding (\s|$) to the end of the regex, but no luck:

/(?:http|https)?(?:\:\/\/)?(?:www.)?(([A-Za-z0-9-]+\.)*[A-Za-z0-9-]+\.[A-Za-z]+)(?:\/.*)?(\s|$)/im

Answer 1

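Edited based on the changes to your question.

The problem is the .* at the end of your regex, so my suggestion is to replace it with a more precise expression. I cooked this up quickly, so you'll want to run some tests to verify it for your case. =)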

$matches = null;
$returnValue = preg_match_all('!(?:http|https)?(?:\\:\\/\\/)?(?:www.)?(([A-Za-z0-9-]+\\.)*[A-Za-z0-9-]+\\.[A-Za-z]+)(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\\-\\._\\?\\,\\\'/\\\\\\+&%\\$#\\=~])*[^\\.\\,\\)\\(]!', 'mywebsite.com/index.html others.com/index.html', $matches);

Result:

array (
  0 => 
  array (
    0 => 'mywebsite.com/index.html ',
    1 => 'others.com/index.html',
  ),
  1 => 
  array (
    0 => 'mywebsite.com',
    1 => 'others.com',
  ),
  2 => 
  array (
    0 => '',
    1 => '',
  ),
  3 => 
  array (
    0 => '',
    1 => '',
  ),
  4 => 
  array (
    0 => 'l',
    1 => 'm',
  ),
)

Answer 2

Change the last element of the regex from (?:\/.*)? to (?:\/\S*)?.

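The .* in your regex matches every character up to the end of the string, spaces included, whereas \S* matches only characters that are not whitespace.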
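To make the difference concrete, here is a minimal sketch that runs the question's regex twice on the sample input, once with the original .* tail and once with a \S* tail. The variable names ($greedyPattern, $boundedPattern, $greedy, $bounded) are only for illustration and are not part of the original code.

```php
<?php
$content = "http://www.mywebsite.com/index.html http://www.others.com/index.html";

// The question's regex: "." also matches spaces, so the tail runs to the end of the line.
$greedyPattern  = '/(?:http|https)?(?:\:\/\/)?(?:www.)?(([A-Za-z0-9-]+\.)*[A-Za-z0-9-]+\.[A-Za-z]+)(?:\/.*)?/im';

// Same regex with only the tail changed: \S* stops at the first whitespace character.
$boundedPattern = '/(?:http|https)?(?:\:\/\/)?(?:www.)?(([A-Za-z0-9-]+\.)*[A-Za-z0-9-]+\.[A-Za-z]+)(?:\/\S*)?/im';

preg_match_all($greedyPattern, $content, $greedy);
preg_match_all($boundedPattern, $content, $bounded);

var_dump(count($greedy[0]));   // int(1): both URLs are swallowed by a single match
var_dump($bounded[0]);         // two entries, one per URL
```

With the \S* tail, each URL is matched on its own, so the loop in the question sees them separately.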

You can also simplify the whole regex to:

\S*

Answer 3

Change the regex pattern so that it captures the last part of the URL (/index.html, /index.php).

/(?:http|https)?(?:\:\/\/)?(?:www.)?(([A-Za-z0-9-]+?\.)?[A-Za-z0-9-]+?\.?[A-Za-z]*?(\/\w+?\.\w+?)?)\b/im

Change the body of your function as follows:

$content = "http://www.mywebsite.com/index.html http://www.others.com/index.html";

function get_global_convert_all_urls($content) {
  $content = strtolower($content);
  $replace = "/(?:http|https)?(?:\:\/\/)?(?:www.)?(([A-Za-z0-9-]+?\.)?[A-Za-z0-9-]+?\.?[A-Za-z]*?(\/\w+?\.\w+?)?)\b/im";
  preg_match_all($replace, $content, $search);

  foreach ($search[0] as $url) {
    if(preg_match('/mywebsite.com/i', $url)) {
      $content = str_replace($url, '<a href="'.$url.'">'.$url.'</a>', $content);         
    } else {
      $content = str_replace($url, '@@@spam@@@', $content); 
    }
  } 

  return $content;
}

var_dump(get_global_convert_all_urls($content));

Output:

string '<a href="http://www.mywebsite.com/index.html">http://www.mywebsite.com/index.html</a> @@@spam@@@'
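One caveat with the match-then-str_replace loop: str_replace rewrites every occurrence of the matched text, so a URL that is a substring of another URL (or of an already inserted link) can be replaced more than once. A possible alternative, offered only as a sketch, is to do the replacement in a single pass with preg_replace_callback. The function name convert_urls_sketch and the tidied-up pattern are my own assumptions, not the asker's code, and the strtolower call is left out here.

```php
<?php
// Hypothetical helper: single-pass variant of get_global_convert_all_urls().
function convert_urls_sketch(string $content): string {
    // Assumed pattern: optional scheme, optional "www.", dotted host name,
    // then an optional path that ends at the first whitespace character.
    $pattern = '/(?:https?:\/\/)?(?:www\.)?(?:[A-Za-z0-9-]+\.)+[A-Za-z]+(?:\/\S*)?/i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $url = $m[0];
        // Same whitelist check as in the question, with the dot escaped.
        if (preg_match('/mywebsite\.com/i', $url)) {
            $safe = htmlspecialchars($url, ENT_QUOTES);
            return '<a href="' . $safe . '">' . $safe . '</a>';
        }
        return '@@@spam@@@';
    }, $content);

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $content;
}

echo convert_urls_sketch("http://www.mywebsite.com/index.html http://www.others.com/index.html");
// <a href="http://www.mywebsite.com/index.html">http://www.mywebsite.com/index.html</a> @@@spam@@@
```

Because each match is replaced exactly where it was found, the result does not depend on the order in which URLs appear.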

PHP regex to match a specific URL and strip the others

3 answers: