Question

I have this string stored in PHP:

Keyboard layout codes found here https://msdn.microsoft.com/en-us/library/cc233982.aspx test 123

test https://google.com

test google.com

<img src='http://example.com/pages/projects/uploader/files/2017-06-16%2011_27_36-Settings.png'>Link Converted to Image</img>

The img element was generated with a popular regex:

$url = '~(https|http)?(://)((\S)+(png|jpg|gif|jpeg))~';
$output = preg_replace($url, "<img src='$0'>Link Converted to Image</img>", $output);

My problem is that I now want to convert regular links into a elements.

I have this regex, which works except for one problem.

$url = '~(https|http)?(://)?((\S)+[.]+(\w*))~';
$output = preg_replace($url, "<img src='$0'>Link Converted to Image</img>", $output);

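This regex also converts the links that have already been turned into img elements, so it ends up placing an a element inside the src attribute of the img element. My idea for avoiding this is to ignore a preg match when the match is preceded by src=', but I can't figure out how to actually do that.

Am I doing this wrong? What is the most common/efficient way to achieve this?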

Answer 1

A good use case for (*SKIP)(*FAIL):

<img.+?</img>(*SKIP)(*FAIL) # match <img> tags and throw them away
|                           # or
\bhttps?\S+\b               # a link starting with http/https

In PHP:

<?php

$string = <<<DATA
Keyboard layout codes found here https://msdn.microsoft.com/en-us/library/cc233982.aspx test 123

test https://google.com

<img src='http://example.com/pages/projects/uploader/files/2017-06-16%2011_27_36-Settings.png'>Link Converted to Image</img>
DATA;

$regex = '~<img.+?</img>(*SKIP)(*FAIL)|\bhttps?\S+\b~';

$string = preg_replace($regex, "<a href='$0'>$0</a>", $string);
echo $string;
?>
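
For comparison, the idea from the question (ignore a match that is preceded by src=') can be sketched with a negative lookbehind. This is only a sketch under stated assumptions: the helper name linkify() and the shortened sample img tag are made up for illustration and do not appear in the original question.

```php
<?php

// Hypothetical helper, named linkify() for illustration only.
function linkify(string $text): string
{
    // (?<!src=') fails the match when the URL directly follows src=',
    // so a URL that already sits inside an <img> tag is left alone.
    $regex = '~(?<!src=\')\bhttps?://\S+\b~';

    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace($regex, "<a href='$0'>$0</a>", $text) ?? $text;
}

// A plain link gets wrapped in an a element.
echo linkify("test https://google.com"), "\n";

// The URL inside the img tag is not touched.
echo linkify("<img src='http://example.com/a.png'>Link Converted to Image</img>"), "\n";
```

Note that the lookbehind only guards against the exact prefix src='. A double-quoted attribute, or a URL in some other attribute, would still be converted, which is why skipping the whole tag with (*SKIP)(*FAIL) is the more robust choice here.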

Answer 2

Adding to @Jan's answer: this workaround may have some drawbacks, but it also matches URL-like strings that have no scheme:

<img.+?</img>(*SKIP)(*FAIL)|(?:https?\S+|(?:(?!:)(?(1)\S|(\w)))*\.\w{2,5})


Breakdown:

(?:             # Open a NCG (a)
    (?!:)       # Next immediate character shouldn't be a colon `:`
    (?(1)\S     # If CG #1 exists match a non-whitespace character
    |           # otherwise
    (\w))       # Match a word character (a URL begins with a word character)
)*              # As much as possible (this cluster denotes a tempered pattern)
\.\w{2,5}       # Match TLD

Drawbacks:

The TLD is limited to 2-5 characters
URLs that contain a port number are only partially matched
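
A small sketch of how these drawbacks show up in practice. The pattern is the one from this answer, wrapped in ~ delimiters; the helper name findLinks() and the sample inputs are assumptions chosen for illustration.

```php
<?php

// The pattern from this answer, unchanged apart from the ~ delimiters.
$regex = '~<img.+?</img>(*SKIP)(*FAIL)|(?:https?\S+|(?:(?!:)(?(1)\S|(\w)))*\.\w{2,5})~';

// Hypothetical helper: returns every full match found in $text.
function findLinks(string $regex, string $text): array
{
    preg_match_all($regex, $text, $matches);
    return $matches[0];
}

// A scheme-less domain is matched in full.
print_r(findLinks($regex, 'test google.com'));                   // matches: google.com

// A TLD longer than five characters is cut off after five.
print_r(findLinks($regex, 'visit example.technology today'));    // matches: example.techn

// The (?!:) check stops the match before the port number.
print_r(findLinks($regex, 'server at example.com:8080/status')); // matches: example.com
```

If longer TLDs matter, widening the {2,5} quantifier is one possible tweak, at the cost of more false positives on ordinary text that contains dots.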
