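I came across this excellent "little" RegEx
for stripping URLs out of plain text (bare URLs, not hyperlinks).
The only problem is that I know very little about RegEx, so I am completely stuck getting it to work for my blog.
So I am asking for help with excluding one URL from the stripping, for example $exception_url = 'http://mysite.com':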
function strip_urls($text, $exception_url = FALSE)
{
return preg_replace("/( (?:
(?:https?|ftp) : \\/*
(?:
(?: (?: [a-zA-Z0-9-]{2,} \\. )+
(?: arpa | com | org | net | edu | gov | mil | int | [a-z]{2}
| aero | biz | coop | info | museum | name | pro
| example | invalid | localhost | test | local | onion | swift ) )
| (?: [0-9]{1,3} \\. [0-9]{1,3} \\. [0-9]{1,3} \\. [0-9]{1,3} )
| (?: [0-9A-Fa-f:]+ : [0-9A-Fa-f]{1,4} )
)
(?: : [0-9]+ )?
(?! [a-zA-Z0-9.:-] )
(?:
\\/
[^&?#\\(\\)\\[\\]\\{\\}<>\\'\\\"\\x00-\\x20\\x7F-\\xFF]*
)?
(?:
[?#]
[^\\(\\)\\[\\]\\{\\}<>\\'\\\"\\x00-\\x20\\x7F-\\xFF]+
)?
) | (?:
(?:
(?: (?: [a-zA-Z0-9-]{2,} \\. )+
(?: arpa | com | org | net | edu | gov | mil | int | [a-z]{2}
| aero | biz | coop | info | museum | name | pro
| example | invalid | localhost | test | local | onion | swift ) )
| (?: [0-9]{1,3} \\. [0-9]{1,3} \\. [0-9]{1,3} \\. [0-9]{1,3} )
)
(?: : [0-9]+ )?
(?! [a-zA-Z0-9.:-] )
(?:
\\/
[^&?#\\(\\)\\[\\]\\{\\}<>\\'\\\"\\x00-\\x20\\x7F-\\xFF]*
)?
(?:
[?#]
[^\\(\\)\\[\\]\\{\\}<>\\'\\\"\\x00-\\x20\\x7F-\\xFF]+
)?
) | (?:
[a-zA-Z0-9._-]{2,} @
(?:
(?: (?: [a-zA-Z0-9-]{2,} \\. )+
(?: arpa | com | org | net | edu | gov | mil | int | [a-z]{2}
| aero | biz | coop | info | museum | name | pro
| example | invalid | localhost | test | local | onion | swift ) )
| (?: [0-9]{1,3} \\. [0-9]{1,3} \\. [0-9]{1,3} \\. [0-9]{1,3} )
)
) )/Dx", '', $text);
}
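Any answer is much appreciated, thanks.
Answer 0 (score: 2)
Changing the regex itself is next to impossible; it would end up huge.
However, you can temporarily replace the parts of the exception URL that identify it as a URL with some dummy strings, and then swap them back after the regex has run. (If you want to be really paranoid, you can make sure the replacement strings do not already occur in the text, or do not occur after URL stripping, and if they do, append a random number until they no longer do.)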
$identifier = '.com';
$temp_replace = '@@@STRIP_URLS-COM@@@';
$identifier2 = '://';
$temp_replace2 = '@@@STRIP_URLS-SLASHES@@@';
if ($exception_url) {
$text = str_replace($exception_url, str_replace(array($identifier, $identifier2), array($temp_replace, $temp_replace2), $exception_url), $text);
}
$text = preg_replace(...)
....rest of regex here...
if ($exception_url) {
$text = str_replace(array($temp_replace, $temp_replace2), array($identifier, $identifier2), $text);
}
return $text;
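For anyone who wants to try the idea in isolation, here is a minimal runnable sketch of the mask, strip, restore sequence described above, including the "paranoid" uniqueness check. It is only an illustration under assumptions: DEMO_URL_PATTERN is a much simpler stand-in for the long regex from the question, and unique_token() and strip_urls_except() are hypothetical names that do not appear in the original code.

```php
<?php
// Assumption: a simplified stand-in for the long pattern in the question.
// It only recognises URLs that start with http://, https:// or ftp://.
const DEMO_URL_PATTERN = '~(?:https?|ftp)://[^\s<>"\']+~i';

// Hypothetical helper: returns a token that does not occur in $text,
// appending random numbers to $base until that is the case.
function unique_token($text, $base)
{
    $token = $base;
    // strpos() can return 0 for a match at the start, so compare with false.
    while (strpos($text, $token) !== false) {
        $token = $base . mt_rand();
    }
    return $token;
}

// Hypothetical wrapper showing the three steps around the regex.
function strip_urls_except($text, $exception_url = false)
{
    if ($exception_url) {
        // 1. Mask the parts that make the exception URL look like a URL.
        $com     = unique_token($text, '@@@STRIP_URLS-COM@@@');
        $slashes = unique_token($text, '@@@STRIP_URLS-SLASHES@@@');
        $masked  = str_replace(array('.com', '://'), array($com, $slashes), $exception_url);
        $text    = str_replace($exception_url, $masked, $text);
    }

    // 2. Strip everything the pattern still recognises as a URL.
    $text = preg_replace(DEMO_URL_PATTERN, '', $text);

    if ($exception_url) {
        // 3. Put the masked parts back.
        $text = str_replace(array($com, $slashes), array('.com', '://'), $text);
    }
    return $text;
}

$in = 'See http://mysite.com/about and http://spam.example/buy now';
echo strip_urls_except($in, 'http://mysite.com'), "\n";
```

With the stand-in pattern, masking '://' alone would already be enough; '.com' is masked as well because the full regex from the question also matches bare domains without a scheme.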
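Answer 1 (score: 0)
I am sure someone will find this useful.
You can pass in the URL you want to keep, for example to allow URLs from your own site:
strip_urls($blog_comment, 'http://www.mysite.com/');
Or from a set of partner domains:
strip_urls($blog_comment, array('http://mysite.com/', 'http://partner.com/', 'http://partner1.com/'));
Using Mihai Loga's placeholder idea, I modified the original script to accept either an array or a string as $exception_url. I also made the placeholder safer.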
function strip_urls($text, $exception_url = array())
{
    if ( ! empty($exception_url))
    {
        // Accept a single URL string or an array of URLs; drop non-string entries.
        if (is_string($exception_url)) $exception_url = array($exception_url);
        $exception_url = array_values(array_filter($exception_url, 'is_string'));
        $placeholder_array = array();
        // Pick a placeholder that does not already occur in the text.
        // strpos() returns 0 for a match at the start, so compare with FALSE.
        $placeholder = md5(uniqid());
        while (strpos($text, $placeholder) !== FALSE)
        {
            $placeholder = md5(uniqid());
        }
        foreach ($exception_url as $i => $url)
        {
            // Concatenate with "." ("+" is numeric addition in PHP). The "x"
            // delimiters keep placeholder 1 from being a prefix of placeholder 10.
            $placeholder_array[$i] = $placeholder . 'x' . $i . 'x';
            // Mask every occurrence; both arrays stay index-aligned for the restore.
            $text = str_replace($url, $placeholder_array[$i], $text);
        }
    }
$text = preg_replace("/( (?:
(?:https?|ftp) : \\/*
(?:
(?: (?: [a-zA-Z0-9-]{2,} \\. )+
(?: arpa | com | org | net | edu | gov | mil | int | [a-z]{2}
| aero | biz | coop | info | museum | name | pro
| example | invalid | localhost | test | local | onion | swift ) )
| (?: [0-9]{1,3} \\. [0-9]{1,3} \\. [0-9]{1,3} \\. [0-9]{1,3} )
| (?: [0-9A-Fa-f:]+ : [0-9A-Fa-f]{1,4} )
)
(?: : [0-9]+ )?
(?! [a-zA-Z0-9.:-] )
(?:
\\/
[^&?#\\(\\)\\[\\]\\{\\}<>\\'\\\"\\x00-\\x20\\x7F-\\xFF]*
)?
(?:
[?#]
[^\\(\\)\\[\\]\\{\\}<>\\'\\\"\\x00-\\x20\\x7F-\\xFF]+
)?
) | (?:
(?:
(?: (?: [a-zA-Z0-9-]{2,} \\. )+
(?: arpa | com | org | net | edu | gov | mil | int | [a-z]{2}
| aero | biz | coop | info | museum | name | pro
| example | invalid | localhost | test | local | onion | swift ) )
| (?: [0-9]{1,3} \\. [0-9]{1,3} \\. [0-9]{1,3} \\. [0-9]{1,3} )
)
(?: : [0-9]+ )?
(?! [a-zA-Z0-9.:-] )
(?:
\\/
[^&?#\\(\\)\\[\\]\\{\\}<>\\'\\\"\\x00-\\x20\\x7F-\\xFF]*
)?
(?:
[?#]
[^\\(\\)\\[\\]\\{\\}<>\\'\\\"\\x00-\\x20\\x7F-\\xFF]+
)?
) | (?:
[a-zA-Z0-9._-]{2,} @
(?:
(?: (?: [a-zA-Z0-9-]{2,} \\. )+
(?: arpa | com | org | net | edu | gov | mil | int | [a-z]{2}
| aero | biz | coop | info | museum | name | pro
| example | invalid | localhost | test | local | onion | swift ) )
| (?: [0-9]{1,3} \\. [0-9]{1,3} \\. [0-9]{1,3} \\. [0-9]{1,3} )
)
) )/Dx", '', $text);
return (empty($exception_url))? $text : str_replace($placeholder_array, $exception_url, $text);
}
Credit goes to Mihai Loga and to whoever designed this RegEx... everything starts with a good idea.
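As a side note, the same "strip everything except my own URLs" behaviour can also be had without placeholders by using preg_replace_callback and deciding per match whether to keep it. This is a different technique from the one in the answers, shown only as a rough sketch: DEMO_URL_PATTERN_CB is a simplified stand-in for the long regex, and strip_urls_allowing() is a hypothetical name.

```php
<?php
// Assumption: simplified stand-in pattern, not the full expression above.
const DEMO_URL_PATTERN_CB = '~(?:https?|ftp)://[^\s<>"\']+~i';

// Hypothetical function: strips every matched URL unless it starts with
// one of the allowed prefixes (a string or an array of strings).
function strip_urls_allowing($text, $allowed = array())
{
    $allowed = is_string($allowed) ? array($allowed) : $allowed;

    return preg_replace_callback(DEMO_URL_PATTERN_CB, function ($m) use ($allowed) {
        foreach ($allowed as $prefix) {
            // Keep the match when it begins with an allowed prefix.
            if (is_string($prefix) && $prefix !== '' && strpos($m[0], $prefix) === 0) {
                return $m[0];
            }
        }
        return ''; // everything else is removed
    }, $text);
}

echo strip_urls_allowing(
    'Mine: http://mysite.com/post spam: http://spam.example/buy',
    array('http://mysite.com/', 'http://partner.com/')
), "\n";
```

Because this is a plain prefix comparison, it is safer to include the trailing slash in each allowed URL; 'http://mysite.com' on its own would also let 'http://mysite.com.evil.example/' through.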