Question

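How do I remove all URLs from a description that are not inside a tag, while keeping all img URLs?

For example, the output should look like this:

Description before: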

this is my description www.url.com and other stuff. 
i have a picture <img src="www.url.com"> and other desc stuf..
sample text goes here and here..

Description after:

this is my description and other stuff. 
i have a picture <img src="www.url.com"> and other desc stuf..
sample text goes here and here..

Thanks a lot.

Answer 1

$string = 'this is my description www.url.com and url.com and http://www.url.com other stuff. 
i have a picture <img src="www.url.com"> and other desc stuf..
sample text goes here and here..';

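// Match an optional http(s)://, an optional "www", a 2-25 character name and a .com/.net/.org TLD.
// The [^\"] on each side consumes one neighbouring non-quote character, so a URL followed by a quote,
// as in src="www.url.com", cannot match; the whole match is replaced by a single space.
echo preg_replace('/[^\"](http(s?):\/\/)?(www)?\.?([A-Za-z0-9\-]){2,25}\.(com|net|org)[^\"]/', ' ', $string);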

Output:

this is my description and and other stuff. 
i have a picture <img src="www.url.com"> and other desc stuf..
sample text goes here and here..

Not sure if this is what you're looking for.

It obviously won't match every possible URL, but it's somewhere to start.
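A variant that makes the "not inside a tag" rule explicit, instead of relying on the quote characters around the URL, is to split the text on tags first and only scrub the text between them. This is a minimal sketch of that idea: `strip_urls_outside_tags` is a hypothetical name, the tag pattern `<[^>]*>` assumes reasonably well-formed markup, and the URL pattern is the same rough .com/.net/.org heuristic as above, not a complete URL grammar.

```php
<?php
// Hypothetical helper: remove bare URLs from plain text while leaving
// everything inside <...> tags (e.g. <img src="...">) untouched.
function strip_urls_outside_tags(string $html): string
{
    // With one capture group and PREG_SPLIT_DELIM_CAPTURE, the pieces alternate:
    // even indexes are text between tags, odd indexes are the tags themselves.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Rough heuristic (assumed): optional http(s)://, optional www.,
    // dot-separated labels ending in .com/.net/.org, optional /path.
    $urlPattern = '~(?:https?://)?(?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|net|org)\b(?:/\S*)?~i';

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue; // a tag: keep it verbatim
        }
        $part = preg_replace($urlPattern, '', $part);
        // Removing a URL leaves two spaces behind; squeeze runs of spaces to one.
        $parts[$i] = preg_replace('/ {2,}/', ' ', $part);
    }
    return implode('', $parts);
}
```

Because the check runs per text segment, a URL inside any attribute (href, src, ...) is kept, not only the one in img.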

Answer 2

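$words = explode(' ', $description);
foreach ($words as $k => $v) {
    // Skip attribute tokens such as src="www.url.com"> so <img> URLs survive;
    // otherwise drop valid URLs and anything that looks like a bare domain.
    if (stripos($v, 'src=') !== 0
        && (filter_var($v, FILTER_VALIDATE_URL) || preg_match('/([a-z0-9.]+)\.([a-z0-9][a-z0-9]+)/i', $v))) {
        unset($words[$k]);
    }
}
$description = implode(' ', $words);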

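This solution removes well-formed URLs and domain names, but it is only an approximation, because (IMHO) there is no way to know whether a word is a domain like whereis.it or a name like will.i.am.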
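To see why this is only an approximation, the test inside the loop can be pulled out into a small function. `looks_like_url` is a hypothetical name; its body is the same `filter_var(..., FILTER_VALIDATE_URL)` plus regex check used above. `filter_var` only accepts URLs that have a scheme, so scheme-less words fall through to the regex, which flags `whereis.it` and `will.i.am` alike.

```php
<?php
// Hypothetical helper mirroring the per-word test above: a word counts as a URL
// if filter_var() accepts it, or if it contains "something.xx" (2+ letters/digits after a dot).
function looks_like_url(string $word): bool
{
    if (filter_var($word, FILTER_VALIDATE_URL) !== false) {
        return true; // full URL with a scheme, e.g. http://www.url.com
    }
    // No scheme: only the regex decides, and it cannot tell a domain from a dotted name.
    return preg_match('/([a-z0-9.]+)\.([a-z0-9][a-z0-9]+)/i', $word) === 1;
}

foreach (['http://www.url.com', 'www.url.com', 'whereis.it', 'will.i.am', 'stuff.'] as $word) {
    printf("%-20s %s\n", $word, looks_like_url($word) ? 'removed' : 'kept');
}
```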

Answer 3

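Well, this is very hard, and you may be better off trying something else. URLs come in many different shapes and forms, and writing a 100% reliable regular expression for every kind of URL is very difficult.

First, you have to decide whether you need to match 100% of URLs, or whether matching x% is good enough and false positives are acceptable.

Then you can match every word that contains a dot, run it through parse_url, and if that gives a good result, remove it from the text.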
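A minimal sketch of this parse_url approach, assuming that a "good result" means the host (or, for a word without a scheme, the part before the first /) consists of dot-separated labels ending in a TLD of two or more letters. That criterion, the function name `strip_urls_with_parse_url` and the tag-skipping pattern are assumptions, not part of the answer. Note that parse_url does not validate anything by itself; it only splits the string into components, so the acceptance test has to be supplied separately.

```php
<?php
// Hypothetical helper: find every word containing a dot, ask parse_url() what it is,
// and drop it if it looks like a URL. Tags are matched first and returned unchanged,
// so <img src="..."> survives.
function strip_urls_with_parse_url(string $text): string
{
    $out = preg_replace_callback(
        '/<[^>]*>|[^\s<>]*\.[^\s<>]+/',
        function (array $m): string {
            $token = $m[0];
            if ($token[0] === '<') {
                return $token; // an HTML tag: keep it verbatim
            }
            $parts = parse_url($token);
            if ($parts === false) {
                return $token; // parse_url() could not make sense of it
            }
            // With a scheme the domain is in 'host'; without one, parse_url()
            // puts everything in 'path', so take the part before the first '/'.
            $host = $parts['host'] ?? explode('/', $parts['path'] ?? '')[0];
            // Assumed "good result": dot-separated labels ending in a 2+ letter TLD.
            $isUrl = preg_match('/^([a-z0-9-]+\.)+[a-z]{2,}$/i', $host) === 1;
            return $isUrl ? '' : $token;
        },
        $text
    );
    // Dropping a word leaves a double space behind; squeeze it back to one.
    return preg_replace('/ {2,}/', ' ', $out);
}
```

As the answer warns, this trades accuracy for simplicity: will.i.am passes the check and is removed, while a URL followed by punctuation, such as www.url.com. at the end of a sentence, fails it and is kept.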

PHP: strip non-clickable URLs from text but keep img
