Question

I found a solution that automatically detects links and wraps them in <a> tags: Regex PHP - Auto-detect YouTube, image and "regular" links

The relevant part (for compatibility reasons I had to move the function out of the preg_replace_callback call):

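function put_url_in_a($arr)
{
    // $arr[0] is the whole match; prepend a scheme only if it has none
    // (checking https:// too, so https links don't become http://https://...)
    if (!preg_match('#^https?://#i', $arr[0])) {
        $arr[0] = 'http://' . $arr[0];
    }

    // links
    return sprintf('<a href="%1$s">%1$s</a>', $arr[0]);
}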

$s = preg_replace_callback('#(?:https?://\S+)|(?:www\.\S+)|(?:\S+\.\S+)#', 'put_url_in_a', $s);

This works fine, except when it comes across a URL inside a tag: it then mangles the tag by inserting another tag into it. It also breaks embedded media.
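A minimal reproduction of the failure, using essentially the question's callback and pattern on a single existing <a> tag (the input string is made up for illustration):

```php
<?php
// Essentially the question's callback: prepend http:// if no scheme, wrap in <a>.
function put_url_in_a($arr)
{
    if (!preg_match('#^https?://#i', $arr[0])) {
        $arr[0] = 'http://' . $arr[0];
    }
    return sprintf('<a href="%1$s">%1$s</a>', $arr[0]);
}

// Hypothetical input: markup that already contains a link.
$html = "<a href='http://www.google.com'>google</a>";

// The catch-all \S+\.\S+ branch matches the whole run
// "href='http://www.google.com'>google</a>", so a new tag is nested inside the old one.
$out = preg_replace_callback('#(?:https?://\S+)|(?:www\.\S+)|(?:\S+\.\S+)#', 'put_url_in_a', $html);

echo $out, "\n";
```

The output starts with `<a <a href="http://href=...`, i.e. the original tag is broken open.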

Question: how can I make this function skip HTML tags, preferably using only a regular expression?

Answer 1

One option: a URL that is already inside a link must be preceded by href=', so exclude such links with a negative lookbehind assertion:

#(?<!href\=['"])(?:https?://\S+)|(?:www.\S+)|(?:\S+\.\S+)#
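A side note on the pattern above: `|` has the lowest precedence in a regex, so the lookbehind only guards the first alternative. The sketch below (subject string invented for illustration) compares the ungrouped form with one where the lookbehind precedes a group holding all alternatives; the broad `\S+\.\S+` branch is left out to keep the comparison small.

```php
<?php
$subject = "<a href='www.example.com'>example</a>";

// Lookbehind outside the group: the www\. branch is not guarded by it.
$ungrouped = '#(?<!href=[\'"])(?:https?://\S+)|(?:www\.\S+)#';
// Lookbehind in front of one group containing every alternative.
$grouped = '#(?<!href=[\'"])(?:https?://\S+|www\.\S+)#';

$ungroupedHit = preg_match($ungrouped, $subject); // www. branch matches
$groupedHit   = preg_match($grouped, $subject);   // lookbehind rejects it

var_dump($ungroupedHit, $groupedHit);
```

Grouping fixes the precedence issue, but it does not make an over-general URL pattern any safer.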

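Edit: Actually, the form above doesn't work, because the URL matching is too general: it turns things like ... into links, which is wrong. Using my own preferred URL-matching scheme seems to work: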

$s = preg_replace_callback('#(?<!href\=[\'"])(https?|ftp|file)://[-A-Za-z0-9+&@\#/%()?=~_|$!:,.;]*[-A-Za-z0-9+&@\#/%()=~_|$]#', 'regexp_url_search', $s);
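The answer does not show the body of `regexp_url_search`, so this self-contained sketch supplies a hypothetical one that simply wraps the match in a link (escaping it with `htmlspecialchars`); the pattern and the sample string are the answer's:

```php
<?php
// Hypothetical callback: the answer names 'regexp_url_search' but does not show it.
function regexp_url_search(array $m): string
{
    $url = htmlspecialchars($m[0], ENT_QUOTES);
    return '<a href="' . $url . '">' . $url . '</a>';
}

$s = "The following link should be linkified: http://www.google.com but not this one: <a href='http://www.google.com'>google</a>.";

// Pattern from the answer: skip URLs directly preceded by href=' or href="
$s = preg_replace_callback(
    '#(?<!href\=[\'"])(https?|ftp|file)://[-A-Za-z0-9+&@\#/%()?=~_|$!:,.;]*[-A-Za-z0-9+&@\#/%()=~_|$]#',
    'regexp_url_search',
    $s
);

echo $s, "\n";
```

With this callback the result is the same transformed string the answer reports.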

For example: http://codepad.viper-7.com/TukPdY

$s = "The following link should be linkified: http://www.google.com but not this one: <a href='http://www.google.com'>google</a>.";

becomes:

The following link should be linkified: <a href="http://www.google.com">http://www.google.com</a> but not this one: <a href='http://www.google.com'>google</a>.
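The lookbehind only protects URLs that directly follow href=' or href="; a URL used as link text, or one inside a src attribute (the embedded-media case from the question), is still rewritten. A different technique, not part of the answer, is to split the markup into tags and text with preg_split and linkify only text outside <a> elements. A rough sketch, assuming fairly well-formed HTML with no > inside attribute values; the function name linkify_text_only and the sample input are hypothetical:

```php
<?php
// Split HTML into tags and text, linkify only text nodes outside <a>...</a>.
// Tags (and therefore href/src attribute values) are copied through untouched.
function linkify_text_only(string $html): string
{
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $inAnchor = 0;
    $out = '';

    foreach ($parts as $part) {
        if ($part !== '' && $part[0] === '<') {
            // A tag: track <a ...> / </a> nesting, then copy it as-is.
            if (preg_match('#^<a[\s>]#i', $part)) {
                $inAnchor++;
            } elseif (preg_match('#^</a\s*>#i', $part)) {
                $inAnchor = max(0, $inAnchor - 1);
            }
            $out .= $part;
        } elseif ($inAnchor > 0) {
            // Already link text: leave it alone.
            $out .= $part;
        } else {
            // Plain text: same URL pattern as the answer, minus the lookbehind.
            $out .= preg_replace_callback(
                '#(https?|ftp|file)://[-A-Za-z0-9+&@\#/%()?=~_|$!:,.;]*[-A-Za-z0-9+&@\#/%()=~_|$]#',
                function (array $m): string {
                    $url = htmlspecialchars($m[0], ENT_QUOTES);
                    return '<a href="' . $url . '">' . $url . '</a>';
                },
                $part
            );
        }
    }
    return $out;
}

$html = 'See http://example.com, <a href="http://example.org">http://example.org</a> and <img src="http://example.net/x.png">.';
echo linkify_text_only($html), "\n";
```

Only the bare http://example.com is wrapped; the existing link text and the img src stay unchanged. For messy real-world markup a proper HTML parser is the safer choice.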
