Question

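I have been trying to find the URLs in a string and replace them with links. This is what I have come up with so far. It seems to work for the most part, but there are a few things I would like to polish, and it may not be the best-performing approach.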

I have read many threads on SO about this problem, and although they helped a great deal, I still have not fully worked it out.

I run through the string twice. The first pass replaces the BBCode tags with HTML tags; the second pass replaces plain-text URLs with links:

$body_str = preg_replace('/\[url=(.+?)\](.+?)\[\/url\]/i', '<a href="\1" rel="nofollow" target="_blank">\2</a>', $body_str);

$body_str = preg_replace_callback(
    '!(?:^|[^"\'])(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&amp;:/~\+#]*[\w\-\@?^=%&amp;/~\+#])?!',
    function ($matches) {
        return strpos(trim($matches[0]), 'thisone.com') == FALSE ?
        '<a href="' . ltrim($matches[0], " \t\n\r\0\x0B.,@?^=%&amp;:/~\+#'") . '" rel="nofollow" target="_blank">' . ltrim($matches[0], "\t\n\r\0\x0B.,@?^=%&amp;:/~\+#'") . '</a>' :
        '<a href="' . ltrim($matches[0], " \t\n\r\0\x0B.,@?^=%&amp;:/~\+#'") . '">' . ltrim($matches[0], "\t\n\r\0\x0B.,@?^=%&amp;:/~\+#'") . '</a>';
    },
    $body_str
);

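One problem I have found so far is that the pattern tends to pick up the character immediately before 'http' (a space, comma, colon and so on), which breaks the link. That is why I use preg_replace_callback: it lets me trim the unwanted characters that would otherwise break the link.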

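Another issue: to avoid breaking links by matching URLs that are already inside A tags, I currently exclude any URL preceded by a single or double quote. I would rather exclude on href=' or href=" instead.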

Any tips and suggestions are much appreciated.

Answer 1

First, I took the liberty of refactoring your code a little so that it is easier to read and modify:


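// Strip the leading characters that the pattern picks up before "http".
function urltrim($str) {
    return ltrim($str, " \t\n\r\0\x0B.,@?^=%&:/~\+#'");
}

// Wrap a URL in an <a> tag; external links get rel/target attributes.
function addlink($str, $nofollow = true) {
    return '<a href="' . urltrim($str) . '"' . ($nofollow ? ' rel="nofollow" target="_blank"' : '') . '>' . urltrim($str) . '</a>';
}

// Links to our own site (thisone.com) are not marked nofollow.
function checksite($str) {
    // strpos() can return 0, so compare strictly against false.
    return strpos(trim($str), 'thisone.com') === false ? addlink($str) : addlink($str, false);
}

// Pass 1: BBCode [url=...]...[/url] to HTML links (same replacement as in the question).
$body_str = preg_replace('/\[url=(.+?)\](.+?)\[\/url\]/i', '<a href="\1" rel="nofollow" target="_blank">\2</a>', $body_str);

// Pass 2: plain-text URLs to links.
$body_str = preg_replace_callback(
    '!(?:^|[^"\'])(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?!',
    function ($matches) {
        return checksite($matches[0]);
    },
    $body_str
);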
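
To illustrate why the trimming helper is still needed at this stage, here is a small check. The sample string and the simplified URL pattern are assumptions made for the illustration, not part of the original code. The group `(?:^|[^"\'])` is non-capturing, but it still consumes the character before `http`, so that character ends up in `$matches[0]`. A negative lookbehind tests the same condition without consuming anything.

```php
<?php
// Illustrative input; the URL pattern is deliberately simplified.
$text = 'See:http://example.com/page for details';

// Non-capturing group: the character before "http" becomes part of the match.
preg_match('~(?:^|[^"\'])https?://[^\s"\'<>]+~', $text, $consuming);

// Negative lookbehind: same check, but nothing before "http" is consumed.
preg_match('~(?<!["\'])https?://[^\s"\'<>]+~', $text, $lookbehind);

echo $consuming[0], "\n";  // :http://example.com/page
echo $lookbehind[0], "\n"; // http://example.com/page
```

Note that `~` is used as the delimiter here, because `!` appears inside `(?<!` and would otherwise end the pattern early.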

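After that, I changed the way links are handled:

I treat a URL as a single word (all characters up to the next space, \n or \t, i.e. \s).
I changed the match so that it checks whether href= appears directly in front of the URL:
- if it does, I do nothing, because it is already a link
- if there is no href=, I replace the URL with a link
The urltrim function is therefore no longer needed, because the character before http is no longer consumed.
I also pass the URL through urlencode, to encode it and to avoid HTML injection.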
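
The matching idea in the list above can be checked in isolation before any replacement is wired in. This is only a sketch: the sample input is an assumption, and `~` is used as the delimiter instead of `!`.

```php
<?php
// Illustrative input: one URL inside an existing link, one bare URL.
$text = '<a href="http://example.com/a">first</a> and http://example.com/b';

// Group 1 is "href=" or empty, group 2 an optional quote,
// group 3 the URL "word" (everything up to the next whitespace).
preg_match_all('~(|href=)(["\']?)(https?://[^\s]+)~', $text, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    $kind = $m[1] !== '' ? 'already linked, leave as is' : 'bare URL, wrap in <a>';
    echo $m[3], ' => ', $kind, "\n";
}
```

For the first match, group 3 also contains the rest of the tag up to the next whitespace (`http://example.com/a">first</a>`). That is harmless in this case, because matches with `href=` are returned unchanged.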

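// No longer trims anything; kept so that addlink() does not have to change.
function urltrim($str) {
    return $str;
}

function addlink($str, $nofollow = true) {
    // urlencode() the whole URL, then restore the "://" that it encoded.
    $url = preg_replace('#(https?)%3A%2F%2F#', '$1://', urlencode(urltrim($str)));
    return '<a href="' . $url . '"' . ($nofollow ? ' rel="nofollow" target="_blank"' : '') . '>' . urltrim($str) . '</a>';
}

function checksite($str) {
    // strpos() can return 0, so compare strictly against false.
    return strpos(trim($str), 'thisone.com') === false ? addlink($str) : addlink($str, false);
}

// Pass 1: BBCode [url=...]...[/url] to HTML links (same replacement as in the question).
$body_str = preg_replace('/\[url=(.+?)\](.+?)\[\/url\]/i', '<a href="\1" rel="nofollow" target="_blank">\2</a>', $body_str);

// Pass 2: plain-text URLs that are not preceded by href=.
$body_str = preg_replace_callback(
    '!(|href=)(["\']?)(https?://[^\s]+)!',
    function ($matches) {
        if ($matches[1]) {
            // href= is present: it is already a link, return the original string.
            return $matches[0];
        } else {
            // Put back the preceding quote (" or '), if any, and add the link.
            return $matches[2] . checksite($matches[3]);
        }
    },
    $body_str
);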
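
A possible variation, which is not part of the answer above: `urlencode()` also encodes characters such as `/`, `?` and `&`, so paths and query strings are rewritten. The sketch below escapes the URL with `htmlspecialchars()` instead and uses fixed-width lookbehinds for the href= check. The function name `linkify_sketch`, the `$ownHost` parameter, the sample input and the simplified URL character class are all assumptions made for illustration.

```php
<?php
// Hypothetical helper, not from the original answer.
function linkify_sketch(string $text, string $ownHost = 'thisone.com'): string
{
    // Skip URLs directly preceded by href=" or href='.
    // Lookbehinds do not consume the character before "http".
    $pattern = '~(?<!href=")(?<!href=\')\bhttps?://[^\s"\'<>]+~i';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($ownHost): string {
            // Escape for HTML output without turning "/", "?" or "&" into %XX form.
            $url = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
            // Links to the site itself get no rel/target attributes.
            $attrs = strpos($m[0], $ownHost) === false
                ? ' rel="nofollow" target="_blank"'
                : '';
            return '<a href="' . $url . '"' . $attrs . '>' . $url . '</a>';
        },
        $text
    );
}

echo linkify_sketch('<a href="http://example.com/a">first</a> and http://example.com/b?x=1&y=2'), "\n";
```

This sketch has known gaps: a URL used as the visible text of an existing link is preceded by `>` and would be wrapped again, and an unquoted `href=http...` is not skipped. Treat it as a starting point rather than a finished solution.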

I hope this helps with your project. Let us know whether it did.

Bye.
