Question

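This is a follow-up to another post here.

Problem: the code below works well, except for strings containing double quotes, which render with strange characters.

Example string: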

“Walter Isaacson http://t.co/vaLxVduA”

Renders as:

“Walter Isaacson http://t.co/vaLxVduA���

t.co/vaLxVduA���

I think the problem is in the regular expression. What can I try?

Code:

function makeLink($match) {
    // Parse link.
    $substr = substr($match, 0, 6);
    if ($substr != 'http:/' && $substr != 'https:' && $substr != 'ftp://' && $substr != 'news:/' && $substr != 'file:/') {
        $url = 'http://' . $match;
    } else {
        $url = $match;
    }

    return '<a href="' . $url . '">' . $match . '</a>';
}

function makeHyperlinks($text) {
    // Find links and call the makeLink() function on them.
    return preg_replace('/((www\.|http|https|ftp|news|file):\/\/[\w.-]+\.[\w\/:@=.+?,#%&~-]*[^.\'# !(?,><;\)])/e', "makeLink('$1')", $text);
}

Answer 1

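The problem is the Unicode character ”. Without the u modifier the pattern is matched byte by byte, so the final negated character class consumes only the first byte of the multi-byte ” and the closing </a> ends up in the middle of the character. When you add the u modifier, so that the pattern and the string are treated as UTF-8, it works, but it also captures the quote as part of the URL. You also need to exclude this quote: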

preg_replace('/((www\.|http|https|ftp|news|file):\/\/[\w.-]+\.[\w\/:@=.+?,#%&~-]*[^.\'# !(?,>”<;\)])/eu', "makeLink('$1')", $text);
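The /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so on current PHP versions the same idea is usually written with preg_replace_callback. A minimal sketch, assuming the source file is saved as UTF-8 and reusing makeLink() from the question; the name makeHyperlinksUtf8 is hypothetical:

```php
<?php
// makeLink() is copied from the question.
function makeLink($match) {
    $substr = substr($match, 0, 6);
    if ($substr != 'http:/' && $substr != 'https:' && $substr != 'ftp://' && $substr != 'news:/' && $substr != 'file:/') {
        $url = 'http://' . $match;
    } else {
        $url = $match;
    }
    return '<a href="' . $url . '">' . $match . '</a>';
}

// Hypothetical name. Same pattern as above, with /e dropped and /u kept.
function makeHyperlinksUtf8($text) {
    $pattern = '/((www\.|http|https|ftp|news|file):\/\/[\w.-]+\.[\w\/:@=.+?,#%&~-]*[^.\'# !(?,>”<;\)])/u';
    // The callback receives the match array; index 1 is the whole URL.
    return preg_replace_callback($pattern, function ($m) {
        return makeLink($m[1]);
    }, $text);
}

echo makeHyperlinksUtf8('“Walter Isaacson http://t.co/vaLxVduA”'), "\n";
// “Walter Isaacson <a href="http://t.co/vaLxVduA">http://t.co/vaLxVduA</a>”
```

Using a callback also avoids building PHP code from the matched text, which is what the /e replacement string "makeLink('$1')" did.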

But your regex looks rather large. A quick search for a URL regex turned up this one, which also seems to work and does not need all the exclusions:

preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@eu', "makeLink('$1')", $text);
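For PHP 7 and later, where /e is no longer available, this shorter pattern can be used in the same way with preg_replace_callback. A rough sketch, assuming only http and https links need to be handled (the pattern does not match the other schemes from the question); the name makeHttpLinks is hypothetical:

```php
<?php
// Hypothetical helper name. The pattern is the one above, minus /e.
function makeHttpLinks($text) {
    $pattern = '@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@u';
    return preg_replace_callback($pattern, function ($m) {
        // Every match already starts with http:// or https://,
        // so the scheme check from makeLink() is not needed here.
        return '<a href="' . $m[1] . '">' . $m[1] . '</a>';
    }, $text);
}

echo makeHttpLinks('“Walter Isaacson http://t.co/vaLxVduA”'), "\n";
// “Walter Isaacson <a href="http://t.co/vaLxVduA">http://t.co/vaLxVduA</a>”
```

One caveat: the query-string part, \?\S+, matches any non-whitespace character, so a closing quote that directly follows a query string would still be captured as part of the URL.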
