Implementing a web address regular expression

Time: 2012-09-12 10:06:12

Tags: php regex

I found the following regex online, but I can't get it to work:

(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

This is what I want PHP to do:

Take the following: Look here: http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php

And convert it to: Look here: <a href="http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php">http://www.rocketlanguages.com/span...anish_accents.php</a>

If the URL is long, the link text is shortened by replacing its middle with ...

2 answers:

Answer 0: (score: 1)

Try this:

// URL regex from here:
// http://daringfireball.net/2010/07/improved_regex_for_matching_urls
define( 'URL_REGEX', <<<'_END'
~(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))~
_END
);

// PHP 5.3 or higher, can use closures (anonymous functions)
function replace_urls_with_anchor_tags( $string,
                                        $length = 50,
                                        $elision_string = '...' ) {
    $replace_function = function( $matches ) use ( $length, $elision_string) {
        $matched_url = $matches[ 0 ];
        return '<a href="' . $matched_url . '">' .
                abbreviated_url( $matched_url, $length, $elision_string )   .
                '</a>';
    };
    return preg_replace_callback(
        URL_REGEX,
        $replace_function,
        $string
    );
}

function abbreviated_url( $url, $length = 50, $elision_string = '...' ) {
    if ( strlen( $url ) <= $length ) {
        return $url;
    }
    $width_either_side = (int) ( ( $length - strlen( $elision_string ) ) / 2 );
    $left  = substr( $url, 0, $width_either_side );
    $right = substr( $url, strlen( $url ) - $width_either_side );

    return $left . $elision_string . $right;
}

(The backtick in the URL_REGEX definition confuses stackoverflow.com's syntax highlighting, but it is nothing to worry about.)

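The function replace_urls_with_anchor_tags takes a string and turns every URL it matches into an anchor tag, shortening long URLs by replacing the middle with an ellipsis. It accepts optional length and elision_string parameters in case you want to adjust the length or change the ellipsis to something else.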

Here is a usage example:

// Test it out
$test = <<<_END
Look here:
http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php

And here:
http://stackoverflow.com/questions/12385770/implementing-web-address-regular-expression
_END;

echo replace_urls_with_anchor_tags( $test, 50, '...' );
// OUTPUT:
// Look here:
// <a href="http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php">http://www.rocketlangua...ion_spanish_accents.php</a>
//
// And here:
// <a href="http://stackoverflow.com/questions/12385770/implementing-web-address-regular-expression">http://stackoverflow.co...ress-regular-expression</a>
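
The function above concatenates the matched URL straight into the HTML. If the input is plain text that may contain characters such as & or quotes, one option is to escape the URL before building the tag. The following is only a minimal sketch of that idea: the names linkify_escaped and shorten_middle are hypothetical, and the deliberately simplified pattern is an assumption made to keep the example self-contained (URL_REGEX above is the more thorough choice).

```php
<?php
// Hypothetical helper (not part of the answer's code): shortens a URL by
// eliding its middle, mirroring the logic of abbreviated_url().
function shorten_middle( $url, $length = 50, $elision_string = '...' ) {
    if ( strlen( $url ) <= $length ) {
        return $url;
    }
    $side = (int) ( ( $length - strlen( $elision_string ) ) / 2 );
    return substr( $url, 0, $side ) .
           $elision_string .
           substr( $url, strlen( $url ) - $side );
}

// Hypothetical variant of replace_urls_with_anchor_tags() that HTML-escapes
// both the href value and the visible link text.
function linkify_escaped( $text, $length = 50, $elision_string = '...' ) {
    return preg_replace_callback(
        // Simplified pattern, assumed for this sketch only.
        '~\bhttps?://[^\s<>"]+~i',
        function ( $matches ) use ( $length, $elision_string ) {
            $url   = $matches[0];
            $href  = htmlspecialchars( $url, ENT_QUOTES, 'UTF-8' );
            $label = htmlspecialchars(
                shorten_middle( $url, $length, $elision_string ),
                ENT_QUOTES,
                'UTF-8'
            );
            return '<a href="' . $href . '">' . $label . '</a>';
        },
        $text
    );
}

echo linkify_escaped( 'See http://example.com/?a=1&b=2 now' ), "\n";
// See <a href="http://example.com/?a=1&amp;b=2">http://example.com/?a=1&amp;b=2</a> now
```

Note that this only escapes the URLs themselves. Text around them is passed through untouched, and the simplified pattern will also swallow trailing punctuation such as a full stop.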

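Note that if you are on PHP 5.2 or lower, you will have to rewrite replace_urls_with_anchor_tags to use create_function instead of a closure, because closures were not introduced until PHP 5.3. (create_function was itself deprecated in PHP 7.2 and removed in PHP 8.0, so this version is only for legacy installations.)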

// No closures in PHP 5.2, must use create_function()
function replace_urls_with_anchor_tags( $string,
                                        $length = 50,
                                        $elision_string = '...' ) {
    $replace_function = create_function(
        '$matches',
        'return "<a href=\"$matches[0]\">" .
                abbreviated_url( $matches[ 0 ], '            .
                                 $length  . ', '             .
                                 '"' . $elision_string . '"' .
                               ') . "</a>";'
    );
    return preg_replace_callback(
        URL_REGEX,
        $replace_function,
        $string
    );
}

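Note that I replaced the URL regex you found with the one from the page DaveRandom mentioned in his comment. It is more complete, and the regex you are using actually contains a bug: several '/' characters are not escaped (here: [\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#]), which breaks the pattern when '/' is also used as the delimiter. In addition, it has no explicit handling for port numbers such as 80 or 8080.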
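
If you would still like to try the regex from the question, one way around the unescaped '/' characters is to pick a delimiter that does not occur in the pattern at all. The sketch below uses '!' for that purpose, since the pattern already contains '/', '~', '#', '@' and '%'. The variable names are made up for illustration, and the pattern itself is left unchanged.

```php
<?php
// The question's pattern, unchanged, wrapped in '!' delimiters.
// '!' never occurs in the pattern, so no '/' needs extra escaping.
$question_regex = '!(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?!';

$text = 'Look here: http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php';

// In the replacement string, $0 refers to the whole match.
echo preg_replace( $question_regex, '<a href="$0">$0</a>', $text ), "\n";
```

This wraps the full URL in an anchor tag without shortening the link text. For the shortening you still need a callback such as the one above.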

Hope this helps.

Answer 1: (score: 0)

I am using this regex and it works well for me. Try it if you like:

(http|https|ftp):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?
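
A quick way to try this pattern in PHP is sketched below. The '~' delimiter, the i flag, and the variable names are choices made for this example rather than part of the answer. It also shows one thing to watch for: the trailing (\/.*)? is greedy, so the pattern suits strings that contain only a URL better than URLs embedded in a sentence.

```php
<?php
// The answer's pattern, wrapped in '~' delimiters with the i
// (case-insensitive) flag. Both are assumptions made for this sketch.
$answer_regex = '~(http|https|ftp):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?~i';

// A URL with a port: group 3 captures the port, group 4 the path.
preg_match( $answer_regex, 'http://example.com:8080/index.php', $m );
echo $m[3], ' ', $m[4], "\n";   // prints ":8080 /index.php"

// The trailing (\/.*)? is greedy, so any text after the URL on the
// same line becomes part of the match.
preg_match( $answer_regex, 'Visit http://example.com/a and more', $m );
echo $m[0], "\n";               // prints "http://example.com/a and more"
```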