Question

I need to test for general URLs using any protocol (http, https, shttp, ftp, svn, mysql, and others I don't know about).

My first pass is this:

\w+://(\w+\.)+[\w+](/[\w]+)(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?

(PCRE and .NET, so nothing too fancy.)

Answer 1

According to RFC 2396 (Appendix B):

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
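To illustrate how this pattern is meant to be used, here is a minimal Python sketch. The group numbering (2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment) comes from the RFC; the helper name `split_uri` and the sample inputs are only assumptions for the example. Note that every part of the pattern is optional, so it splits any string into components rather than validating it.

```python
import re

# Regex from RFC 2396, Appendix B. Group numbers follow the RFC:
# 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Split a URI reference into its five components (hypothetical helper)."""
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
print(split_uri("not a url at all"))
```

The second call still "matches": the whole string ends up in `path` and `scheme` is `None`, which is why this pattern on its own cannot tell URLs apart from ordinary text.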

Answer 2

Adding the RegEx as a wiki answer:

[\w+-]+://([a-zA-Z0-9]+\.)+[[a-zA-Z0-9]+](/[%\w]+)(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?

Option 2 (re: CMS)

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

But that is too lax for anything sane, so to make it more restrictive and to differentiate URLs from other things:

proto      ://  name      : pass      @  server    :port      /path     ? args
^([^:/?#]+)://(([^/?#@:]+(:[^/?#@:]+)?@)?[^/?#@:]+(:[0-9]+)?)(/[^?#]*)(\?([^#]*))?
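As a rough check of what the stricter pattern accepts, here is a small Python sketch. The helper name `parse`, the dictionary keys, and the sample URLs are assumptions made for illustration; the regex itself is copied unchanged from above.

```python
import re

# Stricter pattern from the answer:
# proto "://" [name[:pass]@] server [:port] /path [?args]
STRICT_RE = re.compile(
    r"^([^:/?#]+)://(([^/?#@:]+(:[^/?#@:]+)?@)?[^/?#@:]+(:[0-9]+)?)(/[^?#]*)(\?([^#]*))?"
)


def parse(url):
    """Return the main components, or None if the pattern rejects the input."""
    m = STRICT_RE.match(url)
    if m is None:
        return None
    return {
        "proto": m.group(1),
        "authority": m.group(2),  # name:pass@server:port as one string
        "path": m.group(6),
        "args": m.group(8),
    }


print(parse("ftp://user:pass@example.com:21/pub/file.txt?x=1"))
print(parse("svn://host/"))
print(parse("http://example.com"))  # rejected: the /path part is mandatory
print(parse("not a url at all"))    # rejected: no "://"
```

One consequence worth knowing: because the `(/[^?#]*)` group is not optional, a bare `http://example.com` without a trailing slash is rejected. Appending `?` to that group would relax this if it is not the behavior you want.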

Answer 3

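I came at this from a slightly different direction. I wanted to emulate Gchat's ability to match something.co.uk and linkify it. So I went with a regex that looks for a '.' that is not followed by another period and has no whitespace on either side, then grabs everything around it until it hits whitespace. It does match a period at the end of the URI, but I strip that off later. So this might be an option if you would rather get some false positives than miss potential URLs.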

import re

url_re = re.compile(r"""
           [^\s]             # not whitespace
           [a-zA-Z0-9:/\-]+  # the protocol and domain name
           \.(?!\.)          # A literal '.' not followed by another
           [\w\-\./\?=&%~#]+ # country and path components
           [^\s]             # not whitespace""", re.VERBOSE)

url_re.findall('http://thereisnothing.com/a/path adn some text www.google.com/?=query#%20 https://somewhere.com other-countries.co.nz. ellipsis... is also a great place to buy. But try text-hello.com ftp://something.com')

['http://thereisnothing.com/a/path',
 'www.google.com/?=query#%20',
 'https://somewhere.com',
 'other-countries.co.nz.',
 'text-hello.com',
 'ftp://something.com']
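The "strip the trailing period later" step mentioned above could look something like the following sketch. The function name `find_urls` is made up for this example, and using `rstrip(".")` is just one plausible way to do the cleanup, not necessarily what the answer's author did.

```python
import re

# Same pattern as in the answer above.
url_re = re.compile(r"""
           [^\s]             # not whitespace
           [a-zA-Z0-9:/\-]+  # the protocol and domain name
           \.(?!\.)          # A literal '.' not followed by another
           [\w\-\./\?=&%~#]+ # country and path components
           [^\s]             # not whitespace""", re.VERBOSE)


def find_urls(text):
    """Find URL-like tokens and drop any sentence-ending periods (hypothetical helper)."""
    return [match.rstrip(".") for match in url_re.findall(text)]


print(find_urls("see other-countries.co.nz. and http://thereisnothing.com/a/path now"))
```

With this cleanup, 'other-countries.co.nz.' from the output above would come back as 'other-countries.co.nz'.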
