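How can I use a regular expression to match links that have no top-level domain?

Asked: 2014-05-30 12:02:12

Tags: javascript regex

I use the following regular expression (an updated version of the linkify regex) to match links without matching email addresses.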

(\s*|[^a-zA-Z0-9.\+_\/"\>\-]|^)(?:([a-zA-Z0-9\+_\-]+(?:\.[a-zA-Z0-9\+_\-]+)*@)?(http:\/\/|https:\/\/|ftp:\/\/|scp:\/\/){1}?((?:(?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+]*\.)+))(?:[a-zA-Z]{2,})((?::\d{1,5}))?((?:[\/|\?](?:[\-a-zA-Z0-9_%#*&+=~!?,;:.\/]*)*)[\-\/a-zA-Z0-9_%#*&+=~]|\/?)?)([^a-zA-Z0-9\+_\/"\<\-]|$)

However, this regex does not find URLs such as https://someurl:3333/view/something, where the host has no top-level domain.
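A minimal way to reproduce the problem, assuming the expression is used as a JavaScript regex literal. The variable name `originalLinkRegex` and the comparison URL with a `.com` host are made up for illustration only:

```javascript
// The expression from the question, copied verbatim into a regex literal.
// `originalLinkRegex` is a hypothetical name used only in this sketch.
const originalLinkRegex = /(\s*|[^a-zA-Z0-9.\+_\/"\>\-]|^)(?:([a-zA-Z0-9\+_\-]+(?:\.[a-zA-Z0-9\+_\-]+)*@)?(http:\/\/|https:\/\/|ftp:\/\/|scp:\/\/){1}?((?:(?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+]*\.)+))(?:[a-zA-Z]{2,})((?::\d{1,5}))?((?:[\/|\?](?:[\-a-zA-Z0-9_%#*&+=~!?,;:.\/]*)*)[\-\/a-zA-Z0-9_%#*&+=~]|\/?)?)([^a-zA-Z0-9\+_\/"\<\-]|$)/;

// A host that ends in a TLD is matched.
console.log(originalLinkRegex.test("https://someurl.com:3333/view/something")); // true

// A bare host name is not: the pattern requires at least one "label."
// followed by a TLD of two or more letters.
console.log(originalLinkRegex.test("https://someurl:3333/view/something")); // false
```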

Can you help me solve this? Thanks!

1 answer:

Answer 0 (score: 1)

This should be the "least modified" version of your expression that also matches domains without a TLD:

(\s*|[^a-zA-Z0-9.\+_\/"\>\-]|^)(?:([a-zA-Z0-9\+_\-]+(?:\.[a-zA-Z0-9\+_\-]+)*@)?(http:\/\/|https:\/\/|ftp:\/\/|scp:\/\/){1}?((?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+.]*)(?:\.[a-zA-Z]{2,})?)((?::\d{1,5}))?((?:[\/|\?](?:[\-a-zA-Z0-9_%#*&+=~!?,;:.\/]*)*)[\-\/a-zA-Z0-9_%#*&+=~]|\/?)?)([^a-zA-Z0-9\+_\/"\<\-]|$)
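As a quick check, here is a sketch that runs the modified expression against the URL from the question. The variable names are made up for illustration, and the group labels in the comments are inferred from what each group captures in this run:

```javascript
// The modified expression, copied verbatim into a regex literal.
// `relaxedLinkRegex` is a hypothetical name used only in this sketch.
const relaxedLinkRegex = /(\s*|[^a-zA-Z0-9.\+_\/"\>\-]|^)(?:([a-zA-Z0-9\+_\-]+(?:\.[a-zA-Z0-9\+_\-]+)*@)?(http:\/\/|https:\/\/|ftp:\/\/|scp:\/\/){1}?((?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+.]*)(?:\.[a-zA-Z]{2,})?)((?::\d{1,5}))?((?:[\/|\?](?:[\-a-zA-Z0-9_%#*&+=~!?,;:.\/]*)*)[\-\/a-zA-Z0-9_%#*&+=~]|\/?)?)([^a-zA-Z0-9\+_\/"\<\-]|$)/;

const match = relaxedLinkRegex.exec("https://someurl:3333/view/something");

console.log(match[3]); // "https://"         (scheme)
console.log(match[4]); // "someurl"          (host, now allowed without a TLD)
console.log(match[5]); // ":3333"            (port)
console.log(match[6]); // "/view/something"  (path)

// The scheme is still mandatory, so a plain email address does not match.
console.log(relaxedLinkRegex.test("user@example.com")); // false
```

When the URL sits inside longer text, groups 1 and 7 capture the boundary characters around it, so joining groups 3 through 6 is probably a safer way to recover the link than using the whole match.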

The part that changed is capture group 4, the one that grabs the domain. It went from:

(
 (?:
  (?:
   [a-zA-Z0-9]
   [a-zA-Z0-9_%\-_+]*
   \.
  )+                  (?# this is how they repeated for optional subdomains)
 )
)
(?:
 [a-zA-Z]{2,}         (?# here is the mandatory TLD)
)

to:

(
 (?:
  [a-zA-Z0-9]
  [a-zA-Z0-9_%\-_+.]* (?# the . is in the character class here for subdomains)
 )
 (?:
  \.
  [a-zA-Z]{2,}
 )?                   (?# this TLD is optional)
)
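To see the effect of this change in isolation, the sketch below lifts the two domain sub-patterns out of the full expressions and anchors them with `^` and `$`. The anchors, the names `oldDomain` and `newDomain`, and the sample hosts are assumptions added for the demonstration:

```javascript
// Domain sub-patterns taken from the old and new expressions.
// The ^ and $ anchors are added here only so each host is tested as a whole.
const oldDomain = /^((?:(?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+]*\.)+))(?:[a-zA-Z]{2,})$/;
const newDomain = /^((?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+.]*)(?:\.[a-zA-Z]{2,})?)$/;

for (const host of ["someurl", "example.com", "sub.example.com"]) {
  console.log(host, oldDomain.test(host), newDomain.test(host));
}
// someurl false true
// example.com true true
// sub.example.com true true

// The captured text differs too: the old group stops before the TLD,
// while the new group contains the whole host.
console.log(oldDomain.exec("example.com")[1]); // "example."
console.log(newDomain.exec("example.com")[1]); // "example.com"
```

One side effect worth keeping in mind: because `.` is now part of the character class, the new sub-pattern also accepts hosts with consecutive dots such as `example..com`, so it is looser than the original in that respect.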
