Question

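I'd like to use a Java regular expression to match the domain of a URL. For example, for www.table.google.com I want to extract 'google', i.e. the second-to-last word in the URL string.

Any help would be greatly appreciated!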

Answer 1

It really depends on how complex your input is...

Here is a very simple regular expression:

.+\\.(.+)\\..+

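It captures what lies between the dots (\\.). Because the leading .+ is greedy, that is the text between the last two dots.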
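A minimal sketch of applying this pattern with java.util.regex, assuming the whole input should match (hence matches()). The class and method names are hypothetical, chosen for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleDomainRegex {
    // The pattern from the answer as a Java string literal. The greedy ".+"
    // at the front consumes as much as it can, so group 1 ends up being
    // the text between the LAST two dots.
    private static final Pattern SECOND_TO_LAST = Pattern.compile(".+\\.(.+)\\..+");

    /** Returns the second-to-last word, or null if the pattern does not match (e.g. fewer than two dots). */
    static String secondToLastWord(String input) {
        Matcher m = SECOND_TO_LAST.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(secondToLastWord("www.table.google.com")); // google
        System.out.println(secondToLastWord("google.com"));           // null
    }
}
```

The second call shows one limit of the simple pattern: it needs at least two dots, so a bare "google.com" does not match.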

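Here are some examples of the pattern in action: https://regex101.com/r/L52oz6/1. As you can see, it works for simple input but not for complex URLs.

But why reinvent the wheel? There are plenty of good libraries that can correctly parse even complex URLs. That said, for simple input a small regex is easy to build, so if this one doesn't handle your input, leave a comment and I'll adjust the pattern.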
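The answer doesn't name a specific library. As one illustration, the JDK's own java.net.URI can extract the host before splitting it. This is a sketch under the assumption that the input carries a scheme such as http:// (for a bare "www.table.google.com", URI.getHost() returns null); the helper name secondToLastLabel is hypothetical.

```java
import java.net.URI;

public class UriHostExample {
    // Assumes an absolute URL with a scheme; otherwise getHost() is null.
    // Does NOT know about multi-part suffixes: "example.co.uk" would yield "co".
    static String secondToLastLabel(String url) {
        String host = URI.create(url).getHost(); // e.g. "www.table.google.com"
        if (host == null) {
            return null;
        }
        String[] labels = host.split("\\.");
        return labels.length >= 2 ? labels[labels.length - 2] : null;
    }

    public static void main(String[] args) {
        System.out.println(secondToLastLabel("http://www.table.google.com/a/b?q=1")); // google
    }
}
```

Letting URI strip the scheme, path and query first means the split only ever sees the host name.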

Note that you can also use a simple split:

String[] elements = input.split("\\.");
String secondToLastElement = elements[elements.length - 2];

But don't forget to check the index bounds.
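A sketch of the split approach with the bounds check in place. Returning null for input without a dot is an assumption; throwing an exception would be equally reasonable. The class and method names are hypothetical.

```java
public class SplitExample {
    /** Returns the second-to-last dot-separated element, or null if there are fewer than two. */
    static String secondToLastElement(String input) {
        // split() takes a regex, so the dot must be escaped
        String[] elements = input.split("\\.");
        // Bounds check: e.g. "localhost" yields a single element
        if (elements.length < 2) {
            return null;
        }
        return elements[elements.length - 2];
    }

    public static void main(String[] args) {
        System.out.println(secondToLastElement("www.table.google.com")); // google
        System.out.println(secondToLastElement("localhost"));            // null
    }
}
```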

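Alternatively, if you need a very fast solution, traverse the input backwards starting from the last position. Walk back until you find the first dot, then keep going until you find the second one. Then extract the part between them with input.substring(index1, index2);, where index1 is the position just after the second dot and index2 is the position of the first dot found.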
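The backward scan described above can be sketched with a plain loop. As an assumption, if only one dot is found, the word is taken from the start of the string, so "google.com" also yields "google". The class and method names are hypothetical.

```java
public class BackwardScan {
    /** Scans from the end for the last two dots and returns the text between them. */
    static String secondToLastWord(String input) {
        int lastDot = -1;
        int secondToLastDot = -1; // stays -1 if there is only one dot
        for (int i = input.length() - 1; i >= 0; i--) {
            if (input.charAt(i) == '.') {
                if (lastDot < 0) {
                    lastDot = i;         // first dot seen from the right
                } else {
                    secondToLastDot = i; // second dot seen from the right
                    break;
                }
            }
        }
        if (lastDot < 0) {
            return null; // no dot at all
        }
        // The word lies strictly between the two dots (or starts at index 0)
        return input.substring(secondToLastDot + 1, lastDot);
    }

    public static void main(String[] args) {
        System.out.println(secondToLastWord("www.table.google.com")); // google
    }
}
```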

For this purpose there is also a convenience method, String#lastIndexOf (see the documentation).

Take a look at this code:

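String input = ...; // e.g. "www.table.google.com"
int indexLastDot = input.lastIndexOf('.');
// lastIndexOf(ch, fromIndex) searches backwards starting AT fromIndex,
// so start one position before the last dot (result is -1 if there is only one dot)
int indexSecondToLastDot = input.lastIndexOf('.', indexLastDot - 1);
// The word lies strictly between the two dots
String secondToLastWord = input.substring(indexSecondToLastDot + 1, indexLastDot);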

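The bounds may be off by one, since I haven't tested the code, but you get the idea. Also, don't forget the bounds checks.

The advantage of this approach is speed: lastIndexOf scans the String in place, so unlike split it creates no intermediate array or copies; only the final substring result is created.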
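A self-contained version of the lastIndexOf approach with the bounds check added. Returning null when there is no dot is an assumption, and the class and method names are hypothetical.

```java
public class LastIndexOfExample {
    static String secondToLastWord(String input) {
        int indexLastDot = input.lastIndexOf('.');
        if (indexLastDot < 0) {
            return null; // bounds check: no dot, so no second-to-last word
        }
        // Search backwards from just before the last dot; -1 if there is only one dot
        int indexSecondToLastDot = input.lastIndexOf('.', indexLastDot - 1);
        return input.substring(indexSecondToLastDot + 1, indexLastDot);
    }

    public static void main(String[] args) {
        System.out.println(secondToLastWord("www.table.google.com")); // google
        System.out.println(secondToLastWord("google.com"));           // google
    }
}
```

Because lastIndexOf returns -1 when nothing is found, "+ 1" conveniently turns that into index 0 for single-dot input.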

Answer 2

My attempt:

(?<scheme>https?:\/\/)?(?<subdomain>\S*?)(?<domainword>[^.\s]+)(?<tld>\.[a-z]+|\.[a-z]{2,3}\.[a-z]{2,3})(?=\/|$)

Demo. It works for:

http://www.foo.stackoverflow.com
http://www.stackoverflow.com
http://www.stackoverflow.com/
http://stackoverflow.com
https://www.stackoverflow.com
www.stackoverflow.com
stackoverflow.com
http://www.stackoverflow.co.uk
foo.www.stackoverflow.com
foo.www.stackoverflow.co.uk
foo.www.stackoverflow.co.uk/a/b/c
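A sketch of using this pattern from Java, written as a Java string literal; named groups require Java 7 or later. Using find() and reading the domainword group is an assumption about how the pattern is meant to be applied, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDomain {
    // The answer's pattern, with every backslash doubled for the Java literal
    private static final Pattern DOMAIN = Pattern.compile(
            "(?<scheme>https?:\\/\\/)?(?<subdomain>\\S*?)(?<domainword>[^.\\s]+)"
            + "(?<tld>\\.[a-z]+|\\.[a-z]{2,3}\\.[a-z]{2,3})(?=\\/|$)");

    /** Returns the "domainword" group, or null if the pattern finds no match. */
    static String domainWord(String url) {
        Matcher m = DOMAIN.matcher(url);
        return m.find() ? m.group("domainword") : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "http://www.foo.stackoverflow.com",
            "stackoverflow.com",
            "foo.www.stackoverflow.co.uk/a/b/c",
            "www.table.google.com"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + domainWord(s));
        }
    }
}
```

The lazy subdomain group grows one character at a time, so the first successful match uses the word immediately before the TLD.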

Answer 3

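import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The dots must be escaped as \\. to match a literal '.';
// note that "www" and "google" are hard-coded in this pattern
private static final Pattern URL_MATCH_GET_SECOND_AND_LAST =
        Pattern.compile("www\\.(.*)\\.google\\.(.*)", Pattern.CASE_INSENSITIVE);

String sURL = "www.table.google.com";

// One Matcher is enough; calling find() on a separate throwaway matcher first is redundant
Matcher matchURL = URL_MATCH_GET_SECOND_AND_LAST.matcher(sURL);
if (matchURL.find()) {
    String sFirst = matchURL.group(1);  // "table"
    String sSecond = matchURL.group(2); // "com"
}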
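The pattern above only ever recovers the words around a hard-coded "google". A sketch that keeps the same Pattern/Matcher structure but captures whatever label precedes the last dot; this pattern is our own illustration, not part of the answer, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GeneralizedGroups {
    // (?:.*\.)?  optionally consumes everything up to and including the second-to-last dot
    // ([^.]+)    captures the second-to-last label
    // \.[^.]+    matches the last dot and the final label
    private static final Pattern SECOND_TO_LAST_LABEL = Pattern.compile("(?:.*\\.)?([^.]+)\\.[^.]+");

    static String secondToLastLabel(String host) {
        Matcher m = SECOND_TO_LAST_LABEL.matcher(host);
        return m.matches() ? m.group(1) : null; // null if there is no dot
    }

    public static void main(String[] args) {
        System.out.println(secondToLastLabel("www.table.google.com")); // google
        System.out.println(secondToLastLabel("google.com"));           // google
    }
}
```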

Java Regexp: matching the domain of a URL

3 answers: