How can I remove parts of a URL with a regular expression?

Time: 2013-10-08 17:47:50

Tags: java regex string

I have a full link like this:

http://localhost:8080/suffix/rest/of/link

How do I write a regular expression in Java that returns only the main part of the URL together with the suffix: http://localhost/suffix and without: /rest/of/link

  • Possible protocols: http, https
  • Possible ports: many possibilities

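I assume I need to remove all the text from the third occurrence of the '/' character onward (inclusive). That is what I want to do, but I don't know regular expressions. Could you help me write the regex correctly?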

String appUrl = fullRequestUrl.replaceAll("(.*\\/{2})", ""); //this removes 'http://' but this is not my case

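2 answers:

Answer 0: (score: 3)

I'm not sure why you want to use a regular expression for this. Java already gives you Query URL Objects

The following example, taken from the same site, shows how it works: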

import java.net.*;
import java.io.*;

public class ParseURL {
    public static void main(String[] args) throws Exception {

        URL aURL = new URL("http://example.com:80/docs/books/tutorial"
                           + "/index.html?name=networking#DOWNLOADING");

        System.out.println("protocol = " + aURL.getProtocol());
        System.out.println("authority = " + aURL.getAuthority());
        System.out.println("host = " + aURL.getHost());
        System.out.println("port = " + aURL.getPort());
        System.out.println("path = " + aURL.getPath());
        System.out.println("query = " + aURL.getQuery());
        System.out.println("filename = " + aURL.getFile());
        System.out.println("ref = " + aURL.getRef());
    }
}

Here is the output the program displays:

protocol = http
authority = example.com:80
host = example.com
port = 80
path = /docs/books/tutorial/index.html
query = name=networking
filename = /docs/books/tutorial/index.html?name=networking
ref = DOWNLOADING
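
Building on the accessors above, the question's goal (protocol, host, port and the first path segment) can be sketched without a regex. This is only a minimal sketch, not the answerer's code: the class name `UrlPrefix` and the helper `mainPart` are made up for illustration, it uses `java.net.URI` instead of `URL`, and it assumes the input is an absolute http or https URL.

```java
import java.net.URI;

class UrlPrefix {
    // Hypothetical helper: returns scheme://authority plus the first path segment.
    public static String mainPart(String fullUrl) {
        URI uri = URI.create(fullUrl);        // throws IllegalArgumentException on malformed input
        String path = uri.getRawPath();       // e.g. "/suffix/rest/of/link"
        if (path == null) {
            path = "";                        // opaque URIs have no path
        }
        // Look for the slash that ends the first segment (skip the leading one).
        int end = path.indexOf('/', 1);
        String firstSegment = (end < 0) ? path : path.substring(0, end);
        return uri.getScheme() + "://" + uri.getRawAuthority() + firstSegment;
    }

    public static void main(String[] args) {
        // prints http://localhost:8080/suffix
        System.out.println(mainPart("http://localhost:8080/suffix/rest/of/link"));
    }
}
```

The authority keeps the port whenever one is present, so the "many possible ports" case needs no special handling.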

Answer 1: (score: 1)

Code that gets the main part of the URL:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExample {
    public static void main(String[] args) {
        String urlStr  = "http://localhost:8080/suffix/rest/of/link";
        Pattern pattern = Pattern.compile("^((.*:)//([a-z0-9\\-.]+)(|:[0-9]+)/([a-z]+))/(.*)$");

        Matcher matcher = pattern.matcher(urlStr);
        if(matcher.find())
        {
            // group(1) holds the main part of the URL, including the suffix
            String mainPartOfUrlWithSuffix = matcher.group(1);
            System.out.println(mainPartOfUrlWithSuffix);
        }
    }
}
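
If the regex route is preferred, a shorter variant is to capture the wanted prefix and drop everything after it with `replaceFirst`, in the spirit of the `replaceAll` attempt in the question. Again a sketch only: `TrimUrl` and `mainPart` are hypothetical names, and the pattern assumes an http or https URL whose first path segment is the suffix.

```java
class TrimUrl {
    // Hypothetical helper: keeps protocol, host[:port] and the first path segment.
    public static String mainPart(String fullUrl) {
        // Group 1: "http://" or "https://", then the authority (anything up to the
        // next '/'), then one path segment; the trailing ".*$" consumes the rest.
        return fullUrl.replaceFirst("^(https?://[^/]+/[^/?#]+).*$", "$1");
    }

    public static void main(String[] args) {
        // prints http://localhost:8080/suffix
        System.out.println(mainPart("http://localhost:8080/suffix/rest/of/link"));
    }
}
```

When the input does not match the pattern, `replaceFirst` returns the string unchanged, so callers may want to validate the URL first.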