Given a URL like this:
http%3a%2f%2fwww.google.com%2fpagead%2fconversion%2f1001680686%2f%3flabel%3d4dahCKKczAYQrt7R3QM%26value%3d%26muid%3d_0RQqV8nf-ENh3b4qRJuXQ%26bundleid%3dcom.google.android.youtube%26appversion%3d5.10
I want to replace
%3a%2f%2f
with
://
and remove everything after ".com", so that in the end I am left with only
http://www.google.com
How can I do this in Java with a regular expression?
Answer 0 (score: 2)
You can use:
String u = URLDecoder.decode(url, "UTF-8").replaceFirst("(\\.[^/]+).*$", "$1");
// http://www.google.com
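The one-liner above can be tried as a small self-contained program. This is only a sketch: the class name DecodeAndTrim and the helper schemeAndHost are hypothetical names chosen for illustration, and the input is a shortened form of the URL from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class DecodeAndTrim {
    // Hypothetical helper: percent-decode the URL, then keep everything up to
    // the end of the host name and drop the rest.
    static String schemeAndHost(String encodedUrl) {
        String decoded;
        try {
            decoded = URLDecoder.decode(encodedUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
        // (\.[^/]+) captures from the first dot up to the next slash;
        // .*$ consumes the remainder, which the replacement discards.
        return decoded.replaceFirst("(\\.[^/]+).*$", "$1");
    }

    public static void main(String[] args) {
        String url = "http%3a%2f%2fwww.google.com%2fpagead%2fconversion%2f1001680686%2f%3flabel%3d4dahCKKczAYQrt7R3QM";
        System.out.println(schemeAndHost(url)); // http://www.google.com
    }
}
```

Note that the pattern keys on the first dot in the decoded string, so it assumes the host name contains one. For a dot-less host such as localhost, the first dot may sit in the path and the result would include part of the path.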
Answer 1 (score: 1)
After decoding (for example with java.net.URLDecoder.decode()), you get a URL of this form:
http://www.google.com/here/is/some/content
To get the protocol and the domain from the input, you can use a regular expression like this:
// The single-argument decode(String) is deprecated; pass the charset explicitly.
String input = URLDecoder.decode("http%3a%2f%2fwww.google.com%2fpagead%2fconversion%2f1001680686%2f%3flabel%3d4dahCKKczAYQrt7R3QM%26value%3d%26muid%3d_0RQqV8nf-ENh3b4qRJuXQ%26bundleid%3dcom.google.android.youtube%26appversion%3d5.10", "UTF-8");
Matcher m = Pattern.compile("(http[s]?)://([^/]+)(/.*)?").matcher(input);
if (!m.matches()) return;
String protocol = m.group(1);
String domain = m.group(2);
System.out.println(protocol + "://" + domain);
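As a possible way to reuse the snippet above, the matching can be wrapped in a helper that returns the result instead of printing it. This is a sketch under assumptions: the class name RegexOrigin, the method origin, and the choice to return null for non-matching input are all hypothetical and not part of the original answer.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexOrigin {
    // Same pattern as in the answer: group 1 is the protocol, group 2 the
    // domain, group 3 the optional path.
    private static final Pattern URL_PATTERN =
            Pattern.compile("(http[s]?)://([^/]+)(/.*)?");

    // Hypothetical helper: returns "protocol://domain", or null when the
    // decoded input is not an http or https URL.
    static String origin(String encodedUrl) {
        String decoded;
        try {
            decoded = URLDecoder.decode(encodedUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
        Matcher m = URL_PATTERN.matcher(decoded);
        if (!m.matches()) {
            return null; // matches() requires the whole string to fit the pattern
        }
        return m.group(1) + "://" + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(origin("http%3a%2f%2fwww.google.com%2fpagead%2fconversion")); // http://www.google.com
        System.out.println(origin("https%3a%2f%2fexample.org"));                         // https://example.org
        System.out.println(origin("ftp%3a%2f%2fexample.org"));                           // null
    }
}
```

Compiling the pattern once in a static field avoids recompiling it on every call, which may matter if many URLs are processed.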
Explanation of the regular expression:
(http[s]?)://([^/]+)(/.*)?
|---1----|-2-|--3--|--4---|
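1. (http[s]?) captures the protocol and matches both http and https.
2. :// matches the separator literally.
3. ([^/]+) captures the domain; [^/]+ is any non-empty string that contains no slash.
4. (/.*)? optionally captures the path and everything after it.
Answer 2 (score: 0)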
One way:
java.net.URI uri = new java.net.URI(java.net.URLDecoder.decode(url, "UTF-8"));
System.out.println( uri.getScheme() + "://" + uri.getHost() );
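Both calls in the snippet above declare checked exceptions (UnsupportedEncodingException from URLDecoder.decode and URISyntaxException from the URI constructor), so a complete version needs to handle them. A minimal sketch, assuming a hypothetical class UriOrigin whose origin method returns null when the input cannot be parsed:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;

final class UriOrigin {
    // Hypothetical helper: returns "scheme://host", or null when the decoded
    // input is not a parseable URI or lacks a scheme or host.
    static String origin(String encodedUrl) {
        try {
            URI uri = new URI(URLDecoder.decode(encodedUrl, "UTF-8"));
            if (uri.getScheme() == null || uri.getHost() == null) {
                return null;
            }
            return uri.getScheme() + "://" + uri.getHost();
        } catch (URISyntaxException | UnsupportedEncodingException e) {
            // Treat unparseable input as "no result" in this sketch.
            return null;
        }
    }

    public static void main(String[] args) {
        String url = "http%3a%2f%2fwww.google.com%2fpagead%2fconversion%2f1001680686%2f%3flabel%3d4dahCKKczAYQrt7R3QM%26value%3d%26muid%3d_0RQqV8nf-ENh3b4qRJuXQ%26bundleid%3dcom.google.android.youtube%26appversion%3d5.10";
        System.out.println(origin(url)); // http://www.google.com
    }
}
```

Two things to keep in mind with this approach: getHost() does not include a port, so a port in the input is dropped, and the URI constructor rejects characters that are illegal once decoded (a decoded space, for instance), in which case this sketch returns null.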