我的输入String
为:
String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it";
我想将此文本转换为:
Some content which contains link as http://www.google.com/relative-path/fruit.cgi?param1=abc¶m2=xyz&myParam=pqr (URL Label) and some text after it
所以在这里:
1)我想用纯链接替换链接标签。如果标签包含标签,则应在URL后使用大括号。
2)如果该URL是相对URL,我想在基本URL(http://www.google.com)前面加上前缀。
3)我想向URL附加参数。 (&myParam = pqr)
我在用URL和标签检索标签并替换它时遇到问题。
我写了类似的东西:
public static void main(String[] args) {
String text = "String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it";";
text = text.replaceAll("<", "<");
text = text.replaceAll(">", ">");
text = text.replaceAll("&", "&");
// this is not working
Pattern p = Pattern.compile("href=\"(.*?)\"");
Matcher m = p.matcher(text);
String url = null;
if (m.find()) {
url = m.group(1);
}
}
// helper method to append new query params once I have the url
public static URI appendQueryParams(String uriToUpdate, String queryParamsToAppend) throws URISyntaxException {
URI oldUri = new URI(uriToUpdate);
String newQueryParams = oldUri.getQuery();
if (newQueryParams == null) {
newQueryParams = queryParamsToAppend;
} else {
newQueryParams += "&" + queryParamsToAppend;
}
URI newUri = new URI(oldUri.getScheme(), oldUri.getAuthority(),
oldUri.getPath(), newQueryParams, oldUri.getFragment());
return newUri;
}
编辑1:
Pattern p = Pattern.compile("HREF=\"(.*?)\"");
这有效。但是,我希望它与大写无关。 Href,HRef,href,hrEF等都应该起作用。
此外,如果我的文本有多个URL,该如何处理。
编辑2:
一些进步。
Pattern p = Pattern.compile("href=\"(.*?)\"");
Matcher m = p.matcher(text);
String url = null;
while (m.find()) {
url = m.group(1);
System.out.println(url);
}
这处理多个URL的情况。
最后一个未解决的问题是,如何获得标签并将原始文本中的href标签替换为URL和标签。
Edit3:
对于多种URL,我的意思是给定文本中存在多个URL。
String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it and another link <A HREF=\"/relative-path/vegetables.cgi?param1=abc&param2=xyz\">URL2 Label</A> and some more text";
Pattern p = Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
String url = null;
while (m.find()) {
url = m.group(1); // this variable should contain the link URL
url = appendBaseURI(url);
url = appendQueryParams(url, "license=ABCXYZ");
System.out.println(url);
}
答案 0 :(得分:1)
您可以使用apache commons text StringEscapeUtils
来解码html实体,然后使用replaceAll
进行解码,即:
import org.apache.commons.text.StringEscapeUtils;
String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it";
String output = StringEscapeUtils.unescapeHtml4(text).replaceAll("([^<]+).+\"(.*?)\">(.*?)<[^>]+>(.*)", "$1https://google.com$2&your_param ($3)$4");
System.out.print(output);
// Some content which contains link as https://google.com/relative-path/fruit.cgi?param1=abc¶m2=xyz&your_param (URL Label) and some text after it
演示:
答案 1 :(得分:1)
public static void main(String args[]) {
String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it and another link <A HREF=\"/relative-path/vegetables.cgi?param1=abc&param2=xyz\">URL2 Label</A> and some more text";
text = StringEscapeUtils.unescapeHtml4(text);
Pattern p = Pattern.compile("<a href=\"(.*?)\">(.*?)</a>", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
while (m.find()) {
text = text.replace(m.group(0), cleanUrlPart(m.group(1), m.group(2)));
}
System.out.println(text);
}
private static String cleanUrlPart(String url, String label) {
if (!url.startsWith("http") && !url.startsWith("www")) {
if (url.startsWith("/")) {
url = "http://www.google.com" + url;
} else {
url = "http://www.google.com/" + url;
}
}
url = appendQueryParams(url, "myParam=pqr").toString();
if (label != null && !label.isEmpty()) url += " (" + label + ")";
return url;
}
输出
Some content which contains link as http://www.google.com/relative-path/fruit.cgi?param1=abc¶m2=xyz&myParam=pqr (URL Label) and some text after it and another link http://www.google.com/relative-path/vegetables.cgi?param1=abc¶m2=xyz&myParam=pqr (URL2 Label) and some more text
答案 2 :(得分:0)
//这不起作用
因为正则表达式区分大小写。
尝试:-
Pattern p = Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);
Edit1 :
要获取标签,请使用Pattern.compile("(?<=>).*?(?=</a>)", Pattern.CASE_INSENSITIVE)
和m.group(0)
。
Edit2 :
要将标签(包括标签)替换为您的最终字符串,请使用:-
text.replaceAll("(?i)<a href=\"(.*?)</a>", "new substring here")
答案 3 :(得分:0)
几乎在那里:
public static void main(String[] args) throws URISyntaxException {
String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it and another link <A HREF=\"/relative-path/vegetables.cgi?param1=abc&param2=xyz\">URL2 Label</A> and some more text";
text = StringEscapeUtils.unescapeHtml4(text);
System.out.println(text);
System.out.println("**************************************");
Pattern patternTag = Pattern.compile("<a([^>]+)>(.+?)</a>", Pattern.CASE_INSENSITIVE);
Pattern patternLink = Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);
Matcher matcherTag = patternTag.matcher(text);
while (matcherTag.find()) {
String href = matcherTag.group(1); // href
String linkText = matcherTag.group(2); // link text
System.out.println("Href: " + href);
System.out.println("Label: " + linkText);
Matcher matcherLink = patternLink.matcher(href);
String finalText = null;
while (matcherLink.find()) {
String link = matcherLink.group(1);
System.out.println("Link: " + link);
finalText = getFinalText(link, linkText);
break;
}
System.out.println("***************************************");
// replacing logic goes here
}
System.out.println(text);
}
public static String getFinalText(String link, String label) throws URISyntaxException {
link = appendBaseURI(link);
link = appendQueryParams(link, "myParam=ABCXYZ");
return link + " (" + label + ")";
}
public static String appendQueryParams(String uriToUpdate, String queryParamsToAppend) throws URISyntaxException {
URI oldUri = new URI(uriToUpdate);
String newQueryParams = oldUri.getQuery();
if (newQueryParams == null) {
newQueryParams = queryParamsToAppend;
} else {
newQueryParams += "&" + queryParamsToAppend;
}
URI newUri = new URI(oldUri.getScheme(), oldUri.getAuthority(),
oldUri.getPath(), newQueryParams, oldUri.getFragment());
return newUri.toString();
}
public static String appendBaseURI(String url) {
String baseURI = "http://www.google.com/";
if (url.startsWith("/")) {
url = url.substring(1, url.length());
}
if (url.startsWith(baseURI)) {
return url;
} else {
return baseURI + url;
}
}