I have a string that contains some URLs. How can I find all the hrefs with a regular expression?
<a href="http://www.amazon.it/Die-10-Symphonien-Orchesterlieder-Sinfonie-Complete/dp/B003LQSHBO/ref=sr_1_2?ie=UTF8&qid=1440101590&sr=8-2&keywords=mahler">prodotto di prova</a>
I currently have this regex, which finds all Amazon links. Now I need to add the href to it:
String regex="(http|www\\.)(amazon|AMAZON)\\.(com|it|uk|fr|de)\\/(?:gp\\/product|gp\\/product\\/glance|[^\\/]+\\/dp|dp|[^\\/]+\\/product-reviews)\\/([^\\/]{10})";
Answer 0 (score: 0)
This pattern works in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmazonHref {
    public static void main(String[] args) {
        String input = "<a href=\"http://www.amazon.it/Die-10-Symphonien-Orchesterlieder-Sinfonie-Complete/dp/B003LQSHBO/ref=sr_1_2?ie=UTF8&qid=1440101590&sr=8-2&keywords=mahler\">prodotto di prova</a>";
        // Named groups: link (quoted URL), product (path before the ID), productID (10 characters), queryString (the rest up to the closing quote).
        String pattern = "href=(?<link>['\\\"](?:https?:\\/\\/)?(?:www\\.)?(?:amazon|AMAZON)\\.(?:com|it|uk|fr|de)\\/(?<product>gp\\/product|gp\\/product\\/glance|[^\\/]+\\/dp|dp|[^\\/]+\\/product-reviews)\\/(?<productID>[^\\/]{10})\\/(?<queryString>.*?)\\\")";
        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(input);
        if (m.find()) {
            System.out.println("Amazon link: " + m.group(0));
            System.out.println("product: " + m.group("product"));
            System.out.println("productID: " + m.group("productID"));
            System.out.println("querystring: " + m.group("queryString"));
        } else {
            System.out.println("NO MATCH");
        }
    }
}
Output:
Amazon link: href="http://www.amazon.it/Die-10-Symphonien-Orchesterlieder-Sinfonie-Complete/dp/B003LQSHBO/ref=sr_1_2?ie=UTF8&qid=1440101590&sr=8-2&keywords=mahler"
product: Die-10-Symphonien-Orchesterlieder-Sinfonie-Complete/dp
productID: B003LQSHBO
querystring: ref=sr_1_2?ie=UTF8&qid=1440101590&sr=8-2&keywords=mahler
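The snippet above reports only the first match, while the question asks for all hrefs. A minimal sketch of collecting every match by calling find() in a loop could look like the following. The class name HrefFinder, the method findHrefs, and the simplified pattern are assumptions made for illustration; they are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefFinder {
    // Hypothetical, simplified pattern: href=, an opening quote (group 1),
    // an Amazon URL (group 2), then the same quote again via the \1 backreference.
    private static final Pattern AMAZON_HREF = Pattern.compile(
        "href=(['\"])((?:https?://)?(?:www\\.)?amazon\\.(?:com|it|uk|fr|de)/[^'\"]*)\\1",
        Pattern.CASE_INSENSITIVE);

    // Returns every Amazon URL found in an href attribute, in document order.
    static List<String> findHrefs(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = AMAZON_HREF.matcher(html);
        while (m.find()) {          // each call resumes after the previous match
            links.add(m.group(2));  // the URL without the surrounding quotes
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.amazon.it/x/dp/B003LQSHBO/ref=sr_1_2\">a</a>"
                    + "<a href='http://www.amazon.de/dp/B000000000/'>b</a>"
                    + "<a href=\"http://example.com/\">c</a>";
        for (String link : findHrefs(html)) {
            System.out.println(link);
        }
    }
}
```

With the sample string in main, this should print the two Amazon URLs and skip the example.com link. For arbitrary real-world HTML an HTML parser is usually more robust than a regex, but for a known, regular input a loop like this may be enough.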
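Java's rules for backslashes and escaping in strings drive me absolutely crazy, and I never get them right. You may find it helpful to go to http://www.regexplanet.com/advanced/java/index.html and enter your regex there; it will convert it into a Java string with the correct escapes. (I couldn't get mine to work until I did that!)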
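As a rough illustration of the escaping issue, the sketch below shows the same regex as typed into a tester and as written in Java source. The class name EscapeDemo and the shortened pattern are assumptions chosen for the example. The only rules relied on are that each regex backslash is doubled inside a Java string literal, that a forward slash needs no escaping in Java regexes, and that Pattern.quote makes a piece of text match literally.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Regex as typed into a tester:  amazon\.(com|it)/dp/(\w{10})
    // In Java source each backslash is doubled; '/' is left as-is.
    static final String JAVA_LITERAL = "amazon\\.(com|it)/dp/(\\w{10})";

    // Pattern.quote escapes the whole text, so the dots match only literal dots.
    static final String QUOTED_HOST = Pattern.quote("www.amazon.it");

    // True when the input contains a product path matching JAVA_LITERAL.
    static boolean matchesProduct(String s) {
        return Pattern.compile(JAVA_LITERAL).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(JAVA_LITERAL);                                   // amazon\.(com|it)/dp/(\w{10})
        System.out.println(matchesProduct("amazon.it/dp/B003LQSHBO"));      // true
        System.out.println(Pattern.matches(QUOTED_HOST, "www.amazon.it"));  // true
        System.out.println(Pattern.matches(QUOTED_HOST, "wwwXamazon.it"));  // false
    }
}
```

Printing the string, as in the first line of main, is a quick way to see what the regex engine actually receives after the Java compiler has processed the escapes.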