Extracting links from text

Time: 2014-09-08 05:42:39

Tags: java regex

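I want to extract links from text. Each link starts right after "q=" with "http" and ends just before "&amp"; in other words, I want the text between those two markers. My output should look like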

http://ibnlive.in.com/news/noidas-nithari-killings-sc-stays-execution-of-convict-surinder-koli-for-a-week/497153-3-242.html

that is, starting at "http" and ending before "&amp". This is what I have tried:

    Pattern p = Pattern.compile(".*?q=(http:.*?)&amp");
    Matcher m = p.matcher(content);
    String pageid = "";
    if (m.find()) {
        pageid = m.group(1);
    }
    System.out.println(pageid);

Below is the content of my file text.html:

 q=http://www.thehindu.com/news/national/supreme-court-execution-of-nithari-killer-surinder-koli/article6390120.ece&amp;sa=U&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CBQQpwI&amp;usg=AFQjCNFDcbVK87iUjDwI21jbIZUg0aU8gQ"><img class="th" height="100" src="http://t1.gstatic.com/images?q=tbn:ANd9GcSUUkUw1JxXWJQj2SCQr3XxoIcY5OpWLzDgHIqvLzDgmrfntT9nRi99Lvuuheh05L50VDbs-pY" width="100" border="1"><br></a><span class="_pJb _yhd">The Hindu</span></td><td valign="top"><div style="margin-top:5px"><a href="/url?q=http://ibnlive.in.com/news/noidas-nithari-killings-sc-stays-execution-of-convict-surinder-koli-for-a-week/497153-3-242.html&amp;sa=U&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CBYQqQIwAA&amp;usg=AFQjCNFPPZfQCJQH3vFo1I9Avu-ug8EcSg">Noida&#39;s <b>Nithari killings</b>: SC stays execution of convict Surinder Koli for a week</a><div style="padding-top:2px"><cite>IBNLive</cite><span class="f"> - <span class="nobr">1 hour ago</span><span class="nobr"></span></span></div><div class="j" style="margin-top:1px;margin-bottom:4px"><span class="st">New Delhi: The Supreme Court has stayed the execution of Noida&#39;s <b>Nithari</b> <b>killings</b> convict Surinder Koli for one week. 
An official of the apex&nbsp;...</span></div></div><div style="margin-top:4px"><a href="/url?q=http://www.firstpost.com/india/sc-defers-nithari-killings-convict-surendra-kolis-hanging-week-1701475.html&amp;sa=U&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CBgQqQIwAQ&amp;usg=AFQjCNGsfNy0HC_rfyMfPSSpU66FmUydIw">SC defers <b>Nithari killings</b> convict Surendra Koli&#39;s hanging by a week</a><div style="padding-top:2px"><cite>Firstpost</cite><span class="f"> - <span class="nobr">1 hour ago</span><span class="nobr"></span></span></div></div><div style="margin-top:4px"><a href="/url?q=http://www.hindustantimes.com/india-news/nithari-killer-surinder-koli-to-be-hanged-on-sept-12-jail-authorities/article1-1260116.aspx&amp;sa=U&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CBoQqQIwAg&amp;usg=AFQjCNEjeDnCXSwCCWtXO87tIhj6athCCA"><b>Nithari</b> case: Surinder Koli to be hanged on Sept 12</a><div style="padding-top:2px"><cite>Hindustan Times</cite><span class="f"> - <span class="nobr">3 days ago</span><span class="nobr"></span></span></div></div></td></tr></table></div></li><li class="g"><h3 class="r"><a href="/url?q=http://en.wikipedia.org/wiki/Noida_serial_murders&amp;sa=U&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CBwQFjAD&amp;usg=AFQjCNGATKIaCWR1Hl-yqEqXcb1XnXKu9g">Noida serial <b>murders</b> - Wikipedia, the free encyclopedia</a></h3><div class="s"><div class="kv" style="margin-bottom:2px"><cite>en.wikipedia.org/wiki/Noida_serial_<b>murders</b></cite><div class="_nBb">‎<div style="display:inline" onclick="google.sham(this);" aria-expanded="false" aria-haspopup="true" tabindex="0" data-ved="0CB0Q7B0wAw"><span class="_O0"></span></div><div style="display:none" class="am-dropdown-menu" role="menu" tabindex="-1"><ul><li class="_Ykb"><a class="_Zkb" 
href="/url?q=http://webcache.googleusercontent.com/search%3Fhl%3Den-IN%26q%3Dcache:ITALXEhw0j8J:http://en.wikipedia.org/wiki/Noida_serial_murders%252Bnithari%2Bkillings%2Bnews%26gbv%3D2%26%26ct%3Dclnk&amp;sa=U&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CB8QIDAD&amp;usg=AFQjCNFx4v82ZSgfuIZHJmenK1Xv6jxYpw">Cached</a></li><li class="_Ykb"><a class="_Zkb" href="/search?hl=en-IN&amp;gbv=2&amp;q=related:en.wikipedia.org/wiki/Noida_serial_murders+nithari+killings%09news&amp;tbo=1&amp;sa=X&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CCAQHzAD">Similar</a></li></ul></div></div></div><span class="st">The Noida serial <b>murders</b> (also <b>Nithari</b> serial <b>murders</b>, <b>Nithari</b> Kand) took ... The <br>
police then sealed the house and did not allow <b>news</b> media anywhere near the&nbsp;...</span><br><div class="osl">‎<a href="/url?q=http://en.wikipedia.org/wiki/Noida_serial_murders%23Events_leading_to_primary_investigation&amp;sa=U&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CCIQ0gIoADAD&amp;usg=AFQjCNFoFLoEv_CGAkKNe2WFNpQdqTyRag">Events leading to primary ...</a> - ‎<a href="/url?q=http://en.wikipedia.org/wiki/Noida_serial_murders%23Primary_investigation&amp;sa=U&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CCMQ0gIoATAD&amp;usg=AFQjCNFFSz2pBFdWoUAGkp2sZ_KpAmBoUg">Primary investigation</a> - ‎<a href="/url?q=http://en.wikipedia.org/wiki/Noida_serial_murders%23CBI_investigation&amp;sa=U&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CCQQ0gIoAjAD&amp;usg=AFQjCNHlovBCPUSSGlExpuZHJxtDHQUZ7A">CBI investigation</a> - ‎<a href="/url?q=http://en.wikipedia.org/wiki/Noida_serial_murders%23Victims&amp;sa=U&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CCUQ0gIoAzAD&amp;usg=AFQjCNFgkfguy3vUxvh8JmS-ncgfIxOLNA">Victims</a></div></div></li><li class="g"><h3 class="r"><a href="/url?q=http://www.ndtv.com/topic/nithari-killings&amp;sa=U&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CCcQFjAE&amp;usg=AFQjCNESpfXGZ4DE-uVDo8LvQ42kHVU4Bg"><b>Nithari Killings</b>: Latest <b>News</b>, Photos, Videos on <b>Nithari</b> <b>...</b> - NDTV.com</a></h3><div class="s"><div class="kv" style="margin-bottom:2px"><cite>www.ndtv.com/topic/<b>nithari</b>-<b>killings</b></cite><div class="_nBb">‎<div style="display:inline" onclick="google.sham(this);" aria-expanded="false" aria-haspopup="true" tabindex="0" data-ved="0CCgQ7B0wBA"><span class="_O0"></span></div><div style="display:none" class="am-dropdown-menu" role="menu" tabindex="-1"><ul><li class="_Ykb"><a class="_Zkb" 
href="/url?q=http://webcache.googleusercontent.com/search%3Fhl%3Den-IN%26q%3Dcache:a6vXEobpypEJ:http://www.ndtv.com/topic/nithari-killings%252Bnithari%2Bkillings%2Bnews%26gbv%3D2%26%26ct%3Dclnk&amp;sa=U&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CCoQIDAE&amp;usg=AFQjCNGiRwJ84qtiMaU-6ag_SMMyugi2-g">Cached</a></li><li class="_Ykb"><a class="_Zkb" href="/search?hl=en-IN&amp;gbv=2&amp;q=related:www.ndtv.com/topic/nithari-killings+nithari+killings%09news&amp;tbo=1&amp;sa=X&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CCsQHzAE">Similar</a></li></ul></div></div></div><span class="st">Find <b>Nithari Killings</b> Latest <b>News</b>, Videos &amp; Pictures on <b>Nithari Killings</b> and see <br>
latest updates, <b>news</b>, information from NDTV.COM. Explore more on <b>Nithari</b>&nbsp;...</span><br></div></li><li class="g"><h3 class="r"><a href="/url?q=http://timesofindia.indiatimes.com/India/SC-stays-execution-of-Nithari-killer-Surinder-Koli/articleshow/41998225.cms&amp;sa=U&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CC0QFjAF&amp;usg=AFQjCNHAHTcrXEtqeYz_1KPmz-6AxK93RA">SC stays execution of <b>Nithari</b> killer Surinder Koli - The Times of India</a></h3><div class="s"><div class="kv" style="margin-bottom:2px"><cite>timesofindia.indiatimes.com/India/SC...<b>Nithari</b>.../41998225.cms</cite><div class="_nBb">‎<div style="display:inline" onclick="google.sham(this);" aria-expanded="false" aria-haspopup="true" tabindex="0" data-ved="0CC4Q7B0wBQ"><span class="_O0"></span></div><div style="display:none" class="am-dropdown-menu" role="menu" tabindex="-1"><ul><li class="_Ykb"><a class="_Zkb" href="/url?q=http://webcache.googleusercontent.com/search%3Fhl%3Den-IN%26q%3Dcache:OvKjSrI26NwJ:http://timesofindia.indiatimes.com/India/SC-stays-execution-of-Nithari-killer-Surinder-Koli/articleshow/41998225.cms%252Bnithari%2Bkillings%2Bnews%26gbv%3D2%26%26ct%3Dclnk&amp;sa=U&amp;ei=qTUNVOalHMe2uATE_YGQDw&amp;ved=0CDAQIDAF&amp;usg=AFQjCNHV4qIIoJ8sR79KTPOCIyUhWNwwCg">Cached</a></li></ul></div>

3 answers:

Answer 0 (score: 1)

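Just use a lookbehind to assert that the match comes right after the string q=http: and a lookahead to stop it just before the string &amp. The pattern is shown here as a Java string literal, hence the doubled backslash:

(?<=q=http:).*?(?=\\s*&amp)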

DEMO

String s = "q=http://ibnlive.in.com/news/noidas-nithari-killings-sc-stays-execution-of-convict-surinder-koli-for-a-week/497153-3-242.html &amp 1 hour agoNoida's Nithari killings: SC stays execution of convict Surinder Koli for a weekIBNLive - 1 hour agoNew Delhi: The Supreme Court has stayed the execution of Noida's Nithari killings convict\n"
        + " Surinder Koli for one week.\n"
        + "q=http://www.hindustantimes.com/india-news/nithari-killer-surinder-koli-to-be-hanged-on-sept-12-jail-authorities/article1-1260116.aspx &amp\n";

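// Pattern.MULTILINE only changes how ^ and $ behave; this pattern uses
// neither, so the flag has no effect here and is kept only as in the answer.
Pattern regex = Pattern.compile("(?<=q=http:).*?(?=\\s*&amp)", Pattern.MULTILINE);
Matcher matcher = regex.matcher(s);
// Loop with find() so that every link is printed, not just the first one.
while (matcher.find()) {
    System.out.println(matcher.group(0));
}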

Output:

//ibnlive.in.com/news/noidas-nithari-killings-sc-stays-execution-of-convict-surinder-koli-for-a-week/497153-3-242.html
//www.hindustantimes.com/india-news/nithari-killer-surinder-koli-to-be-hanged-on-sept-12-jail-authorities/article1-1260116.aspx

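The following variant matches only links that are preceded by q= and start with http:. Because http: is now outside the lookbehind, it is part of the match: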

String s = "q=http://ibnlive.in.com/news/noidas-nithari-killings-sc-stays-execution-of-convict-surinder-koli-for-a-week/497153-3-242.html &amp 1 hour agoNoida's Nithari killings: SC stays execution of convict Surinder Koli for a weekIBNLive - 1 hour agoNew Delhi: The Supreme Court has stayed the execution of Noida's Nithari killings convict\n"
        + " Surinder Koli for one week.\n"
        + "q=http://www.hindustantimes.com/india-news/nithari-killer-surinder-koli-to-be-hanged-on-sept-12-jail-authorities/article1-1260116.aspx &amp\n";

Pattern regex = Pattern.compile("(?<=q=)http:.*?(?=\\s*&amp)", Pattern.MULTILINE);
Matcher matcher = regex.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}

Output:

http://ibnlive.in.com/news/noidas-nithari-killings-sc-stays-execution-of-convict-surinder-koli-for-a-week/497153-3-242.html
http://www.hindustantimes.com/india-news/nithari-killer-surinder-koli-to-be-hanged-on-sept-12-jail-authorities/article1-1260116.aspx

Answer 1 (score: 1)

Another option:

q=(http(?>[^&]+|&(?!amp))*)

Then read capture group 1.

Demo
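
For illustration, here is a minimal sketch of how this pattern might be used from Java. The class name AtomicGroupLinkExtractor and the method extractLinks are made up for this example, and the sample input is a shortened fragment in the style of the question's HTML, not the full file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original answer.
class AtomicGroupLinkExtractor {

    // "q=" followed by "http", then repeatedly either a run of non-'&'
    // characters or a single '&' that does not begin "&amp".
    // (?>...) is an atomic group: once it has matched, the regex engine
    // does not backtrack into it.
    private static final Pattern LINK =
            Pattern.compile("q=(http(?>[^&]+|&(?!amp))*)");

    // Returns the content of capture group 1 for every match in the input.
    static List<String> extractLinks(String content) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(content);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Shortened sample in the style of text.html (assumed for this sketch).
        String html =
                "<a href=\"/url?q=http://en.wikipedia.org/wiki/Noida_serial_murders&amp;sa=U\">a</a>"
              + "<a href=\"/url?q=http://www.ndtv.com/topic/nithari-killings&amp;sa=U\">b</a>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Note that this pattern only stops at "&amp". If a link is not followed by "&amp", the match keeps going to the end of the input or to the next "&amp", so it is best suited to input shaped like the question's.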

Answer 2 (score: 0)

.*?q=(http:.*?)&amp

Try this. It gives all the matches.

See here:

http://regex101.com/r/iX5xR2/9
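
The regex tester lists every match, but in Java a single find() call, as in the question's if (m.find()), returns only the first one. The sketch below loops with while (m.find()) to collect them all, assuming the file content has already been read into a string. The class name AllLinksExample and the method findAll are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; not part of the original answer.
class AllLinksExample {

    // Same pattern as in the answer: group 1 lazily captures from "http:"
    // up to (but not including) the next "&amp".
    private static final Pattern LINK = Pattern.compile(".*?q=(http:.*?)&amp");

    // Collects group 1 of every match instead of only the first one.
    static List<String> findAll(String content) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(content);
        while (m.find()) {          // 'if' here would stop after one link
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Shortened sample in the style of text.html (assumed for this sketch).
        String content =
                "<a href=\"/url?q=http://www.firstpost.com/india/sc-defers-nithari-killings-convict-surendra-kolis-hanging-week-1701475.html&amp;sa=U\">x</a>\n"
              + "<a href=\"/url?q=http://www.ndtv.com/topic/nithari-killings&amp;sa=U\">y</a>";
        for (String link : findAll(content)) {
            System.out.println(link);
        }
    }
}
```

Because the pattern spells out http:, links that start with https: are not matched by it.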