I have some content that contains text along with URLs in this format:
some text http://www.example.com/foo... <http://www.example.com/foo.html> some text
or in this format:
some text http://www.example.com/bar <http://www.example.com/bar> some text
I need to clean these up so that they become, respectively:
some text http://www.example.com/foo.html some text
and
some text http://www.example.com/bar some text
Is there a way to do this with a regular expression?
Answer:
I solved the problem by using backreferences to match the same text again:
String cleaned = input.replaceAll("(https?://([^ ]+))(\\.{3})? *<(\\1[^ >]*)>", "$4");
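A self-contained sketch of the backreference approach, run against both inputs from the question. The class name `UrlCleaner` and the method `clean` are hypothetical. The sketch assumes that URLs contain no spaces and that the visible URL is either identical to the one in angle brackets or a prefix of it followed by exactly three dots. The group after `\1` is written as `[^ >]*`: with `[^ ]+` it would have to consume at least one character beyond the backreferenced URL, so the `bar` case, where both copies are identical, would not match.

```java
import java.util.regex.Pattern;

public class UrlCleaner {
    // Group 1: the visible (possibly truncated) URL; its part after the scheme is group 2 (unused).
    // Group 3: an optional "..." truncation marker.
    // Group 4: the full URL inside <...>, which must start with the text of group 1 (backreference \1).
    // [^ >]* allows group 4 to be exactly equal to group 1, as in the "bar" example.
    private static final Pattern DUPLICATE_URL =
            Pattern.compile("(https?://([^ ]+))(\\.{3})? *<(\\1[^ >]*)>");

    /** Replaces "visibleUrl[...] <fullUrl>" with just fullUrl; other text is left untouched. */
    static String clean(String input) {
        return DUPLICATE_URL.matcher(input).replaceAll("$4");
    }

    public static void main(String[] args) {
        String[] samples = {
            "some text http://www.example.com/foo... <http://www.example.com/foo.html> some text",
            "some text http://www.example.com/bar <http://www.example.com/bar> some text"
        };
        for (String s : samples) {
            System.out.println(clean(s));
        }
    }
}
```

Running `main` prints the two cleaned lines from the question. A URL that is not followed by a matching `<...>` copy is left unchanged, because the pattern requires the `<`, the backreferenced prefix, and the closing `>`.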