How do I detect and remove URLs from a sentence?

Asked: 2013-07-01 15:16:46

Tags: java regex string parsing url

Is it possible to detect and remove any kind of URL from a sentence?

For example:

Today,wheather is cold.But I want to out. http://weathers.com..... And I will take a cup of tea...

should become

Today,wheather is cold.But I want to out. And I will take a cup of tea...

2 answers:

Answer 0 (score: 3)

It depends on how comprehensive you want the matching to be. You could try something as simple as:

str.replaceAll("http://[^\\s]+", "")

For example:

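// Parenthesize the concatenation so replaceAll runs on the whole sentence,
// not only on the second string literal.
System.out.println(("Today,wheather is cold.But I want to out. "
        + "http://weathers.com..... And I will take a cup of tea...")
        .replaceAll("http://[^\\s]+", ""));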
Today,wheather is cold.But I want to out.  And I will take a cup of tea...
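
The simple pattern above covers only `http://` and leaves a double space where the URL was. Below is a minimal sketch that also accepts `https://`, lists what was detected, and collapses the leftover spaces. The class name `UrlStripper` and its method names are made up for illustration, and it assumes a URL is just a scheme followed by non-whitespace characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlStripper {
    // Assumption: a "URL" is http:// or https:// followed by non-whitespace characters.
    private static final Pattern URL_PATTERN = Pattern.compile("https?://\\S+");

    // Returns every substring that the pattern treats as a URL.
    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    // Removes the URLs, then collapses the runs of spaces they leave behind.
    static String removeUrls(String text) {
        String stripped = URL_PATTERN.matcher(text).replaceAll("");
        return stripped.replaceAll(" {2,}", " ").trim();
    }

    public static void main(String[] args) {
        String s = "Today,wheather is cold.But I want to out. "
                + "http://weathers.com..... And I will take a cup of tea...";
        System.out.println(findUrls(s));
        System.out.println(removeUrls(s));
    }
}
```

Because `\S+` is greedy, any punctuation glued to the URL (such as the trailing dots here) is removed along with it, which is what the question's expected output asks for.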

If you want something more robust that matches valid URLs, use a more comprehensive URL regex:

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

For even more comprehensive matching, see this answer.
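
Note that the more comprehensive pattern above is written as a slash-delimited regex literal and is anchored with `^` and `$`, so as written it fits checking whether a whole string is a URL rather than finding one inside a sentence. A minimal sketch of that use in Java follows; `UrlValidator` and `looksLikeUrl` are hypothetical names, and the pattern is the answer's with the delimiters dropped and backslashes doubled for a Java string.

```java
import java.util.regex.Pattern;

class UrlValidator {
    // The answer's pattern, minus the /.../ delimiters, escaped for a Java string literal.
    private static final Pattern WHOLE_URL = Pattern.compile(
            "^(https?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w \\.-]*)*\\/?$");

    // True only if the entire candidate string looks like a URL to this pattern.
    static boolean looksLikeUrl(String candidate) {
        return WHOLE_URL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUrl("http://weathers.com")); // true
        System.out.println(looksLikeUrl("not a url"));           // false
    }
}
```

To remove URLs from running text with this pattern you would at least have to drop the anchors, and because the scheme is optional and the path class contains a space, the unanchored form is likely to match more than you intend.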

Answer 1 (score: 1)

Try the following regex:

((http|ftp|https):\/\/)?[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

It matches valid URLs. The following code should do what you want:

    String str = "Today,wheather is cold. But I want to out. http://weathers.com..... And I will take a cup of tea";
    String regularExpression = "(((http|ftp|https):\\/\\/)?[\\w\\-_]+(\\.[\\w\\-_]+)+([\\w\\-\\.,@?^=%&:/~\\+#]*[\\w\\-\\@?^=%&/~\\+#])?)";
    str = str.replaceAll(regularExpression,"");
    System.out.println(str);

Edit

However, this regex does not work for every kind of URL. URLs are complex enough that it is hard to find one perfect regex that matches them all.
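
One concrete limitation: because the scheme group is optional, the pattern also matches any dotted token, such as `cold.But` in the question's original sentence (the string in this answer has a space after `cold.`, which hides the problem). A sketch of one possible mitigation is to make the scheme mandatory. `SchemeRequiredStripper` is a hypothetical name, and the pattern is otherwise this answer's.

```java
import java.util.regex.Pattern;

class SchemeRequiredStripper {
    // Same pattern as in the answer, but the scheme group is no longer optional.
    private static final Pattern URL_WITH_SCHEME = Pattern.compile(
            "((http|ftp|https):\\/\\/)[\\w\\-_]+(\\.[\\w\\-_]+)+"
            + "([\\w\\-\\.,@?^=%&:/~\\+#]*[\\w\\-\\@?^=%&/~\\+#])?");

    static String removeUrls(String text) {
        return URL_WITH_SCHEME.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // The question's original sentence, with no space after "cold."
        String s = "Today,wheather is cold.But I want to out. "
                + "http://weathers.com..... And I will take a cup of tea...";
        // "cold.But" survives because it has no scheme; the URL itself is removed.
        System.out.println(removeUrls(s));
    }
}
```

The trade-off is that bare hosts such as `weathers.com` are no longer detected. The pattern's last character class also excludes `.`, so the dots trailing the URL stay in the text.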