Question

How can I remove the URLs from this sample text

String str="Fear psychosis after #AssamRiots - http://www.google.com/LdEbWTgD http://www.yahoo.com/mksVZKBz";

using a regular expression?

I want to remove all URLs from the text, but it does not work. My code is:

String pattern = "(http(.*?)\\s)";
Pattern pt = Pattern.compile(pattern);
Matcher namemacher = pt.matcher(input);
if (namemacher.find()) {
  str=input.replace(namemacher.group(0), "");
}

Answer 1

Takes a String containing URLs as input:

private String removeUrl(String commentstr)
    {
        String urlPattern = "((https?|ftp|gopher|telnet|file|Unsure|http):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
        Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(commentstr);
        int i = 0;
        while (m.find()) {
            commentstr = commentstr.replaceAll(m.group(i),"").trim();
            i++;
        }
        return commentstr;
    }

Answer 2

Well, you have not given any information about your text, so assuming it looks like this: "Some text here http://www.example.com some text there", you can do this:

String yourText = "blah-blah";
String cleartext = yourText.replaceAll("http.*?\\s", " ");

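This removes every sequence that starts with "http", up to and including the first whitespace character that follows it, and puts a single space in its place.

You should read the Javadoc for the String class. It will make things clear.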
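One caveat with the pattern above: it requires a whitespace character after the URL, so a URL at the very end of the text (as in the question's sample string) is left in place. A minimal sketch of one way around that, assuming URLs are delimited by whitespace; the class and method names are hypothetical:

```java
public class UrlStripper {
    // Hypothetical helper: removes every http/https URL, including one at the end of the text.
    public static String stripHttpUrls(String text) {
        // \S+ consumes everything up to the next whitespace character or the end of input
        return text.replaceAll("https?://\\S+", "").trim();
    }

    public static void main(String[] args) {
        String str = "Fear psychosis after #AssamRiots - http://www.google.com/LdEbWTgD http://www.yahoo.com/mksVZKBz";
        System.out.println(stripHttpUrls(str));
    }
}
```

The whitespace that surrounded each URL is kept, so doubled spaces can remain in the middle of the text.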

Answer 3

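How do you define a URL? You probably want to filter not only http:// but also https:// and other protocols such as ftp://, rss://, or custom protocols.

Maybe this regular expression does the job: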

[\S]+://[\S]+

Explanation:

one or more non-whitespace characters
followed by the string "://"
followed by one or more non-whitespace characters
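As a rough illustration, the expression above could be applied in Java like this. This is only a sketch: the class and method names are hypothetical, and the backslashes are doubled because the pattern sits inside a Java string literal.

```java
import java.util.regex.Pattern;

public class SchemeUrlStripper {
    // One or more non-whitespace chars, then "://", then one or more non-whitespace chars
    private static final Pattern ANY_SCHEME_URL = Pattern.compile("\\S+://\\S+");

    // Hypothetical helper: removes anything that looks like scheme://rest
    public static String stripUrls(String text) {
        return ANY_SCHEME_URL.matcher(text).replaceAll("").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripUrls("see ftp://a.b/c and rss://feed.example/x now"));
    }
}
```

Because the pattern matches any run of non-whitespace, punctuation that touches the URL (for example surrounding parentheses) is removed along with it.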

Answer 4

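Note that if your URL contains characters such as & then the answers above will not work properly, because replaceAll cannot handle those characters. What worked for me was to strip those characters into a new string variable, strip the same characters from each match found by m.find(), and then call replaceAll on the new string variable.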

private String removeUrl(String commentstr)
{
    // rid of ? and & in urls since replaceAll can't deal with them
    String commentstr1 = commentstr.replaceAll("\\?", "").replaceAll("\\&", "");

    String urlPattern = "((https?|ftp|gopher|telnet|file|Unsure|http):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
    Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(commentstr);
    int i = 0;
    while (m.find()) {
        commentstr = commentstr1.replaceAll(m.group(i).replaceAll("\\?", "").replaceAll("\\&", ""),"").trim();
        i++;
    }
    return commentstr;
}

Answer 5

m.group(0) should be replaced with an empty string, not m.group(i), where i is incremented on every call to m.find(), as mentioned in one of the answers above.

private String removeUrl(String commentstr)
{
    String urlPattern = "((https?|ftp|gopher|telnet|file|Unsure|http):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
    Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(commentstr);
    StringBuffer sb = new StringBuffer(commentstr.length());
    while (m.find()) {
        // replace each matched URL with an empty string
        m.appendReplacement(sb, "");
    }
    // copy the text that follows the last match
    m.appendTail(sb);
    return sb.toString();
}

Answer 6

If you are able to use Python instead, you may find this code a better solution:

import re
text = "<hello how are you ?> then ftp and mailto and gopher and file ftp://ideone.com/K3Cut rthen you "
result = re.sub(r"ftp\S+", "", text)
print(result)

Answer 7

As @Ev0oD mentioned, the code works perfectly, except on the following tweet that I am processing: RT @_Val83_: The cast of #ThorRagnarok playing "Ragnarok Paper Scissors" #TomHiddleston #MarkRuffalo (https://t.co /k9nYBu3QHu)

At the line that removes the token: commentstr = commentstr.replaceAll(m.group(i),"").trim();

I ran into the following error:

java.util.regex.PatternSyntaxException: Unmatched closing ')' near index 22

where m.group(i) is https://t.co /k9nYBu3QHu)
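The exception arises because String.replaceAll interprets its first argument as a regular expression, so a matched token containing an unbalanced ")" is not a valid pattern. A minimal sketch of one way to avoid this, using String.replace, which treats its argument as literal text. The class name, method name, and the simplified URL pattern are assumptions for illustration, not the pattern from the answers above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralUrlRemover {
    // Simplified pattern (an assumption): http or https followed by non-whitespace
    private static final Pattern URL =
            Pattern.compile("https?://\\S+", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: removes each matched URL as literal text
    public static String removeUrls(String text) {
        Matcher m = URL.matcher(text);
        String result = text;
        while (m.find()) {
            // String.replace does no regex parsing, so ")" or "?" in the match is harmless
            result = result.replace(m.group(), "");
        }
        return result.trim();
    }

    public static void main(String[] args) {
        System.out.println(removeUrls("look (http://a.com/b) here"));
    }
}
```

Wrapping the match in Pattern.quote before passing it to replaceAll would be another option with the same effect.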
