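I want to remove every form of URL from a string if it starts with .*:// or www.*, but I'm having trouble adding that regex to a pre-existing, complex pattern.
Currently, I use: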
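import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String censorUrls(String str) {
    // Matches only URLs that carry an explicit scheme (http://, ftp://, ...).
    String urlPattern = "((https?|ftp|gopher|telnet|file|Unsure|http):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
    Pattern pattern = Pattern.compile(urlPattern, Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(str);
    // Replace every match in one pass. The earlier loop called matcher.group(i)
    // with a growing i, which selects capture groups rather than successive
    // matches, and passed the matched URL to String.replaceAll as a regex.
    return matcher.replaceAll("****").trim();
}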
However, this does not work for URLs that may be just www.google.com, google.com, or even www3.site.com.
Answer 0: (score: 1)
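I can't remember where this came from, but you can give it a try.
It should work whether or not the match sits in the middle of the string.
It uses whitespace boundaries, (?<!\S) and (?!\S), which also match at the anchor positions (the start and end of the string).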
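To illustrate how the two whitespace boundaries behave, here is a minimal sketch. The class name WhitespaceBoundaryDemo, the method countStandalone, and the literal token foo are hypothetical and used only for illustration; the lookarounds are the ones quoted above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceBoundaryDemo {
    // (?<!\S): the match must not be preceded by a non-whitespace character.
    // (?!\S):  the match must not be followed by a non-whitespace character.
    // Both conditions also hold at the very start and end of the input.
    private static final Pattern STANDALONE_FOO = Pattern.compile("(?<!\\S)foo(?!\\S)");

    // Counts occurrences of "foo" that are delimited by whitespace or string edges.
    static int countStandalone(String input) {
        Matcher m = STANDALONE_FOO.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countStandalone("foo"));     // whole string
        System.out.println(countStandalone("a foo b")); // surrounded by spaces
        System.out.println(countStandalone("xfoo"));    // glued to a preceding letter
        System.out.println(countStandalone("foo,"));    // glued to trailing punctuation
    }
}
```

Because the trailing boundary rejects any non-whitespace character, a token followed directly by punctuation is not treated as standalone.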
Raw: (?i)(?<!\S)(?!mailto:)(?:[a-z]*://)?(?:\S+(?::\S*)?@)?(?:(?:(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{a1}-\x{ffff}0-9]+-?)*[a-z\x{a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{a1}-\x{ffff}0-9]+-?)*[a-z\x{a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{a1}-\x{ffff}]{2,})))|localhost)(?::\d{2,5})?(?:\/[^\s]*)?(?!\S)
String: "(?i)(?<!\\S)(?!mailto:)(?:[a-z]*://)?(?:\\S+(?::\\S*)?@)?(?:(?:(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\x{a1}-\\x{ffff}0-9]+-?)*[a-z\\x{a1}-\\x{ffff}0-9]+)(?:\\.(?:[a-z\\x{a1}-\\x{ffff}0-9]+-?)*[a-z\\x{a1}-\\x{ffff}0-9]+)*(?:\\.(?:[a-z\\x{a1}-\\x{ffff}]{2,})))|localhost)(?::\\d{2,5})?(?:\\/[^\\s]*)?(?!\\S)"
Formatted:
(?i)
(?<! \S )
(?! mailto: )
(?:
    [a-z]* :
    \/\/
)?
(?:
    \S+
    (?: : \S* )?
    @
)?
(?:
    (?:
        (?:
            [1-9] \d?
          | 1 \d\d
          | 2 [01] \d
          | 22 [0-3]
        )
        (?:
            \.
            (?: 1? \d{1,2} | 2 [0-4] \d | 25 [0-5] )
        ){2}
        (?:
            \.
            (?:
                [1-9] \d?
              | 1 \d\d
              | 2 [0-4] \d
              | 25 [0-4]
            )
        )
      | (?:
            (?: [a-z\x{a1}-\x{ffff}0-9]+ -? )*
            [a-z\x{a1}-\x{ffff}0-9]+
        )
        (?:
            \.
            (?: [a-z\x{a1}-\x{ffff}0-9]+ -? )*
            [a-z\x{a1}-\x{ffff}0-9]+
        )*
        (?:
            \.
            (?: [a-z\x{a1}-\x{ffff}]{2,} )
        )
    )
  | localhost
)
(?: : \d{2,5} )?
(?: \/ [^\s]* )?
(?! \S )
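
As a rough sketch of how this pattern could be plugged into the question's censorUrls method, the following splits the Java string form into commented pieces and replaces every match in a single pass. The class name UrlCensor and the sample inputs are assumptions made for illustration; the pattern itself is the one given above, unchanged.

```java
import java.util.regex.Pattern;

class UrlCensor {
    // The pattern from the "String" form above, concatenated piece by piece.
    private static final Pattern URL = Pattern.compile(
          "(?i)(?<!\\S)(?!mailto:)"                 // case-insensitive, whitespace boundary, no mailto:
        + "(?:[a-z]*://)?"                          // optional scheme
        + "(?:\\S+(?::\\S*)?@)?"                    // optional user:password@
        + "(?:(?:"
        + "(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])"  // IPv4 address: first octet
        + "(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}"
        + "(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))"
        + "|(?:(?:[a-z\\x{a1}-\\x{ffff}0-9]+-?)*[a-z\\x{a1}-\\x{ffff}0-9]+)"      // host label
        + "(?:\\.(?:[a-z\\x{a1}-\\x{ffff}0-9]+-?)*[a-z\\x{a1}-\\x{ffff}0-9]+)*"   // further labels
        + "(?:\\.(?:[a-z\\x{a1}-\\x{ffff}]{2,}))"   // top-level domain
        + ")|localhost)"
        + "(?::\\d{2,5})?"                          // optional port
        + "(?:\\/[^\\s]*)?"                         // optional path and query
        + "(?!\\S)");                               // whitespace boundary

    // Replaces every URL-like token with "****" in one pass.
    static String censorUrls(String str) {
        return URL.matcher(str).replaceAll("****").trim();
    }

    public static void main(String[] args) {
        System.out.println(censorUrls("go to www.google.com now"));
        System.out.println(censorUrls("see http://example.com/a?b=1 here"));
        System.out.println(censorUrls("www3.site.com"));
    }
}
```

Two caveats seem worth checking before relying on this. The optional user:password@ part means an e-mail address such as user@example.com is also matched. The nested quantifiers in the host-label groups may also backtrack heavily on long tokens that contain no dot, so it is probably wise to test against representative input.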