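I'm looking for a way to remove sentences that contain a URL in Java. Note that I want to remove the whole sentence, not just the URL.
I found a way to do this and it works, but I'm looking for a simpler approach, perhaps a single regex?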
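import java.text.BreakIterator;
import java.util.Locale;
import java.util.regex.Pattern;

public class BreakIteratorRemoval {
    // Not shown in the question: assumed to be a URL pattern (this regex is hypothetical)
    static final Pattern SENT = Pattern.compile("https?://\\S+");

    public static void main(String[] args) {
        // Not shown in the question: assumed to be a sentence BreakIterator
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
        String source = "Sorry, we are closed today. Visit our website tomorrow at https://www.google.com. Thank you and have a nice day!";
        iterator.setText(source);
        int start = iterator.first();
        int end = iterator.next();
        while (end != BreakIterator.DONE) {
            if (SENT.matcher(source.substring(start, end)).find()) {
                // Cut the sentence out and rescan the shortened text from the beginning
                source = source.substring(0, start) + source.substring(end);
                iterator.setText(source);
                start = iterator.first();
            } else {
                start = end;
            }
            end = iterator.next();
        }
        System.out.println(source);
    }
}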
This prints: Sorry, we are closed today. Thank you and have a nice day!
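The loop above restarts the BreakIterator from the first sentence after every removal. A single-pass variant can instead append each sentence that has no URL to a StringBuilder. This is a minimal sketch, assuming a sentence BreakIterator for Locale.US and a hypothetical URL pattern https?://\S+ standing in for the unshown SENT:

```java
import java.text.BreakIterator;
import java.util.Locale;
import java.util.regex.Pattern;

public class SinglePassRemoval {
    // Hypothetical URL pattern; substitute whatever definition SENT really uses
    private static final Pattern URL = Pattern.compile("https?://\\S+");

    static String removeUrlSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        StringBuilder kept = new StringBuilder();
        // Visit each [start, end) sentence exactly once; the text is never modified
        for (int start = it.first(), end = it.next();
             end != BreakIterator.DONE;
             start = end, end = it.next()) {
            String sentence = text.substring(start, end);
            if (!URL.matcher(sentence).find()) {
                kept.append(sentence); // keep sentences without a URL
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String source = "Sorry, we are closed today. Visit our website tomorrow at "
                + "https://www.google.com. Thank you and have a nice day!";
        System.out.println(removeUrlSentences(source));
    }
}
```

Because the input string is never changed, the sentence boundaries stay valid for the whole walk and no rescan is needed.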
Answer 0 (score: 0)
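It is best to split the text into sentences first and then run each sentence through the expression.
The expression can then return only those lines (sentences) that contain no URL: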
^(?!.*https?[^\s]+.*).*$
Here a URL is defined as https?[^\s]+ (note that this does not require :// after http or https).
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoUrlLines {
    public static void main(String[] args) {
        // Matches a whole line only if it contains no "http"/"https" followed by non-space characters
        final String regex = "^(?!.*https?[^\\s]+.*).*$";
        final String string = "Sorry, we are closed today. Visit our website tomorrow at https://www.google.com. Thank you and have a nice day!\n\n"
                + "Sorry, we are closed today. Visit our website tomorrow at. Thank you and have a nice day!\n\n"
                + "Sorry, we are closed today. Visit our website tomorrow at https://www.goog. Thank you and have a nice day!\n";
        // MULTILINE makes ^ and $ match at line boundaries, so each line is tested separately (blank lines match too)
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
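The code above tests whole lines, so it assumes the text has already been split into one sentence per line. Here is a minimal sketch of the split-then-filter idea, assuming java.text.BreakIterator (Locale.US) does the splitting and reusing the answer's regex with Matcher.matches() on each trimmed sentence:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class FilterSentences {
    // The answer's pattern: matches only if no "https?[^\s]+" occurs anywhere in the input
    private static final Pattern NO_URL = Pattern.compile("^(?!.*https?[^\\s]+.*).*$");

    static List<String> sentencesWithoutUrls(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> kept = new ArrayList<>();
        for (int start = it.first(), end = it.next();
             end != BreakIterator.DONE;
             start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            // matches() applies the pattern to the whole sentence
            if (NO_URL.matcher(sentence).matches()) {
                kept.add(sentence);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "Sorry, we are closed today. Visit our website tomorrow at "
                + "https://www.google.com. Thank you and have a nice day!";
        // Rejoining with a single space is an assumption; the original spacing is not kept
        System.out.println(String.join(" ", sentencesWithoutUrls(text)));
    }
}
```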
Answer 1 (score: 0)
"(?<=^|[?!.])[^?!.]+" + urlRegex + ".*?(?:$|[?!.])"
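Depending on your definition of a sentence, this matches each whole sentence that contains a match for urlRegex; you can then remove them with replaceAll. (There are many URL regexes around and you didn't specify which one you want to use, so I leave the exact definition of a URL to you.)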
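To make this concrete, one possible urlRegex is https?://\S*[^\s.?!], a hypothetical definition chosen so that a URL cannot end in whitespace or sentence punctuation. The following is a minimal sketch that plugs it into the pattern above and calls String.replaceAll:

```java
public class ReplaceUrlSentences {
    // Hypothetical URL definition: http(s):// plus non-space characters, ending in a
    // character that is not whitespace or . ? ! so the sentence's final period stays outside the URL
    static final String URL_REGEX = "https?://\\S*[^\\s.?!]";

    // The answer's pattern with URL_REGEX plugged in
    static final String SENTENCE_WITH_URL =
            "(?<=^|[?!.])[^?!.]+" + URL_REGEX + ".*?(?:$|[?!.])";

    static String removeUrlSentences(String text) {
        return text.replaceAll(SENTENCE_WITH_URL, "");
    }

    public static void main(String[] args) {
        String source = "Sorry, we are closed today. Visit our website tomorrow at "
                + "https://www.google.com. Thank you and have a nice day!";
        System.out.println(removeUrlSentences(source));
    }
}
```

The choice of urlRegex matters here. A plain greedy https?://\S+ would also swallow the period after the URL, and the lazy .*? would then run on to the end of the next sentence, deleting "Thank you and have a nice day!" as well.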