We are splitting a paragraph into sentences at periods.
String[] sentences = message.split("(?<=[.!?])\\s*");
The following sentence
HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz
is split into
HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3
40 GHz
How can I avoid splitting on something like 3.40 GHz, given that it forms a single token and the period in it is not a sentence delimiter?
Answer 0 (score: 2)
You can try this:
public static void main(String[] args)
{
    String message = "HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz. Hello, you are welcome. StackOverflow. some_email@hotmail.com";
    // Split after '.', '!' or '?' only when at least one space follows
    String[] sentences = message.split("(?<=[.!?])\\s* ");
    for (String s : sentences) {
        System.out.println(s);
    }
}
Output:
HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz.
Hello, you are welcome.
StackOverflow.
some_email@hotmail.com
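The reason the original pattern cuts "3.40" in two is that `\\s*` can match zero characters, so the lookbehind alone is enough to produce a split right after any period. The following is a minimal sketch of the same idea written with `\\s+`, assuming sentences are always separated by at least one whitespace character. The class name `SentenceSplitDemo` and the helper `splitSentences` are illustrative names, not part of the original answer.

```java
import java.util.Arrays;

public class SentenceSplitDemo {

    // Hypothetical helper: split after '.', '!' or '?' only when
    // at least one whitespace character follows the punctuation.
    static String[] splitSentences(String text) {
        return text.split("(?<=[.!?])\\s+");
    }

    public static void main(String[] args) {
        String message = "Intel Xeon E3-1245V3 3.40 GHz. Hello, you are welcome.";

        // Original pattern: \\s* may match nothing, so "3.40" is cut after the dot.
        System.out.println(Arrays.toString(message.split("(?<=[.!?])\\s*")));

        // Stricter pattern: the dot in "3.40" is not followed by whitespace, so it is kept.
        System.out.println(Arrays.toString(splitSentences(message)));
    }
}
```

This still splits after abbreviations that are followed by a space, so it is a starting point rather than a general sentence detector.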
Answer 1 (score: 0)
String message = "This is an example. This string is for split on '.'."; // add a space after '.' to start a new sentence
Replace
String[] sentences = message.split("(?<=[.!?])\\s*");
with
String[] sentences = message.split("(?<=[.!?])\\s* ");//add a space to split on new sentence
Answer 2 (score: 0)
Try this; I found it easy to understand:
String str = "This is how I tried to split a paragraph into a sentence. But, there is a problem. My paragraph includes dates like Jan 13, 2014 , words like U.S and numbers like 2.2. They all got splitted by the above code.";
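// Split on '.', '?' or '!' followed by a character that is not an uppercase letter or a digit.
// Note: the matched punctuation and the character after it are removed from the results.
String[] sentenceHolder = str.split("[.?!][^A-Z0-9]");
for (int i = 0; i < sentenceHolder.length; i++) {
    System.out.println(sentenceHolder[i]);
}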
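If the closing punctuation should stay attached to each sentence, a lookbehind/lookahead pair is one option. The sketch below is a variation on the approach above, not the answerer's code: it assumes every new sentence starts with an uppercase ASCII letter, and `KeepPunctuationSplit` and `split` are hypothetical names chosen for the example.

```java
import java.util.Arrays;

public class KeepPunctuationSplit {

    // Hypothetical helper: split on whitespace that follows '.', '?' or '!'
    // and precedes an uppercase letter. Both conditions are zero-width,
    // so only the whitespace itself is removed.
    static String[] split(String text) {
        return text.split("(?<=[.?!])\\s+(?=[A-Z])");
    }

    public static void main(String[] args) {
        String str = "There is a problem. My text has words like U.S. and numbers like 2.2. They all got split.";
        for (String sentence : split(str)) {
            System.out.println(sentence);
        }
    }
}
```

Under that assumption, "U.S. and" stays together because a lowercase letter follows the space, and "2.2" stays together because no whitespace follows its first period. An abbreviation followed by a capitalized word (for example "U.S. Army") would still be split.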