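Splitting a quotation into sentences using BreakIterator in Java

Time: 2015-02-17 03:42:19

Tags: java

I am trying to use Java's BreakIterator to split a paragraph that contains a quotation into sentences.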

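This is my paragraph, which contains the quotation I want to split:

"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold," he said. About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.

This is my code: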

import java.text.BreakIterator;
import java.util.Locale;

public class SplitParagraph {
    public static void main(String[] args) {
        String paragraph = "\"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold,\" he said. About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.";
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        iterator.setText(paragraph);
        int start = iterator.first();
        int i = 1;
        // Walk consecutive sentence boundaries and print the text between each pair.
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            System.out.println("Sentence " + i + " : " + paragraph.substring(start, end));
            i++;
        }
    }
}


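Program output:

Sentence 1 : "People are now getting smarter and more critical.
Sentence 2 : They know which are eligible to choose, which one pan, where the gold," he said.
Sentence 3 : About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.

This output is incorrect, because the paragraph contains only 2 sentences, not 3.

The correct output should be:

Sentence 1 : "People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold," he said.
Sentence 2 : About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.

Any ideas on how to solve this?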

1 Answer:

Answer 0: (score: 1)

Split your input on the following regex:

"(?<=\\.)\\s+(?=(?:\"[^\"]*\"|[^\"])*$)"

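This matches one or more whitespace characters that come immediately after a dot, as long as that whitespace is not inside double quotes.

(?<=\\.) - a positive lookbehind asserting that the match is immediately preceded by a dot.

\\s+ - matches one or more whitespace characters.

(?=...) - a positive lookahead asserting what must follow the match:

(?:\"[^\"]*\"|[^\"])* - either a complete double-quoted block such as "foobar", or any character that is not a double quote, repeated zero or more times.

(?:\"[^\"]*\"|[^\"])*$ - after that, the end of the string must be reached. This will not match the space in a string like "foo. bar", because that space is followed by a single unpaired double quote rather than a complete double-quoted block.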
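To make the breakdown above concrete, here is a minimal sketch that reports where the regex actually matches. The class name `QuoteAwareSplitDemo`, the helper `boundaryOffsets`, and the short sample string are made up for illustration; only the pattern itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteAwareSplitDemo {
    // The regex from the answer, unchanged.
    static final Pattern BOUNDARY =
            Pattern.compile("(?<=\\.)\\s+(?=(?:\"[^\"]*\"|[^\"])*$)");

    // Hypothetical helper: start index of every whitespace run accepted as a boundary.
    static List<Integer> boundaryOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = BOUNDARY.matcher(text);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Index 5 is the space inside the quotes, index 15 the space after "baz."
        String text = "\"foo. bar\" baz. Qux.";
        System.out.println(boundaryOffsets(text)); // prints [15]
    }
}
```

The space at index 5 is rejected because the rest of the string contains an odd number of double quotes, so the lookahead cannot consume it as complete quoted blocks. The space at index 10 is rejected earlier, since it follows a quote rather than a dot.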

DEMO

String s = "\"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold,\" he said. About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.";
String[] parts = s.split("(?<=\\.)\\s+(?=(?:\"[^\"]*\"|[^\"])*$)");
for (String part : parts) {
    System.out.println(part);
}

Output:

"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold," he said.
About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.

String s = "\"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold,\" he said. About Mr. Mrs. strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.";
// The added negative lookbehind (?<!Mrs?\\.) prevents a split right after "Mr." or "Mrs."
String[] parts = s.split("(?<!Mrs?\\.)(?<=\\.)\\s+(?=(?:\"[^\"]*\"|[^\"])*$)");
for (String part : parts) {
    System.out.println(part);
}

Output:

"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold," he said.
About Mr. Mrs. strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.
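
The "Mr."/"Mrs." lookbehind could in principle be extended to other abbreviations. The sketch below is one possible way to do that, not part of the original answer: the class `AbbreviationAwareSplit`, the helper `boundaryFor`, and the abbreviation list are all hypothetical, and the abbreviations are assumed to be plain letters with no regex metacharacters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class AbbreviationAwareSplit {
    // Hypothetical helper: builds the answer's regex with a negative lookbehind
    // covering every given abbreviation (assumed to contain letters only).
    static Pattern boundaryFor(List<String> abbreviations) {
        String alternatives = String.join("|", abbreviations);
        return Pattern.compile(
                "(?<!(?:" + alternatives + ")\\.)"      // not right after "Mr.", "Dr.", ...
                + "(?<=\\.)\\s+"                        // whitespace that follows a dot
                + "(?=(?:\"[^\"]*\"|[^\"])*$)");        // and is outside double quotes
    }

    public static void main(String[] args) {
        Pattern boundary = boundaryFor(Arrays.asList("Mr", "Mrs", "Dr"));
        String[] parts = boundary.split("He met Dr. Smith. Then he left.");
        for (String part : parts) {
            System.out.println(part);
        }
        // prints:
        // He met Dr. Smith.
        // Then he left.
    }
}
```

Because the lookbehind has no word-boundary check, a word that merely ends in one of the listed abbreviations would also suppress a split; a longer list makes that more likely.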