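How do I split a string into sentences?

Time: 2014-08-02 17:34:13

Tags: java string sentence

For one of my projects I need to split a paragraph into sentences. I have already found that the following code splits a paragraph into its sentences and prints them: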

BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
iterator.setText(content);
int start = iterator.first();
for (int end = iterator.next();
    end != BreakIterator.DONE;
    start = end, end = iterator.next()) {
    System.out.println(content.substring(start, end));
}

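The variable 'content' is a predefined String.

However, I would like to have the individual sentences as strings so that I can keep working with them.

How do I do that? I think it may have something to do with a string array. Thanks for your help.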

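2 answers:

Answer 0 (score: 0)

I have never used BreakIterator, and I assume you want it for its locale-aware splitting (for reference: here, here). Either way, you can keep the sentences in an array or a List, as you mentioned.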

BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
iterator.setText(content);
int start = iterator.first();

List<String> sentences = new ArrayList<String>();
for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
    //System.out.println(content.substring(start,end));
    sentences.add(content.substring(start,end));
}
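
To make the snippet above easier to try, here is a minimal self-contained sketch that wraps the same loop in a method and also converts the result to a String[], since the question mentions arrays. The class name SentenceSplitter, the method name splitSentences, and the sample text are assumptions made for illustration; they do not come from the original post.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitter {

    // Hypothetical helper: collects every sentence of `content` into a list
    // instead of printing it. Each piece keeps its trailing whitespace.
    static List<String> splitSentences(String content) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
        iterator.setText(content);

        List<String> sentences = new ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next();
             end != BreakIterator.DONE;
             start = end, end = iterator.next()) {
            sentences.add(content.substring(start, end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Sample text chosen for this sketch only.
        List<String> sentences = splitSentences("This is one sentence. This is another one.");

        // Convert to an array if an array is really needed.
        String[] asArray = sentences.toArray(new String[0]);
        for (String s : asArray) {
            System.out.println("[" + s + "]");
        }
    }
}
```

A List is usually the more convenient choice here because the number of sentences is not known in advance; toArray can produce a String[] afterwards if other code expects one.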

Answer 1 (score: 0)

Try the approach from this link:

import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Locale;

public class Main {

    public static void main(String[] args) {
        String content =
                "Line boundary analysis determines where a text " +
                "string can be broken when line-wrapping. The " +
                "mechanism correctly handles punctuation and " +
                "hyphenated words. Actual line breaking needs to " +
                "also consider the available line width and is " +
                "handled by higher-level software. ";

        BreakIterator iterator =
                BreakIterator.getSentenceInstance(Locale.US);

        // The returned list holds one String per sentence.
        ArrayList<String> sentences = count(iterator, content);
    }

    // Walks the sentence boundaries of source and returns the sentences as a list.
    private static ArrayList<String> count(BreakIterator bi, String source) {
        bi.setText(source);

        int lastIndex = bi.first();
        ArrayList<String> contents = new ArrayList<>();
        while (lastIndex != BreakIterator.DONE) {
            int firstIndex = lastIndex;
            lastIndex = bi.next();

            if (lastIndex != BreakIterator.DONE) {
                String sentence = source.substring(firstIndex, lastIndex);
                System.out.println("sentence = " + sentence);
                contents.add(sentence);
            }
        }
        return contents;
    }
}
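
One detail that both answers leave open: the substrings between two sentence boundaries include the whitespace that follows each sentence. The sketch below shows one possible way to trim each piece and skip empty ones before storing it. The class name TrimmedSentences, the method name splitAndTrim, and the sample text are hypothetical and not part of either answer.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TrimmedSentences {

    // Hypothetical helper: same boundary walk as in the answers above,
    // but each sentence is trimmed and empty pieces are skipped.
    static List<String> splitAndTrim(String source, Locale locale) {
        BreakIterator bi = BreakIterator.getSentenceInstance(locale);
        bi.setText(source);

        List<String> result = new ArrayList<>();
        int start = bi.first();
        for (int end = bi.next();
             end != BreakIterator.DONE;
             start = end, end = bi.next()) {
            String sentence = source.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample text chosen for this sketch only.
        String content = "The mechanism handles punctuation. "
                + "Actual line breaking is handled elsewhere. ";
        for (String s : splitAndTrim(content, Locale.US)) {
            System.out.println(s);
        }
    }
}
```

Passing the Locale as a parameter is a design choice of this sketch; the answers above hard-code Locale.US.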