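Java - Split a string into sentences with a character limit

Asked: 2012-02-20 15:03:43

Tags: java android regex string split

I want to split a text into sentences (split on `.` or with BreakIterator). However, no sentence may be longer than 100 characters.

Example: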

Lorem ipsum dolor sit. Amet consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et dolore
magna aliquyam erat, sed diam voluptua. At vero eos et accusam
et justo duo dolores.

To: (3 elements; a word is never broken, only a sentence)

" Lorem ipsum dolor sit. ",
" Amet consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt
  ut labore et dolore magna",
" aliquyam erat, sed diam voluptua. At vero eos et accusam
  et justo duo dolores. "

How can I do this correctly?

4 answers:

Answer 0: (score: 3)

There is probably a better way, but here goes:

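import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {
    public static void main(String... args) {
        String originalString = "Lorem ipsum dolor sit. Amet consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore "
                + "et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores.";

        // Split into sentences on '.'; note that split() drops the periods themselves.
        String[] s1 = originalString.split("\\.");
        List<String> list = new ArrayList<String>();

        for (String s : s1) {
            if (s.length() > 100) {
                // Cut an overlong sentence into 100-character pieces; \G anchors
                // each match at the end of the previous one. This can cut mid-word.
                list.addAll(Arrays.asList(s.split("(?<=\\G.{100})")));
            } else {
                list.add(s);
            }
        }

        System.out.println(list);
    }
}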

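The "split a string by size" regex comes from this SO question. You could probably merge the two regexes, but I'm not sure that would be a wise idea (:

If the regex does not work on Android (the \G operator is not recognized everywhere), try the other solutions linked there for splitting a string by size.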
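As a fallback where \G is unavailable, fixed-size chunking can be done without any regex. The following is only a minimal sketch, not one of the linked solutions; the class name `FixedSizeChunks` and the helper `chunk` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FixedSizeChunks {

    // Hypothetical helper: cuts text into pieces of at most `size` characters
    // using only substring(), so no regex engine features are required.
    static List<String> chunk(String text, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> parts = new ArrayList<String>();
        for (int start = 0; start < text.length(); start += size) {
            int end = Math.min(text.length(), start + size);
            parts.add(text.substring(start, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(chunk("abcdefghij", 4)); // prints [abcd, efgh, ij]
    }
}
```

Like the regex version, this cuts at a fixed offset and can therefore break a word in the middle.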

Answer 1: (score: 2)

Regex won't help you in this case.

I would split the text on spaces or `.` and then start concatenating. Something like this:

Pseudocode

words = text.split("[\s\.]");   // split on whitespace or '.'
lines = new List();
while ( words.length() > 0 ) {

  String line = new String();
  // keep appending words while the line stays under 100 characters
  while ( words.length() > 0 && line.length() + words.get(0).length() < 100 ) {
    line += words.get(0);
    words.remove(0);
  }

  lines.add(line);

}

Answer 2: (score: 2)

Solved (thanks to Macarse for the inspiration):

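// The lookahead (?=...) splits before each whitespace or '.', so every token keeps its separator.
String[] words = text.split("(?=[\\s\\.])");
ArrayList<String> array = new ArrayList<String>();
int i = 0;
while (words.length > i) {
    String line = "";
    // Append tokens while the line stays under 100 characters.
    // Note: this assumes no single token is 100 characters or longer; otherwise i never advances.
    while ( words.length > i && line.length() + words[i].length() < 100 ) {
        line += words[i];
        i++;
    }
    array.add(line);
}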
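To illustrate why the lookahead pattern matters here, the small sketch below compares it with a plain delimiter split. The class and method names (`LookaheadSplitDemo`, `keepSeparators`, `dropSeparators`) are made up for this demonstration.

```java
import java.util.Arrays;

class LookaheadSplitDemo {

    // Splits before each whitespace or '.', so the separator stays with the token.
    static String[] keepSeparators(String text) {
        return text.split("(?=[\\s\\.])");
    }

    // Splits on each whitespace or '.', which discards the separator.
    static String[] dropSeparators(String text) {
        return text.split("[\\s\\.]");
    }

    public static void main(String[] args) {
        String text = "Lorem ipsum. Dolor";
        // Tokens are "Lorem", " ipsum", "." and " Dolor".
        System.out.println(Arrays.toString(keepSeparators(text))); // prints [Lorem,  ipsum, .,  Dolor]
        // Tokens are "Lorem", "ipsum", "" and "Dolor".
        System.out.println(Arrays.toString(dropSeparators(text))); // prints [Lorem, ipsum, , Dolor]
    }
}
```

Because the lookahead tokens keep their spaces and periods, simply concatenating them rebuilds the original text, which is what the loop above relies on.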

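Answer 3: (score: 0)

Following the previous solution, I quickly ran into an infinite loop when a single word can exceed the limit (very unlikely, but unfortunately I have a very constrained environment). So I added a fix (sort of) for this edge case (I think).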

import java.util.*;

public class Main
{
    public static void main(String[] args) {
        sentenceToLines("In which of the following, a person is constantly followed/chased by another person or group of several people?", 15);
    }

    private static ArrayList<String> sentenceToLines(String s, int limit) {
        String[] words = s.split("(?=[\\s\\.])");
        ArrayList<String> wordList =  new ArrayList<String>(Arrays.asList(words));
        ArrayList<String> array = new ArrayList<String>();
        int i = 0, temp;
        String word, line;
        while (i < wordList.size()) {
            line = "";
            temp = i;
            // split the long words to the size of the limit
            while(wordList.get(i).length() > limit) {
                word = wordList.get(i);
                wordList.add(i++, word.substring(0, limit));
                wordList.add(i, word.substring(limit));
                wordList.remove(i+1);
            }
            i = temp;
            // continue making lines with newly split words
            while ( i < wordList.size() && line.length() + wordList.get(i).length() <= limit ) {
                line += wordList.get(i);
                i++;
            }
            System.out.println(line.trim());
            array.add(line.trim());
        }
        return array;
    }
    
}
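For comparison, here is a rough sketch of the BreakIterator route mentioned in the question. It is not taken from any of the answers. It assumes that each sentence becomes its own element, that overlong sentences are wrapped at whitespace, and that a word longer than the limit is hard-cut so the loop always makes progress. Unlike the sample in the question, the leftover of a wrapped sentence is not merged with the following sentence. `SentenceSplitter`, `split` and `wrap` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitter {

    // Hypothetical helper: one element per sentence, wrapping sentences
    // that are longer than `limit` characters.
    static List<String> split(String text, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<String> out = new ArrayList<String>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (sentence.isEmpty()) {
                continue;
            }
            if (sentence.length() <= limit) {
                out.add(sentence);
            } else {
                wrap(sentence, limit, out);
            }
        }
        return out;
    }

    // Greedy word wrap: fills each line with as many words as fit.
    static void wrap(String sentence, int limit, List<String> out) {
        StringBuilder line = new StringBuilder();
        for (String word : sentence.split("\\s+")) {
            // A word that cannot fit on any line is cut into limit-sized pieces.
            while (word.length() > limit) {
                if (line.length() > 0) {
                    out.add(line.toString());
                    line.setLength(0);
                }
                out.add(word.substring(0, limit));
                word = word.substring(limit);
            }
            if (word.isEmpty()) {
                continue;
            }
            int needed = line.length() == 0 ? word.length() : line.length() + 1 + word.length();
            if (needed > limit) {
                out.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            out.add(line.toString());
        }
    }

    public static void main(String[] args) {
        String text = "Lorem ipsum dolor sit. Amet consetetur sadipscing elitr, sed diam nonumy eirmod "
                + "tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. "
                + "At vero eos et accusam et justo duo dolores.";
        for (String piece : split(text, 100)) {
            System.out.println(piece.length() + ": " + piece);
        }
    }
}
```

Runs of whitespace inside a wrapped sentence are collapsed to single spaces, so this variant does not preserve the original spacing exactly.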