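How can I split a string into fixed-length chunks, but not in the middle of a word?

Time: 2014-06-30 14:36:07

Tags: java split guava

I want to split some long sentences into fixed-length chunks. So far I have been using Guava: Splitter.fixedLength(20).split(string);

That works, but how can I prevent splits in the middle of a word? My goal is chunks of at most 20 characters, or fewer when the split point would not fall on a space.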
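
To make the problem concrete, here is a minimal sketch of what fixed-length splitting does to a sentence. It uses plain `substring` calls as a stand-in for Guava's `Splitter.fixedLength`, so the class and method names (`FixedLengthDemo`, `fixedLength`) are hypothetical and only illustrate the behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

final class FixedLengthDemo {
    // Hypothetical stand-in for fixed-length splitting; this is not Guava's implementation.
    static List<String> fixedLength(String text, int length) {
        if (length < 1) {
            throw new IllegalArgumentException("length must be at least 1");
        }
        List<String> pieces = new ArrayList<>();
        // Cut every `length` characters, regardless of where words begin or end.
        for (int start = 0; start < text.length(); start += length) {
            int end = Math.min(start + length, text.length());
            pieces.add(text.substring(start, end));
        }
        return pieces;
    }
}
```

With the input "Splitting long sentences into chunks" and a length of 20, this returns "Splitting long sente" and "nces into chunks": the word "sentences" is cut in two, which is the split I want to avoid.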

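3 answers:

Answer 0 (score: 3)

I found that org.apache.commons.lang3.text.WordUtils.wrap() does exactly what I asked for.

Answer 1 (score: 2)

I would split on whitespace and then recombine as many words as fit into each chunk.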

String[] words = str.trim().split("\\s+");    // split the input on runs of whitespace
List<String> split = new ArrayList<>();       // final list of chunks
StringBuilder current = new StringBuilder();  // chunk currently being built
for (String word : words) {
    // flush the current chunk if a space plus this word would push it past 20 chars
    if (current.length() > 0 && current.length() + 1 + word.length() > 20) {
        split.add(current.toString());
        current.setLength(0);
    }
    if (current.length() > 0) {
        current.append(' ');                  // separate words within a chunk
    }
    current.append(word);                     // note: a word longer than 20 chars stays whole
}
if (current.length() > 0) {
    split.add(current.toString());            // add the last chunk
}
return split;
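
As a self-contained sketch of the same split-and-recombine idea, the class below wraps the logic in a method with a configurable limit. The names `WordChunker` and `chunk` are hypothetical, and cutting a word that is longer than the limit into limit-sized pieces is an assumption about the desired behaviour, not something this answer specifies.

```java
import java.util.ArrayList;
import java.util.List;

final class WordChunker {
    // Hypothetical helper: greedily packs whole words into chunks of at most maxLen chars.
    static List<String> chunk(String text, int maxLen) {
        if (maxLen < 1) {
            throw new IllegalArgumentException("maxLen must be at least 1");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input produces a single empty token
            }
            // Flush the current chunk when a space plus this word would not fit.
            if (current.length() > 0 && current.length() + 1 + word.length() > maxLen) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            // Assumption: a word longer than maxLen is cut into maxLen-sized pieces.
            String rest = word;
            while (rest.length() > maxLen) {
                chunks.add(rest.substring(0, maxLen));
                rest = rest.substring(maxLen);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(rest);
        }
        if (current.length() > 0) {
            chunks.add(current.toString()); // add the last chunk
        }
        return chunks;
    }
}
```

Calling `WordChunker.chunk("Splitting long sentences into chunks", 20)` returns the three chunks "Splitting long", "sentences into" and "chunks". Runs of whitespace between words collapse to a single space, so the chunks cannot always be joined back into the exact original string.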

Answer 2 (score: 2)

Maybe:

Matcher m = Pattern.compile("(?s)(.{1,19}(\\s|$)|\\S{20}|\\S+$)").matcher(s);
while (m.find()) {
    String part = m.group(1);
    ...
}

The regex:

(
    .{1,19}(\\s|$)      up to 19 chars followed by whitespace or end-of-string
                        could use word boundary \\b
|
    \\S{20}             20 non-whitespace chars (a long word is cut here)
|
     \\S+$              remaining non-whitespace chars at the end
)
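
For reference, a minimal runnable sketch of this approach that collects the matches into a list. The class and method names (`RegexChunker`, `chunk`) are hypothetical, and trimming the trailing whitespace from each match is an assumption about what the caller wants; the pattern itself is the one shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexChunker {
    // Same pattern as in the answer; (?s) lets '.' match line terminators too.
    private static final Pattern CHUNK =
            Pattern.compile("(?s)(.{1,19}(\\s|$)|\\S{20}|\\S+$)");

    // Hypothetical helper: returns each match with surrounding whitespace removed.
    static List<String> chunk(String s) {
        List<String> parts = new ArrayList<>();
        Matcher m = CHUNK.matcher(s);
        while (m.find()) {
            String part = m.group(1).trim(); // assumption: the trailing space is not wanted
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }
}
```

`RegexChunker.chunk("Splitting long sentences into chunks")` gives "Splitting long", "sentences into" and "chunks". Because the first alternative allows 19 characters plus one whitespace character, a multi-word chunk is at most 19 characters long after trimming; only an unbroken run of 20 non-whitespace characters produces a full 20-character chunk.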