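How to truncate a string to a given length while keeping whole words

Time: 2017-02-10 02:38:09

Tags: java arrays string

I want to take a substring of at most 60 characters from a string, but I also want the substring to contain only whole words. This is what I am trying.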

String originalText =" Bangladesh's first day of Test cricket on Indian soil has not been a good one. They end the day having conceded 71 runs in the last 10 overs, which meant they are already staring at a total of 356. M Vijay was solid and languid as he made his ninth Test century and third of the season. ";
String afterOptimized=originalText.substring(0, 60);
System.out.println("This is text . "+afterOptimized);

This is the output

This is text .  Bangladesh's first day of Test cricket on Indian soil has n

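However, my requirement is that words must not be cut in the middle. How do I know whether the cut at 60 characters leaves a complete word?

5 answers:

Answer 0: (score: 4)

You can use a regular expression that takes up to 60 characters and ends at a word boundary: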

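// Requires java.util.regex.Pattern and java.util.regex.Matcher
Pattern pattern = Pattern.compile("(.{1,60})(\\b|$)(.*)");
Matcher m = pattern.matcher(originalText);
if (m.matches())
    afterOptimized = m.group(1); // at most 60 characters, ending at a word boundary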
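
For reference, the first pattern can be wrapped in a small self-contained class. This is only a sketch: the class and method names are made up here, Pattern.DOTALL is added on the assumption that the input may contain line breaks (without it, matches() fails on multi-line text), and trim() is added because the captured group can end with a space. Note that \b also counts apostrophes and punctuation as boundaries, so a cut can land inside something like "Bangladesh's".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTruncate {
    // DOTALL lets '.' match line breaks, so multi-line input still matches
    private static final Pattern FIRST_60 =
            Pattern.compile("(.{1,60})(\\b|$)(.*)", Pattern.DOTALL);

    // Returns at most 60 characters of text, cut at a word boundary
    static String truncate(String text) {
        Matcher m = FIRST_60.matcher(text);
        // group(1) may end with a space; trim() removes it.
        // If nothing matches (for example, an empty string), return the input unchanged.
        return m.matches() ? m.group(1).trim() : text;
    }

    public static void main(String[] args) {
        String text = " Bangladesh's first day of Test cricket on Indian soil has not been a good one.";
        System.out.println(truncate(text));
    }
}
```

With the sample sentence, the cut moves back from the middle of "not" to the boundary before it.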

Or, in a loop:

Pattern pattern = Pattern.compile("\\s*(.{1,60})(\\b|$)");
Matcher m = pattern.matcher(originalText);
int last = 0;
while (m.find()) {
    System.out.println(m.group(1));
    last = m.end();
}
if (last != originalText.length())
    System.out.println(originalText.substring(last));

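If you want to break only at whitespace rather than at word boundaries (which can break the line before a comma, a period, etc.), you may want to replace \b with \s.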
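
A whitespace-only variant of the loop might look like the sketch below. The class and method names are illustrative, and the pattern is a slight variation rather than a literal \b to \s swap: it uses a lookahead so the whitespace is not consumed, and it starts each chunk at a non-space character. One known limitation: a single word longer than maxLen cannot fit in any chunk, so part of it is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceWrap {
    // Splits text into chunks of at most maxLen characters (maxLen >= 1),
    // breaking only where whitespace or the end of the text follows.
    static List<String> wrap(String text, int maxLen) {
        // \S.{0,maxLen-1} : a chunk that starts with a non-space character
        // (?=\s|$)        : followed by whitespace or the end of the text
        Pattern p = Pattern.compile("\\S.{0," + (maxLen - 1) + "}(?=\\s|$)");
        Matcher m = p.matcher(text);
        List<String> lines = new ArrayList<>();
        while (m.find()) {
            lines.add(m.group());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "Bangladesh's first day of Test cricket on Indian soil has not been a good one.";
        for (String line : wrap(text, 30)) {
            System.out.println(line);
        }
    }
}
```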

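Answer 1: (score: 1)

If the character at position 60 (the 61st character) is not a space, the cut would fall inside a word or right before a new one. In that case, search backwards from position 59 (the 60th character), inclusive, and stop at the first space found; the string can then be cut at that position. If the string is no longer than 60 characters, it is returned as is.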

public void truncateTest() {
    System.out.println(truncateTo("Bangladesh's first day of Test cricket on Indian soil has not been a good one. They end the day having conceded 71 runs in the last 10 overs, which meant they are already staring at a total of 356. M Vijay was solid and languid as he made his ninth Test century and third of the season. ", 60));
    System.out.println(truncateTo("Bangladesh's first day.", 60));
    System.out.println(truncateTo("They end the day having conceded 71 runs in the last 10 overs, which meant they are already staring at a total of 356. M Vijay was solid and languid as he made his ninth Test century and third of the season.", 60));
}

public String truncateTo(String originalText, int len) {
    if (originalText.length() > len) {
        if (originalText.charAt(len) != ' ') {
            for (int x=len-1;x>=0;x--) {
                if (Character.isWhitespace(originalText.charAt(x))) {
                    return originalText.substring(0, x);
                }
            }
        }
        // default if none of the conditions are met
        return originalText.substring(0, len);
    }
    return originalText;
}

Result...

Bangladesh's first day of Test cricket on Indian soil has
Bangladesh's first day.
They end the day having conceded 71 runs in the last 10

I think I got my +1 / -1 index logic right :)
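
The same backwards search can also be written with String.lastIndexOf(int ch, int fromIndex), which scans towards the start of the string from a given index. A minimal sketch, assuming words are separated by plain ASCII spaces; the class and method names are invented for illustration:

```java
class TruncateAtSpace {
    // Returns text unchanged if it fits, otherwise cuts it at the last space
    // at or before index len.
    static String truncateAtSpace(String text, int len) {
        if (text.length() <= len) {
            return text;
        }
        // Search backwards starting at index len; if that character is
        // itself a space, the cut at len is already clean.
        int cut = text.lastIndexOf(' ', len);
        // cut <= 0 means there is no usable space: fall back to a hard cut
        return cut > 0 ? text.substring(0, cut) : text.substring(0, len);
    }

    public static void main(String[] args) {
        System.out.println(truncateAtSpace(
                "Bangladesh's first day of Test cricket on Indian soil has not been a good one.", 60));
        System.out.println(truncateAtSpace("Bangladesh's first day.", 60));
    }
}
```

Unlike the loop, this version treats only ' ' as a separator, so tabs and line breaks are not considered cut points.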

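To sum up India's batting: Pujara was the epitome of patience, Vijay's shots were disdainful, and captain Kohli capped it off with a display of utter disdain, and the result vindicated the Indian side.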

Answer 2: (score: 0)

Why do people post overcomplicated answers, or answers that don't even compile?

Answer 3: (score: 0)

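String originalText = " Bangladesh's first day of Test cricket on Indian soil has not been a good one. They end the day having conceded 71 runs in the last 10 overs, which meant they are already staring at a total of 356. M Vijay was solid and languid as he made his ninth Test century and third of the season. ";

// Trim the string to at most 60 characters (shorter inputs are kept whole)
String trimmedString = originalText.substring(0, Math.min(60, originalText.length()));

// Re-trim at the last space so the result ends with a full word instead of a broken one;
// if there is no space to cut at, keep the trimmed string as is
int lastSpace = trimmedString.lastIndexOf(" ");
String result = lastSpace > 0 ? trimmedString.substring(0, lastSpace) : trimmedString;

System.out.println(result);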

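Answer 4: (score: -1)

Assuming the words in your text are separated by spaces, just cut the text and then check the end character, the character after it, and the character before it to decide where to cut: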

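// Assumed here: chars holds the original text as a char[], length is chars.length,
// i is the index of the last character kept, and mString is the text cut at i.
// subHeadString and findBackStringWithSpace are helpers the answer does not show.
if (chars[i] != ' ') {
    if (i + 1 == length || (i + 1 < length && chars[i + 1] == ' '))
        return mString; // [I'm loser] bla ==> [I'm loser]
    if (i - 1 > -1 && chars[i - 1] == ' ')
        return subHeadString(mString, 2); // drop the last 2 characters: [I'm loser b]la ==> [I'm loser]
    // walk back to the previous space and return the substring before it:
    // [I'm loser bl]a ==> [I'm loser]
    return findBackStringWithSpace(mString, i);
} else {
    return mString;
}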
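
Because the fragment depends on helpers that are not shown, here is one way the same checks could be made concrete. This is a sketch under assumptions, not the answer author's code: cutAtWord and its parameters are names invented here, i is assumed to satisfy 0 <= i, and lastIndexOf stands in for both unshown helpers.

```java
class CutAtWord {
    // Returns text cut after index i (inclusive), shortened so that it
    // does not end in the middle of a word.
    static String cutAtWord(String text, int i) {
        if (i + 1 >= text.length()) {
            return text; // the cut is at or past the end: nothing to remove
        }
        String cut = text.substring(0, i + 1);
        // The cut ends on a space, or the next character is a space:
        // the last word is already complete.
        if (text.charAt(i) == ' ' || text.charAt(i + 1) == ' ') {
            return cut;
        }
        // The cut is inside a word: go back to the previous space.
        int space = cut.lastIndexOf(' ');
        return space >= 0 ? cut.substring(0, space) : cut;
    }

    public static void main(String[] args) {
        String text = "I'm loser bla";
        System.out.println(cutAtWord(text, 8));  // [I'm loser] bla
        System.out.println(cutAtWord(text, 10)); // [I'm loser b]la
        System.out.println(cutAtWord(text, 11)); // [I'm loser bl]a
    }
}
```

All three calls in main print "I'm loser". As in the original fragment, a cut that lands exactly on a space keeps that trailing space.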