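Breaking a paragraph into string tokens

Time: 2014-08-20 17:51:13

Tags: java algorithm substring

I am able to break up paragraphs of text into substrings based on a given character limit n. The problem is that my algorithm does exactly this and therefore breaks up words. This is where I am stuck. If the character limit falls in the middle of a word, how can I backtrack to a space so that all my substrings contain whole words?

This is the algorithm I am using: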

int arrayLength = 0;
arrayLength = (int) Math.ceil(((mText.length() / (double) charLimit)));

String[] result = new String[arrayLength];
int j = 0;
int lastIndex = result.length - 1;
for (int i = 0; i < lastIndex; i++) {
    result[i] = mText.substring(j, j + charLimit);
    j += charLimit;
}

result[lastIndex] = mText.substring(j);

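I set the charLimit variable to any integer n. mText is a String holding a paragraph of text. Any suggestions on how to improve this? Thanks in advance.

I have received good responses. Just so you know, this is the loop I used to check whether I had landed on a space. I just don't know how to correct the string from there.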

while (!strTemp.substring(strTemp.length() - 1).equalsIgnoreCase(" ")) {
    // somehow refine string before added to array
}

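1 Answer:

Answer 0 (score: 3)

I'm not sure I understood correctly what you want, but here is an answer to my interpretation:

You can use lastIndexOf to find the last space before your character limit, and then check whether that space is close enough to the limit (to handle text without whitespace), i.e.: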

int arrayLength = 0;
arrayLength = (int) Math.ceil(((mText.length() / (double) charLimit)));

String[] result = new String[arrayLength];
int j = 0;
int tolerance = 10;
int splitpoint;
int lastIndex = result.length - 1;
for (int i = 0; i < lastIndex; i++) {
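    // Last space at or before the character limit for this chunk.
    splitpoint = mText.lastIndexOf(' ', j + charLimit);
    // Use that space only if it lies within `tolerance` of the limit; otherwise cut at the limit.
    splitpoint = splitpoint > j + charLimit - tolerance ? splitpoint : j + charLimit;
    result[i] = mText.substring(j, splitpoint).trim();
    j = splitpoint;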
}

result[lastIndex] = mText.substring(j).trim();

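This searches for the last space at or before charLimit and splits the string there if that space is within tolerance (an example value) of the limit; otherwise it splits exactly at charLimit.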
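To make the split rule concrete, here is a minimal sketch that isolates it in a helper. The class name SplitPointDemo, the helper splitPoint, and the sample sentence are illustrative assumptions, not part of the original answer.

```java
class SplitPointDemo {
    // Hypothetical helper: index at which to cut the chunk that starts at j.
    static int splitPoint(String text, int j, int charLimit, int tolerance) {
        // lastIndexOf(ch, fromIndex) searches backward starting at fromIndex (inclusive).
        int lastSpace = text.lastIndexOf(' ', j + charLimit);
        // Accept the space only if it is less than `tolerance` characters short of the limit.
        return lastSpace > j + charLimit - tolerance ? lastSpace : j + charLimit;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps";
        // Index 12 falls inside "brown"; the nearest earlier space is at index 9.
        System.out.println(splitPoint(text, 0, 12, 5)); // 9: the space is close enough
        System.out.println(splitPoint(text, 0, 12, 2)); // 12: the space is too far back, hard cut
    }
}
```

A larger tolerance favors whole words at the cost of shorter chunks; a tolerance of 0 always cuts at the limit.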

The only problem with this solution is that the last string token may end up longer than charLimit, because arrayLength assumes every chunk is exactly charLimit long while splits at a space come up short. You might therefore need to adjust arrayLength and loop while (mText.length() - j > charLimit) instead.

Edit

Running the example code:

public static void main(String[] args) {
    String mText = "I am able to break up paragraphs of text into substrings based upon nth given character limit. The conflict I have is that my algorithm is doing exactly this, and is breaking up words. This is where I am stuck. If the character limit occurs in the middle of a word, how can I back track to a space so that all my substrings have entire words?";
    int charLimit = 40;
    int arrayLength = 0;
    arrayLength = (int) Math.ceil(((mText.length() / (double) charLimit)));

    String[] result = new String[arrayLength];
    int j = 0;
    int tolerance = 10;
    int splitpoint;
    int lastIndex = result.length - 1;
    for (int i = 0; i < lastIndex; i++) {
        splitpoint = mText.lastIndexOf(' ' ,j+charLimit);
        splitpoint = splitpoint > j+charLimit-tolerance ? splitpoint:j+charLimit;
        result[i] = mText.substring(j, splitpoint);
        j = splitpoint;
    }

    result[lastIndex] = mText.substring(j);

    for (int i = 0; i<arrayLength; i++) {
        System.out.println(result[i]);
    }
}

Output:

I am able to break up paragraphs of text
 into substrings based upon nth given
 character limit. The conflict I have is
 that my algorithm is doing exactly
 this, and is breaking up words. This is
 where I am stuck. If the character
 limit occurs in the middle of a word,
 how can I back track to a space so that
 all my substrings have entire words?

Additional edit: added trim() as suggested by curiosu. It removes the leading and trailing whitespace from the string tokens.
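The while-loop variant mentioned above could look roughly like the following sketch. It collects tokens in a List so that no array length has to be computed up front. The class name WordWrapSketch, the method split, the argument check, and the sample sentence are assumptions made for illustration, not code from the original answer.

```java
import java.util.ArrayList;
import java.util.List;

class WordWrapSketch {
    // Splits text into tokens of at most charLimit characters, preferring to cut at a space
    // when one lies within `tolerance` characters of the limit.
    static List<String> split(String text, int charLimit, int tolerance) {
        // Assumed guard: a tolerance larger than charLimit could stop the loop from advancing.
        if (charLimit <= 0 || tolerance < 0 || tolerance > charLimit) {
            throw new IllegalArgumentException("need charLimit > 0 and 0 <= tolerance <= charLimit");
        }
        List<String> tokens = new ArrayList<>();
        int j = 0;
        // Keep cutting while more than one chunk's worth of text remains.
        while (text.length() - j > charLimit) {
            int splitpoint = text.lastIndexOf(' ', j + charLimit);
            if (splitpoint <= j + charLimit - tolerance) {
                splitpoint = j + charLimit; // no usable space nearby: hard cut at the limit
            }
            tokens.add(text.substring(j, splitpoint).trim());
            j = splitpoint;
        }
        // Whatever remains now fits within charLimit.
        tokens.add(text.substring(j).trim());
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : split("The quick brown fox jumps over the lazy dog", 12, 5)) {
            System.out.println(token);
        }
    }
}
```

For this sample sentence the sketch prints "The quick", "brown fox", "jumps over", "the lazy" and "dog" on separate lines. Because the loop condition guarantees the remainder is no longer than charLimit, the final token cannot exceed the limit.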