Split a string and append the pieces based on length

Time: 2019-08-16 12:59:58

Tags: java string list stringbuilder

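I have a paragraph as an input string. I am trying to split the paragraph into an array of sentences, where each element contains only whole sentences and is no longer than 250 characters.

I tried splitting the string on delimiters (such as .) and converting the resulting strings into a list. Then, using a StringBuilder, I tried to append the strings based on length (250 characters).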

    List<String> list = new ArrayList<String>();

    String text = "Perhaps far exposed age effects. Now distrusts you her delivered applauded affection out sincerity. As tolerably recommend shameless unfeeling he objection consisted. She although cheerful perceive screened throwing met not eat distance. Viewing hastily or written dearest elderly up weather it as. So direction so sweetness or extremity at daughters. Provided put unpacked now but bringing. Unpleasant astonished an diminution up partiality. Noisy an their of meant. Death means up civil do an offer wound of. Called square an in afraid direct. Resolution diminution conviction so mr at unpleasing simplicity no. No it as breakfast up conveying earnestly immediate principle. Him son disposed produced humoured overcame she bachelor improved. Studied however out wishing but inhabit fortune windows. ";

    Pattern re = Pattern.compile("[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)",
            Pattern.MULTILINE | Pattern.COMMENTS);

    Matcher reMatcher = re.matcher(text);
    while (reMatcher.find()) {
        list.add(reMatcher.group());
    }
    String textDelimted[] = new String[list.size()];
    textDelimted = list.toArray(textDelimted);

    StringBuilder stringB = new StringBuilder(100);

    for (int i = 0; i < textDelimted.length; i++) {
        while (stringB.length() + textDelimted[i].length() < 250)
            stringB.append(textDelimted[i]);

        System.out.println("!#@#$%" +stringB.toString());
    }
}

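Expected result:

[0]: Perhaps far exposed age effects. Now distrusts you her delivered applauded affection out sincerity. As tolerably recommend shameless unfeeling he objection consisted. She although cheerful perceive screened throwing met not eat distance.

[1]: Viewing hastily or written dearest elderly up weather it as. So direction so sweetness or extremity at daughters. Provided put unpacked now but bringing. Unpleasant astonished an diminution up partiality. Noisy an their of meant.

[2]: Death means up civil do an offer wound of. Called square an in afraid direct. Resolution diminution conviction so mr at unpleasing simplicity no. No it as breakfast up conveying earnestly immediate principle.

[3]: Him son disposed produced humoured overcame she bachelor improved. Studied however out wishing but inhabit fortune windows.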

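2 answers:

Answer 0: (score: 0)

Your question is unclear; please try rephrasing it so that it is obvious exactly what you are asking.

That said, I assume that "I tried splitting the string on delimiters (such as .) and converting the resulting strings into a list" means you want to split the String wherever a . occurs and convert the result to a List<String>. This can be done as follows: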

import java.util.Arrays;
import java.util.List;

String input = "hello.world.with.delimiters";
String[] words = input.split("\\.");  // String[] with contents {"hello", "world", "with", "delimiters"}
List<String> list = Arrays.asList(words);  // Identical contents, just in a List<String>


// if you want to append to a StringBuilder based on length
StringBuilder sb = new StringBuilder();
for (String s : list) {
    // someLengthCondition() is a placeholder that you have to implement yourself
    if (someLengthCondition(s.length())) sb.append(s);  // append the element, not the whole list
}

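Of course, your implementation of someLengthCondition() will depend on your needs. I can't provide one, because it is hard to understand what you are trying to do.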
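Purely as an illustration, here is one way such a length condition could look if the goal is the 250-character limit mentioned in the question. The names `MAX_LENGTH`, `fits` and `appendWhileFits` are hypothetical and are not part of the question or of any library; the sketch stops appending as soon as the next piece would push the builder over the limit.

```java
import java.util.Arrays;
import java.util.List;

public class LengthConditionSketch {
    // Assumed limit, taken from the question (250 characters)
    static final int MAX_LENGTH = 250;

    // Hypothetical stand-in for someLengthCondition():
    // true if appending `next` keeps the builder within MAX_LENGTH
    static boolean fits(StringBuilder sb, String next) {
        return sb.length() + next.length() <= MAX_LENGTH;
    }

    // Appends pieces in order and stops at the first one that does not fit
    static String appendWhileFits(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String s : parts) {
            if (!fits(sb, s)) {
                break;
            }
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("hello.world.with.delimiters".split("\\."));
        System.out.println(appendWhileFits(parts));
    }
}
```

Note that this condition looks at the builder as well as the next piece, because whether a piece fits depends on how much has already been appended.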

Answer 1: (score: 0)

I think you only need to modify your loop slightly. My result matches the expected output.

import java.util.List;
import java.util.ArrayList;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MyClass {
    public static void main(String args[]) {

        List<String> list = new ArrayList<String>();

        String text = "Perhaps far exposed age effects. Now distrusts you her delivered applauded affection out sincerity. As tolerably recommend shameless unfeeling he objection consisted. She although cheerful perceive screened throwing met not eat distance. Viewing hastily or written dearest elderly up weather it as. So direction so sweetness or extremity at daughters. Provided put unpacked now but bringing. Unpleasant astonished an diminution up partiality. Noisy an their of meant. Death means up civil do an offer wound of. Called square an in afraid direct. Resolution diminution conviction so mr at unpleasing simplicity no. No it as breakfast up conveying earnestly immediate principle. Him son disposed produced humoured overcame she bachelor improved. Studied however out wishing but inhabit fortune windows. ";

        Pattern re = Pattern.compile("[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)",
                Pattern.MULTILINE | Pattern.COMMENTS);

        Matcher reMatcher = re.matcher(text);
        while (reMatcher.find()) {
            list.add(reMatcher.group());
        }
        String textDelimted[] = new String[list.size()];
        textDelimted = list.toArray(textDelimted);

        StringBuilder stringB = new StringBuilder(300);

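        for (int i = 0; i < textDelimted.length; i++) {
            // The regex drops the space between sentences, so count and re-insert it
            int separator = stringB.length() > 0 ? 1 : 0;
            if (stringB.length() + separator + textDelimted[i].length() <= 250) {
                if (separator == 1) {
                    stringB.append(' ');
                }
                stringB.append(textDelimted[i]);
            } else {
                // The current chunk is full: print it and start a new one with this sentence
                System.out.println("!#@#$%" + stringB.toString());
                stringB = new StringBuilder(300);
                stringB.append(textDelimted[i]);
            }
        }
        // Print the last chunk, which the loop itself never flushes
        System.out.println("!#@#$%" + stringB.toString());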
    }
}

Replace the println calls with the following code to collect the results in a list:

ArrayList<String> arrlist = new ArrayList<String>(5);
..
arrlist.add(stringB.toString());
..
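
For reference, the same idea can be written as a self-contained method that returns the list directly. This is only a sketch under a few assumptions: the class name `SentenceChunker` and the method `chunk` are made up for illustration, and sentences are split after `.`, `!` or `?` followed by whitespace, which is a simpler rule than the regex in the question. A single sentence longer than the limit is kept whole as its own chunk rather than cut.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceChunker {

    // Groups whole sentences into chunks of at most maxLen characters.
    // Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    static List<String> chunk(String text, int maxLen) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String sentence : text.trim().split("(?<=[.!?])\\s+")) {
            if (sentence.isEmpty()) {
                continue;
            }
            // One extra character for the space that joins two sentences
            int separator = current.length() > 0 ? 1 : 0;
            if (current.length() > 0
                    && current.length() + separator + sentence.length() > maxLen) {
                // The sentence does not fit: close the current chunk
                chunks.add(current.toString());
                current.setLength(0);
                separator = 0;
            }
            if (separator == 1) {
                current.append(' ');
            }
            current.append(sentence);
        }
        // Flush the final chunk
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "One two three. Four five! Six seven eight nine? Ten.";
        for (String c : chunk(text, 30)) {
            System.out.println(c.length() + ": " + c);
        }
    }
}
```

Calling `chunk(text, 250)` on the paragraph from the question should group the sentences the same way as the expected result listed there, with the spaces between sentences preserved.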