Skip the first occurrence and split a string in Java

Time: 2016-04-08 10:08:09

Tags: java regex

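I want to skip the first occurrence of the underscore if the number of occurrences is more than 4. For now, a key will contain at most 5 underscores. I need to produce the output A_B, C, D, E, F, and I did it using the code below. I would like a better solution. Please check and let me know. Thanks in advance.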

String key = "A_B_C_D_E_F";
int occurrence = StringUtils.countOccurrencesOf(key, "_");
System.out.println(occurrence);
String[] keyValues = null;
if(occurrence == 5){
    key = key.replaceFirst("_", "-");
    keyValues = StringUtils.tokenizeToStringArray(key, "_");
    keyValues[0] = replaceOnce(keyValues[0], "-", "_");
}else{
    keyValues = StringUtils.tokenizeToStringArray(key, "_");
}

for(String keyValue : keyValues){
    System.out.println(keyValue);
}
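For reference, the requirement can be restated with the JDK alone. This is a minimal sketch of one reading of the question (keep the first two tokens joined only when the key has more than a given number of underscores); the class name `SkipFirstUnderscore`, the method `splitKey` and the `threshold` parameter are hypothetical names introduced here, not part of the original code. Unlike `tokenizeToStringArray`, plain `String.split` does not trim tokens or drop empty ones in the middle.

```java
import java.util.Arrays;

public class SkipFirstUnderscore {

    // Hypothetical helper: splits on "_", but leaves the first underscore in place
    // when the key contains more than `threshold` underscores.
    public static String[] splitKey(String key, int threshold) {
        long underscores = key.chars().filter(c -> c == '_').count();
        if (underscores <= threshold) {
            return key.split("_");
        }
        // Index of the second underscore; everything before it becomes the first token.
        int second = key.indexOf('_', key.indexOf('_') + 1);
        String[] rest = key.substring(second + 1).split("_");
        String[] result = new String[rest.length + 1];
        result[0] = key.substring(0, second);
        System.arraycopy(rest, 0, result, 1, rest.length);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKey("A_B_C_D_E_F", 4))); // prints [A_B, C, D, E, F]
        System.out.println(Arrays.toString(splitKey("A_B_C", 4)));       // prints [A, B, C]
    }
}
```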

5 answers:

Answer 0 (score: 2)

Well, it is relatively "simple":

String str = "A_B_C_D_E_F_G";
String[] result = str.split("(?<!^[^_]*)_|_(?=(?:[^_]*_){0,3}[^_]*$)");
System.out.println(Arrays.toString(result));

Here is a version with comments for better understanding; it can also be used as-is:

String str = "A_B_C_D_E_F_G";
String[] result = str.split("(?x)                  # enable embedded comments \n"
                            + "                    # first alternative splits on all but the first underscore \n"
                            + "(?<!                # next character should not be preceded by \n"
                            + "    ^[^_]*          #     only non-underscores since beginning of input \n"
                            + ")                   # so this matches only if there was an underscore before \n"
                            + "_                   # underscore \n"
                            + "|                   # alternatively split if an underscore is followed by at most three more underscores to match the less than five underscores case \n"
                            + "_                   # underscore \n"
                            + "(?=                 # preceding character must be followed by \n"
                            + "    (?:[^_]*_){0,3} #     at most three groups of non-underscores and an underscore \n"
                            + "    [^_]*$          #     only more non-underscores until end of line \n"
                            + ")");
System.out.println(Arrays.toString(result));

Answer 1 (score: 1)

You can split with this regex:

String s = "A_B_C_D_E_F";
String[] list = s.split("(?<=_[A-Z])_");

Output:

  

[A_B,C,D,E,F]

The idea is to match only a "_" that is preceded by "_[A-Z]", which effectively skips only the first "_".

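If the strings you are dealing with have a different format between the "_" characters, you have to replace [A-Z] with a regex that matches that format.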
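As an illustration of that substitution, here is a minimal sketch for tokens longer than one character. It assumes tokens contain no underscore and are at most 32 characters long; the bound, the class name `LookbehindSplit` and the method `split` are assumptions made for this example. A bounded quantifier is used because Java look-behind is only documented to work with a known maximum length. Like the regex above, this always skips the first "_" regardless of how many underscores the string contains.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LookbehindSplit {

    // Split on "_" only when it is preceded by "_" plus one token.
    // Assumption: a token is 1 to 32 characters and contains no underscore.
    private static final Pattern SPLITTER = Pattern.compile("(?<=_[^_]{1,32})_");

    public static String[] split(String s) {
        return SPLITTER.split(s);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("AB_CD_EF_GH"))); // prints [AB_CD, EF, GH]
        System.out.println(Arrays.toString(split("A_B_C_D_E_F"))); // prints [A_B, C, D, E, F]
    }
}
```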

Answer 2 (score: 0)

You can use this \G-based regex and, instead of splitting, use matching:

String str = "A_B_C_D_E_F";
Pattern p = Pattern.compile("(^[^_]*_[^_]+|\\G[^_]+)(?:_|$)");
Matcher m = p.matcher(str);
List<String> resultArr = new ArrayList<>();
while (m.find()) {
    resultArr.add( m.group(1) );
}
System.err.println(resultArr);

\G asserts the position at the end of the previous match, or at the start of the string for the first match.

Output:

[A_B, C, D, E, F]
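To see what \G does in isolation, here is a small sketch separate from the pattern above. It collects digits only while each match starts exactly where the previous one ended, so matching stops at the first character that does not fit. The class name `ContiguousMatches` and the method `leadingDigits` are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContiguousMatches {

    // Each find() must begin at the end of the previous match because of \G,
    // so digits after the first non-digit are never reached.
    public static List<String> leadingDigits(String input) {
        Matcher m = Pattern.compile("\\G\\d").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(leadingDigits("123a45")); // prints [1, 2, 3]
        System.out.println(leadingDigits("a12"));    // prints []
    }
}
```

The same mechanism keeps the second alternative of the answer's pattern from picking up tokens anywhere other than directly after the previous match.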

Answer 3 (score: 0)

I would do it after the split.

public void test() {
    String key = "A_B_C_D_E_F";
    String[] parts = key.split("_");
    // More than 4 underscores means more than 5 parts.
    if (parts.length > 5) {
        // Merge the first two parts back together with the underscore that separated them.
        String[] newParts = new String[parts.length - 1];
        newParts[0] = parts[0] + "_" + parts[1];
        System.arraycopy(parts, 2, newParts, 1, parts.length - 2);
        parts = newParts;
    }
    System.out.println("parts = " + Arrays.toString(parts));
}

Answer 4 (score: 0)

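Although Java does not officially document it, you can use * and + in a lookbehind, because they are implemented as limiting quantifiers: * as {0,0x7FFFFFFF} and + as {1,0x7FFFFFFF} (see "Regex look-behind without obvious maximum length in Java"). So, if your strings are not too long, you can use these quantifiers inside the lookbehind.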

Disclaimer: since this exploits a quirk of the current Java 8 regex engine, the code may break in the future when the bug is fixed in Java.
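If relying on that behavior is a concern, a bounded quantifier gives a similar effect using only documented look-behind support. The following is a sketch, not the pattern from this answer: it assumes the text before the first underscore is at most 100 characters, and the bound, the class name `BoundedLookbehindSplit` and the method `split` are all choices made for this example. It skips the first underscore unconditionally.

```java
import java.util.Arrays;

public class BoundedLookbehindSplit {

    // Split on every "_" except the one that is preceded only by
    // non-underscore characters since the start of the input.
    // Assumption: the first token is at most 100 characters long.
    public static String[] split(String key) {
        return key.split("(?<!^[^_]{0,100})_");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("A_B_C_D_E_F"))); // prints [A_B, C, D, E, F]
    }
}
```

If the first token can exceed the chosen bound, the look-behind no longer recognizes the first underscore and the string is split there as well, so the bound should be picked with the real data in mind.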