How do I use a capture group as the start of an expression in a regular expression?

Time: 2018-05-03 21:15:56

Tags: java regex

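The task at hand: I am trying to pretty-print a specific list of auto-generated IDs. They have the form aa-bb-cc-dd-ee-ff-gg... Each tuple can be matched by [a-zA-Z0-9]+ (indeterminate length), and the delimiter is [-] (at most one).

Each ID has 1 to 9 tuples. If an ID has 3 tuples or fewer, I want to return one group. If an ID has more than 3 tuples (4+), I want to return two groups: the first made up of the first 3 tuples, the second made up of the rest.

Only one string is processed at a time. Here is the test set: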

 one1
 one1-two2
 one1-two2-three3
 one1-two2-three3-4a
 one1-two2-three3-4a-5a
 one1-two2-three3-4a-5a-6a
 one1-two2-three3-4a-5a-6a-7a

Specifically, this means:

 one1 -> {"one1"}
 one1-two2 -> {"one1-two2"}
 one1-two2-three3 -> {"one1-two2-three3"}
 one1-two2-three3-4a -> {"one1-two2-three3", "4a"}
 one1-two2-three3-4a-5a -> {"one1-two2-three3", "4a-5a"}
 one1-two2-three3-4a-5a-6a -> {"one1-two2-three3", "4a-5a-6a"}
 one1-two2-three3-4a-5a-6a-7a -> {"one1-two2-three3", "4a-5a-6a-7a"}

What I have so far (this always selects the first group correctly):

(^[a-zA-Z0-9]+$)|(^[a-zA-Z0-9]+[-][a-zA-Z0-9]+$)|(^[a-zA-Z0-9]+[-][a-zA-Z0-9]+[-][a-zA-Z0-9]+$)|(^[a-zA-Z0-9]+[-][a-zA-Z0-9]+[-][a-zA-Z0-9]+)

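What I want to achieve: starting from the end of the capture group, check that it is not the end of the line, start reading after the first '-' character past that point, and match through to the end of the line.

Additional information: I am using Java's native regex engine.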

2 answers:

Answer 0 (score: 1)

You don't need to overcomplicate things to solve this:

(?m)^(\w+(?:-\w+){0,2})(?:-(\w+(?:-\w+)*))?$

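(?m) enables the multiline flag, so that the ^ and $ anchors match the start and end of each line respectively. The match starts with a run of word characters, \w+, and then takes up to two more -\w+ patterns to build the first capture group. Note that \w also matches the underscore, so it is slightly broader than [a-zA-Z0-9].

The second capture group contains whatever follows. If you are sure the input is well-formed, a shorter variant also works, although its second group then includes the leading '-':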

(?m)^(\w+(?:-\w+){0,2})(.+)?$
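To make the difference between the two patterns concrete, here is a minimal Java sketch that runs both over a multi-line string with Matcher.find(). The class name MultilineSplitSketch, the split helper, and the "|" separator are illustrative assumptions, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
public class MultilineSplitSketch {
    // First pattern from this answer; (?m) makes ^ and $ match at line boundaries.
    static final Pattern STRICT =
            Pattern.compile("(?m)^(\\w+(?:-\\w+){0,2})(?:-(\\w+(?:-\\w+)*))?$");
    // Shorter variant; here group 2 keeps its leading '-'.
    static final Pattern LOOSE =
            Pattern.compile("(?m)^(\\w+(?:-\\w+){0,2})(.+)?$");

    // Returns one entry per matching line: "group1" or "group1|group2".
    static List<String> split(Pattern pattern, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            // group(2) is null when the optional second group did not participate.
            result.add(m.group(2) == null ? m.group(1) : m.group(1) + "|" + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "one1-two2\none1-two2-three3-4a-5a";
        System.out.println(split(STRICT, text)); // [one1-two2, one1-two2-three3|4a-5a]
        System.out.println(split(LOOSE, text));  // [one1-two2, one1-two2-three3|-4a-5a]
    }
}
```

With the shorter variant, the caller would have to strip the leading '-' from group 2 before using it.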


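Answer 1 (score: 0)

The following regex matches only valid strings (when applied to the whole input, as with Matcher.matches()) and returns 2 capture groups: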

([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+){0,2})(?:-([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*))?

Explanation

(                            Start capture of group 1:
  [a-zA-Z0-9]+                 Match first tuple of group 1
  (?:-[a-zA-Z0-9]+){0,2}       Match 0-2 delimiter+tuple pairs for a total of 1-3 tuples
)                            End capture of group 1
(?:                          Optional:
  -                            Match delimiter initiating group 2
  (                            Start capture of group 2:
    [a-zA-Z0-9]+                 Match first tuple of group 2
    (?:-[a-zA-Z0-9]+)*           Match 0+ delimiter+tuple pairs for a total of 1+ tuples
  )                            End capture of group 2
)?                           End optional
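The annotated layout above can be kept inside the Java source by compiling with Pattern.COMMENTS, which ignores whitespace in the pattern and treats '#' as the start of a comment. The following is only a sketch; the class name CommentedPatternSketch and the describe helper are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
public class CommentedPatternSketch {
    // Same expression as in this answer, spread over several lines.
    // Each comment runs to the "\n" that ends its line.
    static final Pattern ID = Pattern.compile(
            "(                          # group 1: the first 1-3 tuples\n"
          + "  [a-zA-Z0-9]+             # first tuple\n"
          + "  (?:-[a-zA-Z0-9]+){0,2}   # up to two more delimiter+tuple pairs\n"
          + ")\n"
          + "(?:                        # optional remainder\n"
          + "  -                        # delimiter before group 2\n"
          + "  (                        # group 2: every remaining tuple\n"
          + "    [a-zA-Z0-9]+\n"
          + "    (?:-[a-zA-Z0-9]+)*\n"
          + "  )\n"
          + ")?\n",
            Pattern.COMMENTS);

    // Returns "group1", "group1|group2", or "NO MATCH" for invalid input.
    static String describe(String id) {
        Matcher m = ID.matcher(id);
        if (!m.matches()) {
            return "NO MATCH";
        }
        return m.group(2) == null ? m.group(1) : m.group(1) + "|" + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(describe("one1-two2"));              // one1-two2
        System.out.println(describe("one1-two2-three3-4a-5a")); // one1-two2-three3|4a-5a
        System.out.println(describe("one1_two2"));              // NO MATCH
    }
}
```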

Demo

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    public static void main(String... args) {
        test("one1",
             "one1-two2",
             "one1-two2-three3",
             "one1-two2-three3-4a",
             "one1-two2-three3-4a-5a",
             "one1-two2-three3-4a-5a-6a",
             "one1-two2-three3-4a-5a-6a-7a",
             "one1-two2-three3-4a-5a-6a-7a-8a",
             "one1_two2"); // fail: invalid character
    }

    private static void test(String... values) {
        // Group 1: the first 1-3 tuples; group 2 (optional): all remaining tuples
        Pattern p = Pattern.compile("([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+){0,2})(?:-([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*))?");
        for (String value : values) {
            Matcher m = p.matcher(value);
            if (! m.matches()) // matches() requires the whole string to match
                System.out.printf("%s -> NO MATCH%n", value);
            else if (m.start(2) == -1) // capture group 2 not found
                System.out.printf("%s -> {\"%s\"}%n", value, m.group(1));
            else
                System.out.printf("%s -> {\"%s\", \"%s\"}%n", value, m.group(1), m.group(2));
        }
    }
}

Output

one1 -> {"one1"}
one1-two2 -> {"one1-two2"}
one1-two2-three3 -> {"one1-two2-three3"}
one1-two2-three3-4a -> {"one1-two2-three3", "4a"}
one1-two2-three3-4a-5a -> {"one1-two2-three3", "4a-5a"}
one1-two2-three3-4a-5a-6a -> {"one1-two2-three3", "4a-5a-6a"}
one1-two2-three3-4a-5a-6a-7a -> {"one1-two2-three3", "4a-5a-6a-7a"}
one1-two2-three3-4a-5a-6a-7a-8a -> {"one1-two2-three3", "4a-5a-6a-7a-8a"}
one1_two2 -> NO MATCH
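
The question states that an ID has at most 9 tuples, whereas the unbounded second group accepts any number. If that limit should be enforced, one option is to bound the second group's repetition so that the two groups together allow at most 3 + 6 tuples. This is a sketch under that assumption; the class name BoundedIdSplitter and the split helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
public class BoundedIdSplitter {
    // Group 1 takes 1-3 tuples; group 2 takes 1-6 more, for at most 9 in total.
    static final Pattern ID = Pattern.compile(
            "([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+){0,2})(?:-([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+){0,5}))?");

    // Returns one or two groups, or an empty list when the ID is invalid.
    static List<String> split(String id) {
        List<String> groups = new ArrayList<>();
        Matcher m = ID.matcher(id);
        if (m.matches()) {
            groups.add(m.group(1));
            if (m.group(2) != null) {
                groups.add(m.group(2));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(split("one1-two2-three3-4a")); // [one1-two2-three3, 4a]
        System.out.println(split("a-b-c-d-e-f-g-h-i"));   // [a-b-c, d-e-f-g-h-i]
        System.out.println(split("a-b-c-d-e-f-g-h-i-j")); // [] (10 tuples)
    }
}
```

Returning an empty list for invalid input is just one possible convention; throwing an exception or returning an Optional would work equally well.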