How do I match a nested repeated group with a regex in Java?

Posted: 2015-02-07 20:33:02

Tags: java regex regex-group

I'm trying to match a repeated group in Java:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "The very first line\n"
            + "\n"
            + "AA (aa)\n"
            + "BB (bb)\n"
            + "CC (cc)\n"
            + "\n";

    Pattern p = Pattern.compile(
            "The very first line\\s+"
            + "((?<gr1>[a-z]+)\\s+\\((?<gr2>[^)]+)\\)\\s*)+",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    Matcher m = p.matcher(s);

    if (m.find()) {
        // Print every numbered group, then the two named groups.
        for (int i = 0; i <= m.groupCount(); i++) {
            System.out.println("group #" + i + ": [" + m.group(i).trim() + "]");
        }
        System.out.println("group gr1: [" + m.group("gr1").trim() + "]");
        System.out.println("group gr2: [" + m.group("gr2").trim() + "]");
    }

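The problem is the repeated group. Although the regex matches the whole block of text (see group #0 in the sample output below), retrieving groups #2 and #3 (or the same groups by name, gr1/gr2) returns only the last match (CC/cc) and skips the earlier ones (AA/aa and BB/bb):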

    group #0: [The very first line

    AA (aa)
    BB (bb)
    CC (cc)]
    group #1: [CC (cc)]
    group #2: [CC]
    group #3: [cc]
    group gr1: [CC]
    group gr2: [cc]
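This is standard java.util.regex behavior rather than something specific to named groups: when a capturing group sits inside a quantifier, the Matcher keeps only the text captured by the group's last iteration. A minimal sketch that reproduces it with a much smaller pattern (the class and method names are hypothetical, chosen for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastCaptureDemo {
    // Matches the whole input against "(\d)+" and returns what group 1 holds,
    // or null if the input is not made of digits only.
    public static String lastCapture(String input) {
        Matcher m = Pattern.compile("(\\d)+").matcher(input);
        if (!m.matches()) {
            return null;
        }
        // group(0) is the full match; group(1) keeps only the final repetition.
        return m.group(1);
    }

    public static void main(String[] args) {
        System.out.println(lastCapture("123")); // prints 3
    }
}
```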

Is there any way around this?

EDIT: "The very first line" is in the pattern as an identifying string; see the comments on gknicker's answer below.

1 Answer:

Answer 0 (score: 1)

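Java's Matcher only remembers the last iteration of a repeated capturing group, so a single match can never give you all three pairs. It seems you want your pattern to match not the entire input string but only a single repeated part. If that's the case, your pattern would be: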

    Pattern p = Pattern.compile(
        "((?<gr1>[a-z]+)\\s+\\((?<gr2>[^)]+)\\))",
        Pattern.CASE_INSENSITIVE);

In that case you would use a while loop to find each match:

    Matcher m = p.matcher(s);

    while (m.find()) {
        System.out.println("group gr1: ["
            + m.group("gr1").trim() + "]");
        System.out.println("group gr2: ["
            + m.group("gr2").trim() + "]");
    }
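A self-contained version of that loop, collecting the pairs into a list instead of printing them. This is a sketch: the class and method names are hypothetical, and the pattern is the answer's single-entry pattern without its redundant outer group:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedPairs {
    // One "NAME (value)" entry per match.
    private static final Pattern ENTRY = Pattern.compile(
            "(?<gr1>[a-z]+)\\s+\\((?<gr2>[^)]+)\\)",
            Pattern.CASE_INSENSITIVE);

    // Calls find() repeatedly; each call yields the next entry, so the named
    // groups hold a different pair on every iteration.
    public static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            result.add(m.group("gr1") + "/" + m.group("gr2"));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "The very first line\n\nAA (aa)\nBB (bb)\nCC (cc)\n\n";
        System.out.println(pairs(s)); // [AA/aa, BB/bb, CC/cc]
    }
}
```

Because nothing anchors this pattern, it would also pick up any "word (text)" pair elsewhere in the input, which is why the question keeps the header in the pattern.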

But if you need to match the whole block (header included), you may have to use two patterns:

    String s = "The very first line\n"
        + "\n"
        + "AA (aa)\n"
        + "BB (bb)\n"
        + "CC (cc)\n"
        + "\n";

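    // Outer pattern: the header followed by the whole run of "NAME (value)" lines.
    Pattern p = Pattern.compile(
        "The very first line\\s+(([a-z]+)\\s+\\(([^)]+)\\)\\s*)+",
        Pattern.CASE_INSENSITIVE);

    // Inner pattern: a single "NAME (value)" entry.
    Pattern p2 = Pattern.compile(
        "((?<gr1>[a-z]+)\\s+\\((?<gr2>[^)]+)\\))",
        Pattern.CASE_INSENSITIVE);

    Matcher m = p.matcher(s);
    while (m.find()) {
        // Re-scan the outer match with the inner pattern to get every entry.
        // p2 also sees the header text, which is harmless here because no
        // header word is followed by whitespace and "(".
        Matcher m2 = p2.matcher(m.group());
        while (m2.find()) {
            System.out.println("group gr1: ["
                + m2.group("gr1").trim() + "]");
            System.out.println("group gr2: ["
                + m2.group("gr2").trim() + "]");
        }
    }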
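If a single pattern is preferred, one alternative not used in the answer above is Java's \G boundary matcher ("end of the previous match"), which lets each find() continue exactly where the last one stopped. A minimal sketch, assuming the entries follow the header with only whitespace between them; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoredPairs {
    // Each match starts either right after the header or exactly where the
    // previous match ended (\G). The (?!^) guard stops \G from matching at the
    // very start of the input, so entries count only if the header came first.
    private static final Pattern ENTRY = Pattern.compile(
            "(?:The very first line\\s+|(?!^)\\G)"
            + "(?<gr1>[a-z]+)\\s+\\((?<gr2>[^)]+)\\)\\s*",
            Pattern.CASE_INSENSITIVE);

    public static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            result.add(m.group("gr1") + "/" + m.group("gr2"));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "The very first line\n\nAA (aa)\nBB (bb)\nCC (cc)\n\n";
        System.out.println(pairs(s)); // [AA/aa, BB/bb, CC/cc]
    }
}
```

Each find() still returns one entry, so the named groups hold AA/aa, BB/bb and CC/cc in turn, and the chain stops at the first line that doesn't have the "NAME (value)" shape.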