How to match multiple regular expressions in Java and determine which one matched

Posted: 2012-04-12 14:38:29

Tags: java regex

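I have a Java program that reads a file line by line and tries to match each line against one of four regular expressions. Depending on which expression matches, the program performs a specific action. This is what I have: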

private void processFile(ArrayList<String> lines) {
    ArrayList<Component> Components = new ArrayList<>();
    Pattern pattern = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");

    Matcher matcher;
    // Go through each line and see if the line matches any of the regexes
    // defined above
    Component currentComponent = null;

    for (String line : lines) {
        matcher = pattern.matcher(line);

        if (matcher.find()) {
            // We found a tag. Find out which one
            String match = matcher.group();

            if (match.startsWith("Obj")) {
                // We've got the object name
                if (currentComponent != null) {
                    Components.add(currentComponent);
                }
                currentComponent = new Component();
                currentComponent.setName(matcher.group(1));
            } else if (currentComponent != null) {
                if (match.startsWith("{CAT")) {
                    currentComponent.setCategory(matcher.group(2));
                } else if (match.startsWith("{CODE")) {
                    currentComponent.setOrderCode(matcher.group(3));
                } else if (match.startsWith("{DESC")) {
                    currentComponent.setDescription(matcher.group(4));
                }
            }
        }
    }

    if (currentComponent != null) {
        Components.add(currentComponent);
    }
}

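As you can see, I combined the four regular expressions into one and apply the whole regex to each line. If a match is found, I check how the matched string starts to determine which expression matched, and then extract the data from the corresponding group. In case anyone wants to run the code, here is some sample data: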

Object name.......: PMF3800SN
Last modified.....: Wednesday 9 November 2011 11:55:04 AM
File offset (hex).: 00140598 (Hex).
Checksum (hex)....: C1C0 (Hex).
Size (bytes)......: 1,736
Properties........: {*DEVICE}
                    {PREFIX=Q}
                    {*PROPDEFS}
                    {PACKAGE="PCB Package",PACKAGE,1,SOT-323 MOSFET}
                    {*INDEX}
                    {CAT=Transistors}
                    {SUBCAT=MOSFET}
                    {MFR=NXP}
                    {DESC=N-channel TrenchMOS standard level FET with ESD protection}
                    {CODE=1894711}
                    {*COMPONENT}

                    {PACKAGE=SOT-323 MOSFET}
                    *PINOUT SOT-323 MOSFET
                    {ELEMENTS=1}
                    {PIN "D" = D}
                    {PIN "G" = G}
                    {PIN "S" = S}

Although my code works, I don't like the fact that I repeat parts of the strings later on when I call startsWith.

I'd be interested to know how others would write this.

阿姆鲁

3 answers:

Answer 0 (score: 3)

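group(n) returns null for a group that did not take part in the match. So you can wrap each subexpression in its own group and, after the match, check which of those groups is not null: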

Pattern pattern = Pattern.compile(
         "(Object name\\.{7}: (.++))|"
         + "(\\{CAT=([^\\}]++)\\})|"
         + "(\\{CODE=([^\\}]++)\\})|"
         + "(\\{DESC=([^\\}]++)\\})"); 
...
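if (matcher.group(1) != null) {        // "Object name" line; the name is in group 2
    ...
} else if (matcher.group(3) != null) { // {CAT=...} tag; the category is in group 4
    ...
} ...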

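In fact, as long as none of your subexpressions contains a | of its own, you can do this with the groups you already have: each alternative already contains exactly one capturing group.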
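A minimal, self-contained check of that last point, using the question's original pattern unchanged: whichever alternative matched is the only one whose group is non-null. The class name GroupParticipation and the helper whichAlternative are illustrative names, not part of the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupParticipation {
    // The question's combined pattern: each alternative has exactly one capturing group.
    static final Pattern PATTERN = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");

    /** Returns the number (1-4) of the alternative that matched, or 0 if none did. */
    static int whichAlternative(String line) {
        Matcher m = PATTERN.matcher(line);
        if (!m.find()) {
            return 0;
        }
        for (int g = 1; g <= m.groupCount(); g++) {
            // A group that did not take part in the match yields null.
            if (m.group(g) != null) {
                return g;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(whichAlternative("Object name.......: PMF3800SN"));       // 1
        System.out.println(whichAlternative("                    {CAT=Transistors}")); // 2
        System.out.println(whichAlternative("                    {CODE=1894711}"));    // 3
        System.out.println(whichAlternative("                    {SUBCAT=MOSFET}"));   // 0
    }
}
```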

Answer 1 (score: 2)

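As @axtavt pointed out, you can find out directly whether a group participated in the match. You don't even need to change the regex; you already have a capturing group for each alternative. I like to test with the start(n) method because it seems tidier, but checking group(n) for null (as @axtavt did) gives the same result. Here's an example: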

private static void processFile(ArrayList<String> lines) {

    Pattern p = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");

    // Create the Matcher now and reassign it to each line as we go.
    Matcher m = p.matcher("");

    for (String line : lines) {
        if (m.reset(line).find()) {
            // If group #n participated in the match, start(n) will be non-negative.
            if (m.start(1) != -1) {
                System.out.printf("%ncreating new component...%n");
                System.out.printf("  name: %s%n", m.group(1));
            } else if (m.start(2) != -1) {
                System.out.printf("  category: %s%n", m.group(2));
            } else if (m.start(3) != -1) {
                System.out.printf("  order code: %s%n", m.group(3));
            } else if (m.start(4) != -1) {
                System.out.printf("  description: %s%n", m.group(4));
            }
        }
    }
}

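However, I'm not sure I agree with your reasoning about repeating parts of the strings in your code. If the data format changes, or you change which fields you extract, it seems to me it would be easier for things to get out of sync when you update the code. In other words, your current code isn't redundant, it's self-documenting. :D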
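If you want to keep that self-documenting quality without either startsWith prefixes or bare group numbers, named capturing groups (available since Java 7) are one option. This is an alternative not used anywhere in the thread; the class name NamedGroups and the group names name/cat/code/desc are illustrative assumptions. A sketch:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroups {
    // Each alternative gets a named group, so the code says "cat" instead of 2.
    static final Pattern PATTERN = Pattern.compile(
            "Object name\\.{7}: (?<name>.++)|"
            + "\\{CAT=(?<cat>[^\\}]++)\\}|"
            + "\\{CODE=(?<code>[^\\}]++)\\}|"
            + "\\{DESC=(?<desc>[^\\}]++)\\}");

    static String describe(String line) {
        Matcher m = PATTERN.matcher(line);
        if (!m.find()) {
            return "no match";
        }
        // group(String) returns null when that named group did not participate.
        if (m.group("name") != null) {
            return "name: " + m.group("name");
        } else if (m.group("cat") != null) {
            return "category: " + m.group("cat");
        } else if (m.group("code") != null) {
            return "order code: " + m.group("code");
        } else {
            // Exactly one alternative matched, so this must be the DESC one.
            return "description: " + m.group("desc");
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("Object name.......: PMF3800SN"));
        System.out.println(describe("                    {CODE=1894711}"));
    }
}
```

Renaming or reordering the alternatives then only touches the pattern and the matching group name, not a list of numeric indexes.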

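Edit: In a comment you mentioned the possibility of processing the whole file at once instead of line by line. That is actually the simpler approach: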

private static void processFile(String contents) {

    Pattern p = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");

    Matcher m = p.matcher(contents);

    while (m.find()) {
        if (m.start(1) != -1) {
            System.out.printf("%ncreating new component...%n");
            System.out.printf("  name: %s%n", m.group(1));
        } else if (m.start(2) != -1) {
            System.out.printf("  category: %s%n", m.group(2));
        } else if (m.start(3) != -1) {
            System.out.printf("  order code: %s%n", m.group(3));
        } else if (m.start(4) != -1) {
            System.out.printf("  description: %s%n", m.group(4));
        }
    }
}
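To see the whole-file approach build objects instead of printing, here is a sketch. The question never shows its Component class, so the one below is a hypothetical stand-in with plain fields; WholeFileParser and parse are likewise illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeFileParser {
    // Hypothetical stand-in for the question's Component class, which is not shown.
    static class Component {
        String name, category, orderCode, description;
    }

    static final Pattern PATTERN = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");

    static List<Component> parse(String contents) {
        List<Component> components = new ArrayList<>();
        Component current = null;
        Matcher m = PATTERN.matcher(contents);
        // '.' does not match line terminators by default, so (.++) stops at the end of the name line.
        while (m.find()) {
            if (m.start(1) != -1) {
                current = new Component();
                current.name = m.group(1);
                components.add(current);
            } else if (current == null) {
                continue; // a tag before any "Object name" line: ignore it
            } else if (m.start(2) != -1) {
                current.category = m.group(2);
            } else if (m.start(3) != -1) {
                current.orderCode = m.group(3);
            } else {
                current.description = m.group(4);
            }
        }
        return components;
    }

    public static void main(String[] args) {
        String sample = "Object name.......: PMF3800SN\n"
                + "                    {CAT=Transistors}\n"
                + "                    {SUBCAT=MOSFET}\n"
                + "                    {CODE=1894711}\n";
        for (Component c : parse(sample)) {
            System.out.println(c.name + " / " + c.category + " / " + c.orderCode);
        }
    }
}
```

Adding each component to the list as soon as it is created removes the need for the question's final "add the last one" step, since later tags just fill in the object already in the list.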

Answer 2 (score: 0)

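I'd define a Meta class that pairs a Pattern with a Runnable. Loop over the lines, and for each line loop over the Meta objects; if a pattern matches, run its Runnable. Something like: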

class Meta {
  Pattern pattern;
  Runnable runnable;
  Matcher matcher;

  Meta(Pattern p, Runnable r) {
    pattern = p;
    runnable = r;
  }
}

Meta[] metas = new Meta[] { new Meta(Pattern.compile(...), new Runnable() { ... }), new Meta(...), ... };


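for (String line : lines) {
  for (Meta meta : metas) {
    final Matcher matcher = meta.pattern.matcher(line);
    // Use find() rather than matches(): the {CAT=...}, {CODE=...} and {DESC=...}
    // lines in the sample data are indented, so a whole-line match would fail.
    if (matcher.find()) {
      meta.matcher = matcher;  // expose the matcher to the Runnable
      meta.runnable.run();
    }
  }
}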

Here's what the Meta object for the "Object name" line would look like:

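// currentComponent and Components must be fields of the enclosing class here:
// code inside an anonymous class cannot assign to a local variable.
final Meta m = new Meta(Pattern.compile("Object name\\.{7}: (.++)"), null);
m.runnable = new Runnable() {
  @Override
  public void run() {
    // We've got the object name
    if (currentComponent != null) {
      Components.add(currentComponent);
    }
    currentComponent = new Component();
    currentComponent.setName(m.matcher.group(1));
  }
};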
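On Java 8 or later (newer than this 2012 thread), the same table-of-handlers idea can hand the Matcher straight to a lambda, which removes the shared matcher field. In the sketch below, Consumer&lt;Matcher&gt; replaces the answer's Runnable, and the handlers only record strings instead of building Component objects; PatternDispatch and log are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternDispatch {
    // LinkedHashMap keeps the patterns in insertion order; each handler gets the Matcher.
    private final Map<Pattern, Consumer<Matcher>> handlers = new LinkedHashMap<>();
    final List<String> log = new ArrayList<>();

    PatternDispatch() {
        handlers.put(Pattern.compile("Object name\\.{7}: (.++)"),
                m -> log.add("name: " + m.group(1)));
        handlers.put(Pattern.compile("\\{CAT=([^\\}]++)\\}"),
                m -> log.add("category: " + m.group(1)));
        handlers.put(Pattern.compile("\\{CODE=([^\\}]++)\\}"),
                m -> log.add("order code: " + m.group(1)));
        handlers.put(Pattern.compile("\\{DESC=([^\\}]++)\\}"),
                m -> log.add("description: " + m.group(1)));
    }

    void process(List<String> lines) {
        for (String line : lines) {
            for (Map.Entry<Pattern, Consumer<Matcher>> e : handlers.entrySet()) {
                Matcher m = e.getKey().matcher(line);
                if (m.find()) {            // find(): tag lines are indented
                    e.getValue().accept(m);
                    break;                 // first matching pattern wins
                }
            }
        }
    }

    public static void main(String[] args) {
        PatternDispatch d = new PatternDispatch();
        d.process(Arrays.asList(
                "Object name.......: PMF3800SN",
                "                    {CAT=Transistors}",
                "                    {MFR=NXP}"));
        d.log.forEach(System.out::println);
    }
}
```

Because each handler only sees its own single-group pattern, every handler reads group(1), so there is no global group numbering to keep in sync.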