Expected result of capture groups?

Time: 2013-09-07 17:15:07

Tags: java regex capture-group

    String line = "This order was placed for QT3000! OK?";
    String pattern = "(.*)(\\d+)(.*)";

    // Create a Pattern object
    Pattern r = Pattern.compile(pattern);

    // Now create matcher object.
    Matcher m = r.matcher(line);
    if (m.find()) {
      System.out.println("Found value: " + m.group(1));
      System.out.println("Found value: " + m.group(2));
      System.out.println("Found value: " + m.group(3));
    }

Output:

Found value: This order was placed for QT300
Found value: 0
Found value: ! OK?

whereas I was expecting the output to be:

Found value: This order was placed for QT3000! OK?
Found value: 3000
Found value: This order was placed for QT3000! OK?

The reason I expected this output is:

If pattern is  "(.*)"   output for m.group(1) is "This order was placed for QT3000! OK?"
If pattern is  "(\\d+)" output for m.group(1) is "3000"

I don't understand why I don't get the expected output when the pattern is "(.*)(\\d+)(.*)".

2 answers:

Answer 0 (score: 2)

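The greedy .* matches (and consumes) as many characters as it can while still leaving something for \\d+ to match. Since \\d+ needs only one digit to succeed, it is left with just the final 0.

So you need to make the first .* lazy: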

(.*?)(\\d+)(.*)

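If you want the details: .* first matches the entire string and then backtracks one character at a time so that the rest of the regex, (\\d+)(.*), can also match. Once it has backtracked far enough that it holds only: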

This order was placed for QT300

the rest of the regex, (\\d+)(.*), is satisfied, so the match ends there.
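
To see the difference side by side, here is a minimal sketch that runs both patterns against the string from the question. The class name GreedyVsLazy and the helper groups are made up for illustration; only java.util.regex is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: returns groups 1..3 of the first match, or null if nothing matches.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";

        // Greedy first group: it keeps "QT300" and leaves only the last digit for \d+.
        System.out.println(String.join(" | ", groups("(.*)(\\d+)(.*)", line)));
        // prints: This order was placed for QT300 | 0 | ! OK?

        // Lazy first group: it stops before the first digit, so \d+ takes all four digits.
        System.out.println(String.join(" | ", groups("(.*?)(\\d+)(.*)", line)));
        // prints: This order was placed for QT | 3000 | ! OK?
    }
}
```

Note that even with the lazy quantifier, groups 1 and 3 hold only the text before and after the number, not the whole line.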

Answer 1 (score: 1)

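This happens because the first (.*) is greedy and eats as much as it can while still allowing (\d+)(.*) to match the rest of the string.

Basically, the match goes like this. At the start, the first .* gobbles up the whole string: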

This order was placed for QT3000! OK?
                                     ^

But since there is nothing left here for \d+ to match, we backtrack:

This order was placed for QT3000! OK?
                                    ^
This order was placed for QT3000! OK?
                                   ^
...

This order was placed for QT3000! OK?
                               ^

At this position \d+ can match, so we move on:

This order was placed for QT3000! OK?
                                ^

and the final .* matches the rest of the string.

That explains the output you are seeing.
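
The walk above can be imitated in a few lines. This is only a simplified model of the backtracking, not how java.util.regex works internally: it assumes the match starts at index 0 and tries the longest prefix for group 1 first, shrinking it until (\d+)(.*) accepts the remainder. The names GreedyBacktrackSketch and greedySplit are hypothetical.

```java
import java.util.regex.Pattern;

class GreedyBacktrackSketch {
    // Hypothetical helper: the length of group 1 that a greedy (.*) ends up with,
    // found by trying the longest prefix first and giving back one character at a time.
    static int greedySplit(String input) {
        Pattern rest = Pattern.compile("(\\d+)(.*)");
        for (int split = input.length(); split >= 0; split--) {
            // The remainder must be matched in full by (\d+)(.*).
            if (rest.matcher(input.substring(split)).matches()) {
                return split;
            }
        }
        return -1; // no split lets the rest of the pattern match
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        int split = greedySplit(line);
        System.out.println(split + ": " + line.substring(0, split));
        // prints: 31: This order was placed for QT300
    }
}
```

Index 31 is the last 0 of 3000, the same position the caret reaches in the trace.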


You can fix this by making the first (.*) lazy:

(.*?)(\d+)(.*)

The search for a match for (.*?) starts with the empty string, and each time the rest of the pattern fails it takes one more character:

This order was placed for QT3000! OK?
^
This order was placed for QT3000! OK?
 ^
...

This order was placed for QT3000! OK?
                            ^

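At this point \d+ can match (and, being greedy, takes all of 3000), .* matches what is left, and the match attempt is complete. Group 2 is now 3000 as you wanted; groups 1 and 3 are "This order was placed for QT" and "! OK?".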
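
The lazy case can be modelled the same way, growing the prefix from empty instead of shrinking it from the full string. Again this is a simplified sketch that assumes the match starts at index 0; LazyExpandSketch and lazySplit are hypothetical names, and the result is cross-checked against what Matcher itself reports for group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyExpandSketch {
    // Hypothetical helper: the length of group 1 that a lazy (.*?) ends up with,
    // found by starting from the empty prefix and taking one more character at a time.
    static int lazySplit(String input) {
        Pattern rest = Pattern.compile("(\\d+)(.*)");
        for (int split = 0; split <= input.length(); split++) {
            // Stop at the first prefix length where (\d+)(.*) accepts the remainder.
            if (rest.matcher(input.substring(split)).matches()) {
                return split;
            }
        }
        return -1; // no split lets the rest of the pattern match
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        System.out.println(lazySplit(line)); // prints: 28

        // Cross-check against the real regex engine: group 1 ends at the same index.
        Matcher m = Pattern.compile("(.*?)(\\d+)(.*)").matcher(line);
        if (m.find()) {
            System.out.println(m.end(1) + " " + m.group(2)); // prints: 28 3000
        }
    }
}
```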