Java Regular Expression Matcher does not find all possible matches

Asked: 2015-01-20 05:28:57

Tags: java regex

I was looking at some code from TutorialsPoint, and something about it has been bothering me ever since. Take a look at this code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main(String[] args) {

        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";

        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);

        // Now create matcher object.
        Matcher m = r.matcher(line);
        while (m.find()) {
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
            System.out.println("Found value: " + m.group(3));
        }
    }
}

This code successfully prints:

Found value: This order was placed for QT300
Found value: 0
Found value: ! OK?

But given the regex "(.*)(\\d+)(.*)", why doesn't it return the other possible results, such as:

Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?

Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?

If this code is not suited to doing that, how can I write code that finds all possible matches?

2 answers:

Answer 0 (score: 5)

This happens because of the greediness of * and the backtracking that follows.

String:

This order was placed for QT3000! OK?

Regex:

(.*)(\\d+)(.*)

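As we all know, .* is greedy and matches as many characters as possible. So the first .* initially matches everything up to and including the last character, ?, and then backtracks so that the rest of the pattern can match. The next part of the regex is \d+, so the engine backtracks until it reaches a digit. As soon as one digit is found, \d+ matches it, because its condition is already satisfied at that point (\d+ matches one or more digits). As a result, the first (.*) captures This order was placed for QT300, and the following (\\d+) captures the digit 0 that sits just before the ! symbol.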

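The next part, (.*), then captures all the remaining characters, !<space>OK?. m.group(1) refers to the characters captured by group index 1, m.group(2) to those captured by index 2, and so on.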
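The backtracking described above can be checked by printing where each group starts and ends. The following is a minimal sketch; the class name GreedyTrace and the helper describe are made up for illustration and are not part of the original code. It also counts how often find() succeeds: the first match consumes the entire input, so there is nothing left for a second match, which is why the loop in the question prints only one set of groups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class GreedyTrace {
    // Returns one line per capture group with its [start, end) offsets and text,
    // followed by the number of times find() succeeded.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        int matches = 0;
        while (m.find()) {
            matches++;
            for (int g = 1; g <= m.groupCount(); g++) {
                sb.append("group ").append(g)
                  .append(" [").append(m.start(g)).append(", ").append(m.end(g))
                  .append("): \"").append(m.group(g)).append("\"\n");
            }
        }
        sb.append("find() succeeded ").append(matches).append(" time(s)\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        System.out.print(describe("(.*)(\\d+)(.*)", line));
    }
}
```

Group 2 covers only the last digit of the run, which shows that the first (.*) gave back exactly one character more than it needed to hand back the trailing "! OK?".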


To get your desired output:

String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d{2})(.*)";

// Create a Pattern object
Pattern r = Pattern.compile(pattern);

// Now create matcher object.
Matcher m = r.matcher(line);
while (m.find()) {
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3));
}

Output:

Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?

With (.*)(\\d{2}), the engine backtracks until exactly two digits are available, so that a match can be produced.

Change your pattern to this:

String pattern = "(.*?)(\\d+)(.*)";

to get output like this:

Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?

The ? placed after * forces * to match non-greedily.

Use extra capturing groups to get both outputs from a single program.

String line = "This order was placed for QT3000! OK?";
String pattern = "((.*?)(\\d{2}))(?:(\\d{2})(.*))";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find()) {
    // First variant: prefix ending in "30", then "00", then the rest
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(4));
    System.out.println("Found value: " + m.group(5));
    // Second variant: prefix without digits, then "3000", then the rest
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3) + m.group(4));
    System.out.println("Found value: " + m.group(5));
}

Output:

Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
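
The fixed-width idea above, (.*)(\\d{2})(.*), can be generalized by trying every digit count in turn. The following is only a sketch under some assumptions: the class DigitSplits and the method splits are hypothetical names, and the input is assumed to contain a single run of digits. It does not enumerate every theoretically possible match of (.*)(\\d+)(.*); it lists only the splits in which the digit group ends where the digit run ends, which covers the outputs asked for in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class DigitSplits {
    // For n = 1, 2, ... applies "(.*)(\\d{n})(.*)" and collects the three groups.
    // Stops at the first n for which no run of n digits exists in the input.
    static List<String[]> splits(String input) {
        List<String[]> out = new ArrayList<>();
        for (int n = 1; n <= input.length(); n++) {
            Matcher m = Pattern.compile("(.*)(\\d{" + n + "})(.*)").matcher(input);
            if (!m.find()) {
                break; // no n digits in a row, so no longer run can exist either
            }
            out.add(new String[] { m.group(1), m.group(2), m.group(3) });
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        for (String[] groups : splits(line)) {
            for (String value : groups) {
                System.out.println("Found value: " + value);
            }
            System.out.println();
        }
    }
}
```

Because the leading (.*) stays greedy, each pass captures the last n digits of the run, so the passes yield 0, 00, 000 and 3000 for the sample line.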

Answer 1 (score: 3)

(.*?)(\\d+)(.*)

Make the greedy quantifier * non-greedy by writing it as *?.

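Because your first group (.*) is greedy, it captures everything it can and leaves only a single 0 for \d+ to capture. If you make it non-greedy, it gives you the expected result. See the demo.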

https://regex101.com/r/tX2bH4/53
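
For a quick local check without the online demo, the two quantifiers can be compared side by side. This is a small sketch; GreedyVsReluctant and digits are illustrative names that do not appear in the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class GreedyVsReluctant {
    // Returns what capture group 2 holds for the first match, or null if there is no match.
    static String digits(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // Greedy first group: only the last digit is left for \d+
        System.out.println("greedy:    " + digits("(.*)(\\d+)(.*)", line));
        // Reluctant first group: \d+ gets the whole digit run
        System.out.println("reluctant: " + digits("(.*?)(\\d+)(.*)", line));
    }
}
```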