I was looking at some code from TutorialsPoint, and something about it has been bothering me ever since. Take a look at this code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main( String[] args ){
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";
        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create matcher object.
        Matcher m = r.matcher(line);
        while(m.find( )) {
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
            System.out.println("Found value: " + m.group(3));
        }
    }
}
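This code successfully prints:
Found value: This order was placed for QT300
Found value: 0
Found value: ! OK?
But given the regex "(.*)(\\d+)(.*)", why doesn't it return the other possible results, such as:
Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
or
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
If this code is not suited to doing that, how can I write code that finds all the possible matches?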
Answer 0 (score: 5)
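This happens because * is greedy.
String:
This order was placed for QT3000! OK?
Regex:
(.*)(\\d+)(.*)
.* is greedy and matches as many characters as it can. The first .* therefore matches everything up to and including the last character (the ?) and then backtracks so that the rest of the pattern can match. The next part of the regex is \d+, so the engine backtracks until it reaches a digit. As soon as it finds one, \d+ matches that digit, because its condition is already satisfied (\d+ matches one or more digits). At that point the first (.*) has captured This order was placed for QT300, and the following (\\d+) has captured the digit 0 that sits just before the ! symbol.
The last (.*) then captures all the remaining characters, !<space>OK?. m.group(1) refers to the characters captured by group index 1, m.group(2) to those captured by index 2, and so on.
See the demo for the backtracking steps.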
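The backtracking described above can be imitated by hand. The following is only a rough sketch, not how java.util.regex is implemented internally: the class name BacktrackSketch and the helper greedyPrefixLength are made up for illustration. It starts with the whole line given to the first (.*) and hands back one character at a time until the remainder can match \d+.* at that position.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BacktrackSketch {
    // Hypothetical helper: returns how many characters a leading greedy (.*)
    // would keep, by giving back one character at a time until the rest of
    // the pattern matches at that position. Returns -1 if it never matches.
    static int greedyPrefixLength(String line, Pattern rest) {
        for (int end = line.length(); end >= 0; end--) {
            Matcher m = rest.matcher(line);
            m.region(end, line.length());   // only look at the text after the prefix
            if (m.lookingAt()) {            // must match right at the region start
                return end;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        Pattern rest = Pattern.compile("\\d+.*");
        int end = greedyPrefixLength(line, rest);
        // prints: (.*) keeps: This order was placed for QT300
        System.out.println("(.*) keeps: " + line.substring(0, end));
    }
}
```

The loop stops at the first position, counted from the right, where a digit is available, which is why only the final 0 is left for \d+.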
To get your desired output:
String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d{2})(.*)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
while(m.find( )) {
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3));
}
Output:
Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
With (.*)(\\d{2}), the engine backtracks until exactly two digits are available for \d{2} to match.
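To see how the required digit count moves the split point, here is a small sketch that builds the same pattern with \d{n} for different n. The class FixedDigitCount and its split method are names invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedDigitCount {
    // Hypothetical helper: matches (.*)(\d{n})(.*) and returns the three
    // groups, or null when the line has no run of n consecutive digits.
    static String[] split(String line, int digits) {
        Matcher m = Pattern.compile("(.*)(\\d{" + digits + "})(.*)").matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        for (int n = 1; n <= 4; n++) {
            String[] g = split(line, n);
            // Group 2 grows from "0" to "3000" as n goes from 1 to 4,
            // and group 1 shrinks by one character each time.
            System.out.println(n + ": [" + g[0] + "] [" + g[1] + "] [" + g[2] + "]");
        }
    }
}
```

The greedy (.*) still takes as much as it can. It only gives back enough characters for the fixed-width \d{n} to succeed.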
Change your pattern to this:
String pattern = "(.*?)(\\d+)(.*)";
to get output like this:
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
The ? placed after * forces * to match non-greedily.
Use extra capturing groups to get both outputs from a single program.
String line = "This order was placed for QT3000! OK?";
String pattern = "((.*?)(\\d{2}))(?:(\\d{2})(.*))";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while(m.find( )) {
    // First split: prefix plus the first two digits, then the last two digits
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(4));
    System.out.println("Found value: " + m.group(5));
    // Second split: prefix only, then all four digits joined together
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3) + m.group(4));
    System.out.println("Found value: " + m.group(5));
}
Output:
Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
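As for finding every possible match: a single find() call only ever reports one way of dividing the input, so one option is to enumerate the splits directly. This is a minimal sketch under a few assumptions: the class AllSplits and its method are hypothetical names, the line contains no line terminators (which . would not match by default), and \d is taken to mean the ASCII digits 0-9, which is Java's default.

```java
import java.util.ArrayList;
import java.util.List;

public class AllSplits {
    // Hypothetical helper: lists every way (.*)(\d+)(.*) could divide the
    // line, by trying each non-empty run of digits line[i..j) as group 2.
    static List<String[]> allSplits(String line) {
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            for (int j = i + 1; j <= line.length(); j++) {
                char c = line.charAt(j - 1);
                if (c < '0' || c > '9') {
                    break;  // group 2 may contain digits only
                }
                result.add(new String[] {
                    line.substring(0, i), line.substring(i, j), line.substring(j)
                });
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // For this input there are 10 splits (4 + 3 + 2 + 1 digit runs).
        for (String[] g : allSplits(line)) {
            System.out.println("[" + g[0] + "] [" + g[1] + "] [" + g[2] + "]");
        }
    }
}
```

Note that this also lists splits such as [This order was placed for QT] [3] [000! OK?], where the trailing (.*) absorbs some of the digits. Those are valid matches of the pattern too.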
Answer 1 (score: 3)
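(.*?)(\\d+)(.*)
Make the greedy quantifier * non-greedy by writing it as *?.
Because your first group (.*) is greedy, it captures everything it can and leaves only a single digit, 0, for \d+ to capture. If you make it non-greedy, it gives you the expected result. See the demo.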