String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d+)(.*)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find()) {
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3));
}
Output:
Found value: This order was placed for QT300
Found value: 0
Found value: ! OK?
whereas I expected the output to be:
Found value: This order was placed for QT3000! OK?
Found value: 3000
Found value: This order was placed for QT3000! OK?
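The reason I expected that output is:
If the pattern is "(.*)", the output of m.group(1) is "This order was placed for QT3000! OK?"
If the pattern is "(\\d+)", the output of m.group(1) is "3000"
What I don't understand is why I don't get the expected output when the pattern is "(.*)(\\d+)(.*)".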
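Answer 0 (score: 2)
The .* matches (and consumes) as many characters as possible before \\d+ gets a chance to match. By the time the engine gets to \\d+, a single digit is enough to satisfy it.
So you need to make the .* lazy: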
(.*?)(\\d+)(.*)
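If you want to go into the details: .* first matches the entire string, then backtracks one character at a time so that the (\\d+)(.*) that follows can also match. Once it has backtracked to just before the last digit, so that it holds:
This order was placed for QT300
the rest of the regex, (\\d+)(.*), is satisfied, so the match ends.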
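To see the difference between the two patterns side by side, here is a small sketch that runs both against the string from the question. The class name GreedyVsLazy and the helper groups are made up for this illustration; everything else is plain java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Hypothetical helper: returns the three captured groups, or null if there is no match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // First the greedy pattern from the question, then the lazy variant.
        for (String regex : new String[] { "(.*)(\\d+)(.*)", "(.*?)(\\d+)(.*)" }) {
            System.out.println("Pattern: " + regex);
            for (String g : groups(regex, line)) {
                System.out.println("  Found value: " + g);
            }
        }
    }
}
```

With the lazy pattern, group 2 is 3000. Note that groups 1 and 3 then hold only the text before and after the number ("This order was placed for QT" and "! OK?"), not the whole line, because the three groups never overlap.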
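Answer 1 (score: 1)
This happens because the first (.*) is greedy and eats as much as it can while still allowing (\d+)(.*) to match the rest of the string.
Basically, the match goes like this. At the start, the first .* gobbles up the entire string: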
This order was placed for QT3000! OK?
                                     ^
However, since no match for \d+ can be found here, we backtrack:
This order was placed for QT3000! OK?
                                    ^
This order was placed for QT3000! OK?
                                   ^
...
This order was placed for QT3000! OK?
                               ^
At this position \d+ can match, so we move on:
This order was placed for QT3000! OK?
                                ^
and the final .* matches the rest of the string.
That explains the output you are seeing.
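The backtracking walk can be imitated by hand. The sketch below is a simplified model, not how java.util.regex is implemented internally: it hands the first .* the longest possible prefix, then gives back one character at a time until (\d+)(.*) matches at the current position. The class name GreedyBacktrackDemo and the method greedySplit are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyBacktrackDemo {
    // Simplified model of the greedy search: start with the whole input as the
    // prefix, then shrink it until the rest of the regex matches right there.
    static int greedySplit(String input) {
        Matcher rest = Pattern.compile("(\\d+)(.*)").matcher(input);
        for (int pos = input.length(); pos >= 0; pos--) {
            rest.region(pos, input.length());
            if (rest.lookingAt()) { // lookingAt() must match at the region start
                return pos;
            }
        }
        return -1; // no position works, so the whole regex cannot match
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        int pos = greedySplit(line);
        System.out.println("Split position: " + pos);
        System.out.println("First group:    " + line.substring(0, pos));
    }
}
```

The first position that works is 31, just before the last 0, which is why the first group ends in QT300 and \d+ is left with a single digit.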
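You can fix this by making the first (.*) lazy:
(.*?)(\d+)(.*)
The search for a match for (.*?) starts with the empty string, and each time it backtracks it increases the number of characters it gobbles up: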
This order was placed for QT3000! OK?
^
This order was placed for QT3000! OK?
 ^
...
This order was placed for QT3000! OK?
                            ^
At this point \d+ can match, and so can .*, which completes the match attempt, and (\d+) captures 3000 as you wanted.
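If you want to check where the engine actually ends up, Matcher.start(2) reports the index at which the (\d+) group begins. The following sketch prints a caret under that index for both patterns; SplitPointDemo, digitsStart and caretAt are names chosen for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitPointDemo {
    // Index where group 2, the (\d+) part, starts; -1 if the regex does not match.
    static int digitsStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start(2) : -1;
    }

    // Builds a marker line with a caret under the given index.
    static String caretAt(int index) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < index; i++) {
            sb.append(' ');
        }
        return sb.append('^').toString();
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        for (String regex : new String[] { "(.*)(\\d+)(.*)", "(.*?)(\\d+)(.*)" }) {
            System.out.println(regex);
            System.out.println(line);
            System.out.println(caretAt(digitsStart(regex, line)));
        }
    }
}
```

For the greedy pattern the caret lands under the last 0 (index 31); for the lazy one it lands under the 3 (index 28).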