Question

What do the following two regular expressions mean?

.*? and .+?

I do understand the individual quantifiers used here, namely:

'.' -> Any character
'*' -> 0 or more times
'+' -> once or more times
'?' -> 0 or 1

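But I am really confused about how .*? and .+? are used. Can anyone give suitable examples for these cases?

Links to good resources with useful practice examples are also very welcome. Thanks in advance.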

Answer 1

To break it down:

. - Any character
* - Any number of times
? - That is consumed reluctantly

. - Any character
+ - At least once
? - That is consumed reluctantly

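A reluctant or "non-greedy" quantifier (a quantifier followed by '?') matches as little as possible to find a match. A more in-depth look at quantifiers (greedy, reluctant, and possessive) can be found here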
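The difference is easiest to see when something follows the quantifier. The sketch below is only an illustration: the class name GreedyVsReluctant, the helper findAll, and the input "<a><b>" are made up for this example and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Collects every match of regex in input, in the order find() reports them.
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .+ grabs as much as it can, then backs off to the last '>'.
        System.out.println(findAll("<.+>", input));   // [<a><b>]
        // Reluctant: .+? grows one character at a time and stops at the first '>'.
        System.out.println(findAll("<.+?>", input));  // [<a>, <b>]
    }
}
```

Both patterns describe the same set of strings; the '?' only changes which of the possible matches the engine tries first.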

Answer 2

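.*? and .+? are reluctant quantifiers.

They start at the beginning of the input string and then reluctantly eat one character at a time while looking for a match. The last thing they try is the entire input string.

Consider the code: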

        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class ReluctantDemo {
            public static void main(String[] args) {
                String lines = "some";

                // .+? must consume at least one character, so every find() yields one character.
                String REGEX = ".+?";
                Matcher matcher = Pattern.compile(REGEX).matcher(lines);
                while (matcher.find()) {
                    String result = matcher.group();
                    System.out.println("RESULT of .+? : " + result);
                    System.out.println("RESULT LENGTH : " + result.length());
                }
                System.out.println(lines);

                // .*? may consume nothing, so every find() yields an empty match.
                String REGEX1 = ".*?";
                Matcher matcher1 = Pattern.compile(REGEX1).matcher(lines);
                while (matcher1.find()) {
                    int start = matcher1.start();
                    int end = matcher1.end();
                    String result = matcher1.group();
                    System.out.println("RESULT of .*? : " + result);
                    System.out.println("RESULT LENGTH : " + result.length() + " ,  start " + start + " end :" + end);
                }
            }
        }

Output:

RESULT of .+? : s
RESULT LENGTH : 1
RESULT of .+? : o
RESULT LENGTH : 1
RESULT of .+? : m
RESULT LENGTH : 1
RESULT of .+? : e
RESULT LENGTH : 1
some
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 0 end :0
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 1 end :1
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 2 end :2
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 3 end :3
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 4 end :4

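.+? tries to find a match at each character and, being reluctant, stops as soon as it has matched one character (length 1).

.*? is allowed to match either a character or nothing, so it takes the cheapest option and matches the empty string at every position, including the position after the last character. That is why the four-character input produces five empty matches.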
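A reluctant quantifier only consumes more when the rest of the pattern forces it to. The following sketch tries to show that on the same input "some"; the class name ReluctantExpansion, the helper firstMatch, and the extra patterns are assumptions added for illustration, not part of the code above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantExpansion {
    // Returns the first match reported by find(), or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "some";
        // On its own, .+? stops after the minimum of one character.
        System.out.println(firstMatch(".+?", input));      // s
        // A following literal forces it to keep growing until 'm' can match.
        System.out.println(firstMatch(".+?m", input));     // som
        // An end anchor forces it to consume the whole input.
        System.out.println(firstMatch(".+?$", input));     // some
        // matches() requires the entire input to match, so .*? expands fully.
        System.out.println(Pattern.matches(".*?", input)); // true
    }
}
```

This is the sense in which the entire input string is the last thing a reluctant quantifier tries: it gets there only after every shorter attempt has failed.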

Answer 3

To illustrate, consider the input string xfooxxxxxxfoo.

Enter your regex: .*foo  // greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Enter your regex: .*?foo  // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.

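The first example uses the greedy quantifier .* to find "anything", zero or more times, followed by the letters "f" "o" "o". Because the quantifier is greedy, the .* portion of the expression first eats the entire input string. At this point the overall expression cannot succeed, because the last three letters ("f" "o" "o") have already been consumed. So the matcher slowly backs off one letter at a time until the rightmost occurrence of "foo" has been regurgitated, at which point the match succeeds and the search ends.

The second example, however, is reluctant, so it starts by consuming "nothing". Because "foo" does not appear at the beginning of the string, it is forced to swallow the first letter (an "x"), which triggers the first match at 0 and 4. Our test harness continues the process until the input string is exhausted. It finds another match at 4 and 13.

The third example fails to find a match because the quantifier is possessive. In this case the entire input string is consumed by .*+, leaving nothing left over to satisfy the "foo" at the end of the expression. Use a possessive quantifier for situations where you want to seize all of something without ever backing off; it will outperform the equivalent greedy quantifier in cases where the match is not immediately found.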
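The three sessions above can be reproduced without the interactive test harness. This is a minimal sketch, assuming a made-up class QuantifierKinds and helper describe that report each match as text[start,end); only the patterns and the input string come from the examples above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierKinds {
    // Describes each match as text[start,end), mirroring the harness output.
    public static List<String> describe(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group() + "[" + m.start() + "," + m.end() + ")");
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: one match covering the whole input.
        System.out.println(describe(".*foo", input));   // [xfooxxxxxxfoo[0,13)]
        // Reluctant: two shorter matches.
        System.out.println(describe(".*?foo", input));  // [xfoo[0,4), xxxxxxfoo[4,13)]
        // Possessive: .*+ never gives characters back, so "foo" can never match.
        System.out.println(describe(".*+foo", input));  // []
    }
}
```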

You can find this explanation at http://docs.oracle.com/javase/tutorial/essential/regex/quant.html

Confusion about Java regular expressions
