Question

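The quantifier x? denotes a single occurrence of x, or none at all.

For convenience, I am posting below the test harness I use to match a regular expression against a string.

I am confused by the result of matching the regex a? against the string ababaaaab.

The output of the program is: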

Enter your regex: a?

Enter your input string to search: ababaaaab

I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1. 
I found the text "a" starting at index 2 and ending at index 3.
I found the text "" starting at index 3 and ending at index 3.
I found the text "a" starting at index 4 and ending at index 5.
I found the text "a" starting at index 5 and ending at index 6.
I found the text "a" starting at index 6 and ending at index 7.
I found the text "a" starting at index 7 and ending at index 8.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.

Enter your regex:

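I am confused about the b.

"The regular expression a? is not specifically looking for the letter 'b'; it is merely looking for the presence (or lack thereof) of the letter 'a'. If the quantifier allows for a match of 'a' zero times, anything in the input string that is not an 'a' will show up as a zero-length match."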

Reference

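Question:

The first line of that quote is understandable. I do understand that the presence of b, or of anything that is not a, is the absence of a, i.e. zero occurrences of a, and should therefore result in a match. But the absence of a (i.e. the occurrence of b) lies between index 1 and index 2. So why is the text "" matched between index 1 and index 1 (in other words, why do we get a zero-length match here)? By my reasoning, the match should span index 1 to index 2.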

import java.io.InputStreamReader;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/*
 *  Enter your regex: foo
 *  Enter input string to search: foo
 *  I found the text foo starting at index 0 and ending at index 3.
 * */

public class RegexTestHarness {

    public static void main(String[] args){

        /*Console console = System.console();
        if (console == null) {
            System.err.println("No console.");
            System.exit(1);
        }*/

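        // Create the Scanner once; a new Scanner per iteration can lose buffered input.
        Scanner scanner = new Scanner(new InputStreamReader(System.in));

        while (true) {

            /*Pattern pattern = 
            Pattern.compile(console.readLine("%nEnter your regex: ", null));*/

            System.out.print("\nEnter your regex: ");

            // Stop cleanly at end of input instead of throwing NoSuchElementException.
            if (!scanner.hasNext()) {
                break;
            }
            Pattern pattern = Pattern.compile(scanner.next());

            System.out.print("\nEnter your input string to search: ");

            if (!scanner.hasNext()) {
                break;
            }
            Matcher matcher = 
            pattern.matcher(scanner.next());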
            boolean found = false;
            while (matcher.find()) {
                /*console.format("I found the text" +
                    " \"%s\" starting at " +
                    "index %d and ending at index %d.%n",
                    matcher.group(),
                    matcher.start(),
                    matcher.end());*/

                System.out.println("I found the text \"" + matcher.group() + "\" starting at index " + matcher.start() + " and ending at index " + matcher.end() + "."); 

                found = true;
            }
            if(!found){
                //console.format("No match found.%n", null);
                System.out.println("No match found."); 
            }
        }
    }
}

Answer 1

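But the absence of a (i.e. the occurrence of b) lies between index 1 and index 2. So why is the text "" matched between index 1 and index 1 (in other words, why do we get a zero-length match here)?

The length of a match is the length of the part of the input string that the pattern matched.

Since there is no "a" at that position, only an empty string is matched.

Again, the pattern does not match a "sequence of non-a characters"; it matches a (possibly empty) sequence of "a"s with a total length of at most 1. In this case the matched sequence is empty.

But the absence of a (i.e. the occurrence of b)

The absence of a is not the occurrence of b. The absence of a occurs before the b, and ends where the b begins.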
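To make the point about match length concrete, here is a small sketch that prints each match as a start-end span together with the matched text. The class name ZeroLengthMatchDemo and the helper spans are made up for this illustration; only the java.util.regex calls come from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroLengthMatchDemo {

    // Collects every match of regex in input as "start-end:text".
    static List<String> spans(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // The matched text is exactly input.substring(start, end),
            // so an empty match necessarily has start == end.
            String text = input.substring(m.start(), m.end());
            result.add(m.start() + "-" + m.end() + ":" + text);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [0-1:a, 1-1:, 2-2:]
        System.out.println(spans("a?", "ab"));
    }
}
```

The span 1-1 sits just before the b: the b itself is never consumed, so it cannot be part of the match, and the reported end index cannot move past it.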

Answer 2

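The reported position is not the position of a character

The key thing to understand is that the regex engine is not giving you the position of a matched character.

It gives you the position at which a successful match begins. That position is not a character. It is a gap between characters. For example:

Position 0 is the start of the string. This is where the \A or ^ assertion matches.
Position 1 is the position between the first and second characters.
Position 9 is the position after the final b at the end of ababaaaab. This is where the \Z or $ assertion matches.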
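The positions listed above can be checked with a short sketch. The class name MatchPositionDemo and both helper methods are hypothetical names chosen for this example; the empty pattern "" is used only because it matches once at every position.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchPositionDemo {

    // Start index of the first match of regex in input, or -1 if there is none.
    static int firstMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    // Number of matches that find() reports for regex in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "ababaaaab"; // 9 characters, so positions 0 through 9

        System.out.println(firstMatchStart("^", input));   // prints 0
        System.out.println(firstMatchStart("\\A", input)); // prints 0
        System.out.println(firstMatchStart("$", input));   // prints 9
        System.out.println(firstMatchStart("\\Z", input)); // prints 9

        // The empty pattern matches once at each position: length + 1 matches.
        System.out.println(countMatches("", input));       // prints 10
    }
}
```

A string of 9 characters has 10 positions, which is why the output in the question ends with a match at index 9 even though there is no character at that index.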

Answer 3

a? is greedy. In other words, the regex engine processes it roughly as follows:

foreach index
    if next char is "a"
        return "a"
    else if next char is ""
        return ""
    end if
end foreach

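If you apply this algorithm to your input string, you get the same output as the one you posted.

You can try its non-greedy (or lazy) equivalent: a??. The regex engine then processes it as follows: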
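The greedy pseudocode above can be written out as a hand-rolled scan and compared with what java.util.regex reports. This is only a sketch for the single pattern a?, not a description of how the regex engine is implemented; GreedyScanSketch, greedyScan and regexScan are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyScanSketch {

    // Hand-written equivalent of scanning with the greedy pattern a?.
    static List<String> greedyScan(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        // pos == input.length() is a valid position too: the end of the string.
        while (pos <= input.length()) {
            if (pos < input.length() && input.charAt(pos) == 'a') {
                // Greedy: take the "a" when one is available.
                out.add(pos + "-" + (pos + 1) + ":a");
            } else {
                // Otherwise match zero occurrences: an empty match at pos.
                out.add(pos + "-" + pos + ":");
            }
            // Either way the next search starts one position further on.
            pos++;
        }
        return out;
    }

    // The same report produced by the real regex engine, for comparison.
    static List<String> regexScan(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "ababaaaab";
        System.out.println(greedyScan(input));
        System.out.println(greedyScan(input).equals(regexScan("a?", input))); // prints true
    }
}
```

Note that after an empty match the scan still has to step forward by one position; otherwise it would report the same empty match forever.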

foreach index
    if next char is ""
        return ""
    else if next char is "a"
        return "a"
    end if
end foreach

So an empty string is found at every index, and a is never output at all.
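A quick way to see the difference between the two quantifiers is to collect the matched text for each. This is a minimal sketch; the class name LazyQuantifierDemo and the helper matchedTexts are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyQuantifierDemo {

    // Returns the text of every match of regex in input, in order.
    static List<String> matchedTexts(String regex, String input) {
        List<String> texts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            texts.add(m.group());
        }
        return texts;
    }

    public static void main(String[] args) {
        String input = "ababaaaab";

        // Greedy: prefers one "a", falls back to the empty string.
        // prints [a, , a, , a, a, a, a, , ]
        System.out.println(matchedTexts("a?", input));

        // Lazy: prefers the empty string, so "a" is never consumed.
        // prints [, , , , , , , , , ]
        System.out.println(matchedTexts("a??", input));
    }
}
```

Both runs report 10 matches, one per position from 0 to 9; only the preference between "a" and the empty string changes.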

x? quantifier: why does a non-x give a "zero-length" match?
