Question

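I'm new to regex and I'm working through the regex quantifier section. I have a question about the * quantifier. Here are the definitions of the * quantifier:

X* - matches zero or more occurrences of the letter X
.* - matches any sequence of characters

Based on the definitions above, I wrote a small program: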

public static void testQuantifier() {
    String testStr = "axbx";
    System.out.println(testStr.replaceAll("x*", "M"));
    //my expected output is MMMM but actual output is MaMMbMM
    /*
    Logic behind my expected output is:
    1. it encounters a which means 0 x is found. It should replace a with M.
    2. it encounters x which means 1 x is found. It should replace x with M.
    3. it encounters b which means 0 x is found. It should replace b with M.
    4. it encounters x which means 1 x is found. It should replace x with M.
    so output should be MMMM but why it is MaMMbMM?
    */

    System.out.println(testStr.replaceAll(".*", "M"));
    //my expected output is M but actual output is MM

    /*
    Logic behind my expected output is:
    It encounters axbx, which is any character sequence, it should 
    replace complete sequence with M.
    So output should be M but why it is MM?
    */
}

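Update:

Based on my revised understanding, I would expect the output to be MaMMbM, not MaMMbMM. So I don't understand why I get an extra M at the end.

My revised understanding of the first regex is: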

1. it encounters a which means 0 x is found. It should replace a with Ma.
2. it encounters x which means 1 x is found. It should replace x with M.
3. it encounters b which means 0 x is found. It should replace b with Mb.
4. it encounters x which means 1 x is found. It should replace x with M.
5. Lastly it encounters end of string at index 4. So it replaces 0x at end of String with M.

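(Although I find it odd to treat the end of the string as an index.)

So the first part is clear now.

It would be helpful if someone could clarify the second regex.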

Answer 1

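This is where you're going wrong:

"First it encounters a, which means 0 x is found. So it should replace a with M."

No - it means that 0 xs are found, and then an a is found. You haven't said that a should be replaced by M... you've said that any number of xs (including 0) should be replaced by M.

If you want every character to be replaced by M, you should just use .:

System.out.println(testStr.replaceAll(".", "M"));

(Personally I would have expected a result of MaMbM - I'm looking into why you get MaMMbMM instead - I suspect it's because there is a sequence of zero xs between the x and the b, but it still seems a bit odd to me.)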

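EDIT: It becomes clearer if you look at where the pattern matches. The following code shows that:

Pattern pattern = Pattern.compile("x*");
Matcher matcher = pattern.matcher("axbx");
while (matcher.find()) {
    System.out.println(matcher.start() + "-" + matcher.end());
}

Results (bear in mind that the end is exclusive), with a bit of explanation:

0-0 (index 0 = 'a', doesn't match)
1-2 (index 1 = 'x', matches)
2-2 (index 2 = 'b', doesn't match)
3-4 (index 3 = 'x', matches)
4-4 (index 4 is the end of the string)

If you replace each of those matches with "M", you end up with the output you're actually getting.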
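The same kind of loop also seems to explain the second result in the question. Here is a minimal sketch, assuming a hypothetical helper called spans (it is not part of java.util.regex) that collects the start-end pair of every match for any pattern:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSpans {
    // Hypothetical helper: returns "start-end" (end exclusive) for every match.
    static List<String> spans(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            result.add(matcher.start() + "-" + matcher.end());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(spans("x*", "axbx")); // [0-0, 1-2, 2-2, 3-4, 4-4]
        System.out.println(spans(".*", "axbx")); // [0-4, 4-4]
    }
}
```

For .* the first match consumes the whole of axbx (0-4). The next search then starts at index 4 and finds one more empty match at the end of the string (4-4). Replacing both matches gives MM rather than M.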

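I think the fundamental problem is that if you have a pattern which can match the empty string (in its entirety), you could argue that the pattern occurs an infinite number of times between any two characters of the input. I would try to avoid such patterns wherever possible - make sure that any match has to include at least one character.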
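As a rough illustration of that advice, swapping * for + forces every match to contain at least one character, so the empty matches disappear. The class and method names below are made up for this sketch:

```java
class AtLeastOneChar {
    // Replaces every match of regex in input with "M".
    static String replace(String regex, String input) {
        return input.replaceAll(regex, "M");
    }

    public static void main(String[] args) {
        String testStr = "axbx";
        System.out.println(replace("x*", testStr)); // MaMMbMM (empty matches included)
        System.out.println(replace("x+", testStr)); // aMbM
        System.out.println(replace(".*", testStr)); // MM (empty match at the end)
        System.out.println(replace(".+", testStr)); // M
    }
}
```

With .+ the output is the single M that the question originally expected from .*.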

Answer 2

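a and b aren't replaced because they aren't matched by your regex. The xes, and the empty strings before a non-matching letter or before the end of the string, are replaced.

Let's see what happens:

We're at the start of the string. The regex engine tries to match an x but fails, because there is an a here.
The regex engine backtracks, because x* also allows zero repetitions of x. We have a match and replace it with M.
The regex engine advances past the a and successfully matches x. Replace with M.
The regex engine now tries to match x at the current position (after the previous match), which is right before the b. It can't.
But it can backtrack again, matching zero xes here. Replace with M.
The regex engine advances past the b and successfully matches x. Replace with M.
The regex engine now tries to match x at the current position (after the previous match), which is at the end of the string. It can't.
But it can backtrack again, matching zero xes here. Replace with M.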
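The steps above can be reproduced with an explicit find/appendReplacement loop, which is how the Matcher documentation describes replaceAll. This is only a sketch for tracing; replaceWithTrace is a hypothetical name, and the real replaceAll does not print anything:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceTrace {
    // Rebuilds the replaced string one match at a time, printing each step.
    static String replaceWithTrace(String regex, String input, String replacement) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Copies the unmatched text before this match, then the replacement.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
            String kind = matcher.group().isEmpty()
                    ? "empty match"
                    : "matched \"" + matcher.group() + "\"";
            System.out.println("at " + matcher.start() + ": " + kind + " -> " + out);
        }
        matcher.appendTail(out); // copies whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceWithTrace("x*", "axbx", "M"));
    }
}
```

Running it prints five steps (empty match at 0, x at 1, empty match at 2, x at 3, empty match at 4) and then the final string MaMMbMM.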

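By the way, this is implementation-dependent. For example, in Python versions before 3.7 the result is

>>> import re
>>> re.sub("x*", "M", "axbx")
'MaMbM'

because there, empty matches for the pattern are replaced only when not adjacent to a previous match. Python 3.7 and later also replace an empty match that directly follows a non-empty match, so they return 'MaMMbMM', the same as Java.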
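To see what that rule would do in Java, here is a sketch that skips an empty match when it starts exactly where the previous match ended. This is an illustration of the rule as described above, not how either language implements it, and replaceSkippingAdjacentEmpty is a hypothetical name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SkipAdjacentEmpty {
    // Replaces matches, but ignores an empty match adjacent to the previous match.
    static String replaceSkippingAdjacentEmpty(String regex, String input, String replacement) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        StringBuffer out = new StringBuffer();
        int previousEnd = -1; // no previous match yet
        while (matcher.find()) {
            boolean emptyAndAdjacent =
                    matcher.start() == matcher.end() && matcher.start() == previousEnd;
            if (!emptyAndAdjacent) {
                matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
            }
            previousEnd = matcher.end();
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceSkippingAdjacentEmpty("x*", "axbx", "M")); // MaMbM
        System.out.println(replaceSkippingAdjacentEmpty(".*", "axbx", "M")); // M
    }
}
```

Under this rule the empty matches at 2-2 and 4-4 are dropped because each one touches the x match before it, which gives MaMbM for x* and a single M for .*.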

Not understanding the * quantifier in regex correctly?
