Question

I found the following code example on this blog post:

final String FIBONACCI = 
   "(?x) .? | ( \\2?+ (\\1|^.) )* ..";

for (int n = 0; n < 10000; n++) {
   String s = new String(new char[n]);
   if (s.matches(FIBONACCI)) {
      System.out.printf("%s ", n);
   }
}

Output: 0 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 ...

How does (?x) .? | ( \\2?+ (\\1|^.) )* .. match Fibonacci numbers?

Answer 1

(?x) .? | ( \\2?+ (\\1|^.) )* ..

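Several things here may cause confusion. I will go through each of them to explain why the algorithm works.

The match is performed on a string whose length is the number being tested, not on the digits of the number. The only real data in the string is its length.
\\: The double backslashes are only there because a backslash inside a Java string literal must itself be escaped with a backslash. I will not show them in the rest of this answer.
(?x): This enables extended (comments) mode. In this mode, whitespace that is not backslash-escaped or inside a character class is ignored, which allows the regular expression to be split into more readable parts with embedded comments. [sarand.com]
.?: This matches a string of 0 or 1 characters. It only covers the F(0), F(1) and F(2) cases; for any longer string this alternative fails.
|: If the attempt to match 0 or 1 characters fails, try to match everything on the right-hand side instead.
(: This opens the first group (referenced later as \1).
\2?+: The + makes the ? a possessive quantifier. The ? means: use the \2 backreference if it is defined. The + means: do not go back and try without it if the rest of the regex fails to match with it.
(\1|^.): This matches either what group 1 matched in the previous iteration or, at the very start of the string, a single character. So the first value of \1 is a single character. Since this is the second group, whatever it matches also becomes the new \2.
)*: This repeats the algorithm. At the start of each repetition, \2 and \1 hold the two previous terms, F(n-2) and F(n-1), and the repetition itself consumes their sum, F(n), which becomes the new \1. Each repetition is appended to the previous ones, so the characters consumed by the loop add up to a sum of consecutive Fibonacci numbers. Try running the algorithm in your head for a few small numbers to get the idea.
..: One dot is needed to stand for the F(1) that is not included in the sum. The second is needed because the Second Identity of Fibonacci Numbers states that the sum of the first n Fibonacci numbers is a Fibonacci number minus 1, that is, F(1) + F(2) + ... + F(n) = F(n+2) - 1. (1)
Walking through the substitutions, you can see how this keeps adding Fibonacci numbers until the string is filled. The first iteration matches ^., so 1 character. The second iteration matches the inner part of the previous match via \2 and the entire previous match via \1, which makes 2 characters. The third iteration takes the inner part of the second iteration (1) plus the entire second iteration (2), which makes 3 characters. Add the iterations together and you have a sum of Fibonacci numbers.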
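
The walk-through above can be replayed with plain integer arithmetic instead of a regex engine. The sketch below is only an illustration: the class name FibRegexTrace and the method matchedLengths are made up for this example, and the code tracks the lengths of \1 and \2 by hand rather than inspecting what java.util.regex actually does.

```java
import java.util.ArrayList;
import java.util.List;

class FibRegexTrace {
    // Returns the total string lengths that the right-hand alternative
    // accepts after 0, 1, ..., `iterations` repetitions of group 1.
    static List<Integer> matchedLengths(int iterations) {
        List<Integer> lengths = new ArrayList<>();
        int group1 = 0;   // length captured by group 1 in the previous iteration
        int group2 = 0;   // length captured by group 2 in the previous iteration
        int consumed = 0; // characters consumed by all iterations so far
        lengths.add(consumed + 2); // zero iterations: only ".." matches
        for (int i = 1; i <= iterations; i++) {
            int backref2 = group2;             // \2?+ (empty in the first iteration)
            int inner = (i == 1) ? 1 : group1; // ^. in the first iteration, \1 afterwards
            group2 = inner;                    // the inner group becomes the new \2
            group1 = backref2 + inner;         // the whole iteration becomes the new \1
            consumed += group1;
            lengths.add(consumed + 2);         // plus the two trailing dots
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(matchedLengths(6)); // [2, 3, 5, 8, 13, 21, 34]
    }
}
```

The lengths 0 and 1 do not appear here because they are handled by the .? alternative.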

See Why does Java regex engine throw StringIndexOutOfBoundsException on a + repetition? for more details on why this recurrence actually works.
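
To see what the possessive quantifier in \2?+ does, it may help to look at it in isolation. The patterns a?a and a?+a below are not from the original post; they are a deliberately tiny stand-in, and PossessiveDemo is a made-up class name.

```java
class PossessiveDemo {
    // Greedy: a? first takes the character, then gives it back
    // so that the trailing a can still match.
    static boolean greedy(String s) {
        return s.matches("a?a");
    }

    // Possessive: a?+ keeps whatever it took, even if the rest
    // of the pattern then fails.
    static boolean possessive(String s) {
        return s.matches("a?+a");
    }

    public static void main(String[] args) {
        System.out.println(greedy("a"));      // true
        System.out.println(possessive("a"));  // false
        System.out.println(possessive("aa")); // true
    }
}
```

In the Fibonacci regex the same rule means that once \2 is defined, each iteration has to include it.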

Answer 2

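I know this has already been explained in great detail in the other answer (including a better explanation of the regex in general), but I recently came across this regex without any explanation, so I added some comments to it myself. I thought I would share them here for others to see as well.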

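The first thing to note is that the regex uses unary for the integers. So String s = new String(new char[n]); in the Java code converts an integer n into a String of that many ('\0') characters. Which character the string contains does not matter; for unary, only the length matters. (An alternative in Java 11+ would be String s = "x".repeat(n);, which works just as well.)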
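
A quick way to convince yourself that the character is irrelevant is to build the unary input both ways and compare the verdicts. This is a small sketch, assuming Java 11+ for String.repeat; the class and method names (UnaryInput, unaryNul, unaryX, sameVerdict) are invented for the example. The one restriction is that the character must not be a line terminator, because . does not match those by default.

```java
class UnaryInput {
    static final String FIBONACCI = "(?x) .? | ( \\2?+ (\\1|^.) )* ..";

    // n copies of '\0', as in the question's code.
    static String unaryNul(int n) {
        return new String(new char[n]);
    }

    // n copies of 'x' (requires Java 11+).
    static String unaryX(int n) {
        return "x".repeat(n);
    }

    // True if both encodings give the same answer for every n below limit.
    static boolean sameVerdict(int limit) {
        for (int n = 0; n < limit; n++) {
            if (unaryNul(n).matches(FIBONACCI) != unaryX(n).matches(FIBONACCI)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameVerdict(200));
    }
}
```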

As for the regex itself:

"(?x) .? | ( \\2?+ (\\1|^.) )* .." # Since this is a Java-String, where the `\` are escaped
                                   # as `\\` and `String#matches` also implicitly adds a 
                                   # leading/trailing `^...$` to regex-match the entire
^(?x) .? | ( \2?+  (\1 |^.) )* ..$ # String, the actual regex will be this:
                                   # The `(?x)` is used to enable comments and whitespaces,
                                   # so let's ignore those for now:
^.?|(\2?+(\1|^.))*..$
    (           )*                 # First capture group repeated 0 or more times.
                                   # On each iteration it matches one Fibonacci number.
            |^.                    # In the first iteration, we simply match 1 as base case.
                                   # Afterwards, the ^ can no longer match, so the
                                   # alternative is used.
     \2?+                          # If possible, match group 2. This ends up being the
                                   # Fibonacci number before the last. The reason we need
                                   # to make this optional is that this group isn't defined
                                   # yet in the first iteration. The reason we have the `+`
                                   # is to prevent backtracking: if group 2 exists, we
                                   # *have* to include it in the match, otherwise we would
                                   # allow smaller increments.  
         (\1|  )                   # Finally, match the previous Fibonacci number and store
                                   # it in group 2 so that it becomes the second-to-last
                                   # Fibonacci number in the next iteration.

                                   # This in total ends up adding Fibonacci numbers starting
                                   # at 1 (i.e. 1,2,3,5,8,... will add up to 1,3,6,11,19,...).
                  ..               # They are all two less than the Fibonacci numbers, so
                                   # we add 2 at the end.

                                   # Now it's only missing the 0 and 1 of the Fibonacci
 .?|                               # numbers, so we'll account for those separately
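
Because the regex already turns on (?x), the comments can live inside the pattern itself. The following is a sketch of that idea, not the original author's code: the class name CommentedFibRegex and its helpers are made up, and the check simply compares the regex against Fibonacci numbers computed with an ordinary loop.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class CommentedFibRegex {
    // Same regex as in the question, laid out over several lines.
    // In (?x) mode, whitespace and everything from # to end of line is ignored.
    static final Pattern FIBONACCI = Pattern.compile(String.join("\n",
        "(?x)",
        "  .?             # lengths 0 and 1",
        "|",
        "  (              # group 1: the chunk added by this iteration",
        "    \\2?+        # the chunk before the previous one, possessive",
        "    (\\1|^.)     # group 2: the previous chunk, or one character at the start",
        "  )*",
        "  ..             # the two extra characters"));

    static boolean matchesLength(int n) {
        return FIBONACCI.matcher(new String(new char[n])).matches();
    }

    // Fibonacci numbers below limit, computed iteratively for comparison.
    static Set<Integer> fibonacciBelow(int limit) {
        Set<Integer> fibs = new HashSet<>();
        int a = 0, b = 1;
        while (a < limit) {
            fibs.add(a);
            int next = a + b;
            a = b;
            b = next;
        }
        return fibs;
    }

    // True if the regex and the loop agree for every n below limit.
    static boolean agreesBelow(int limit) {
        Set<Integer> fibs = fibonacciBelow(limit);
        for (int n = 0; n < limit; n++) {
            if (matchesLength(n) != fibs.contains(n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreesBelow(500));
    }
}
```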
