Question

Answer 1

A group is what we usually associate with a parenthesized group in a regular expression:

"(a[zx](b?))"

Applied to "axb" returns an array of 3 groups:

group 0: axb, the entire match.
group 1: axb, the first group matched.
group 2: b, the second group matched.
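Here is a minimal C# sketch of the same match. It assumes .NET 5 or later so that top-level statements compile, and it simply prints Groups.Count and each group's Value:

```csharp
using System;
using System.Text.RegularExpressions;

// Group 0 is always the entire match; groups 1..n follow the order
// of the opening parentheses in the pattern.
Match m = Regex.Match("axb", @"(a[zx](b?))");

Console.WriteLine(m.Groups.Count); // 3
for (int i = 0; i < m.Groups.Count; i++)
{
    Console.WriteLine($"group {i}: {m.Groups[i].Value}");
}
```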

Except that these are only the "capturing" groups. Non-capturing groups (written with the '(?:' syntax) are not represented here.

"(a[zx](?:b?))"

Applied to "axb" returns an array of 2 groups:

group 0: axb, the entire match.
group 1: axb, the first group matched.
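To see the difference side by side, here is a small sketch (same assumption: .NET 5+ with top-level statements) that runs both patterns and compares Groups.Count:

```csharp
using System;
using System.Text.RegularExpressions;

// The only difference between the two patterns is "(b?)" vs "(?:b?)".
Match capturing = Regex.Match("axb", @"(a[zx](b?))");
Match nonCapturing = Regex.Match("axb", @"(a[zx](?:b?))");

Console.WriteLine(capturing.Groups.Count);    // 3
Console.WriteLine(nonCapturing.Groups.Count); // 2

// The non-capturing group still takes part in the match ("b" is consumed);
// it just does not get its own entry in Groups.
Console.WriteLine(nonCapturing.Groups[1].Value); // axb
```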

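Captures are also what we associate with "captured groups". But when a quantifier makes a group match more than once, only the last match is kept as the group's match. The Captures array stores all of these matches.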

"(a[zx]\s+)+"

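Applied to "ax az ax" returns an array of 2 captures for group 1 (the second group, counting group 0). The final "ax" is not captured because there is no trailing whitespace for \s+ to match.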

group 1, capture 0 "ax "
group 1, capture 1 "az "
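A sketch of the quantified case, again assuming .NET 5+ top-level statements. It contrasts Group.Value (last iteration only) with Group.Captures (every iteration):

```csharp
using System;
using System.Text.RegularExpressions;

Match m = Regex.Match("ax az ax", @"(a[zx]\s+)+");
Group g = m.Groups[1];

// The group's own Value is only the last iteration of the quantified group.
Console.WriteLine($"group 1 value: \"{g.Value}\""); // "az "

// Captures keeps every iteration, in the order they matched.
for (int i = 0; i < g.Captures.Count; i++)
{
    Console.WriteLine($"group 1, capture {i}: \"{g.Captures[i].Value}\"");
}
```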

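As for your last question: before looking into it, I would have thought Captures would be an array of captures ordered by the group they belong to. Instead, it is just an alias for groups[0].Captures. Pretty useless.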
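The alias behavior is easy to check, since Match derives from Group. The loop over every group at the end is just one way to list all captures; it is an illustration, not something the API provides directly (assumes .NET 5+ top-level statements):

```csharp
using System;
using System.Text.RegularExpressions;

Match m = Regex.Match("ax az ax", @"(a[zx]\s+)+");

// m.Captures belongs to group 0 (the whole match), so it has a single entry.
Console.WriteLine(m.Captures.Count);                         // 1
Console.WriteLine(m.Captures[0].Value == m.Groups[0].Value); // True

// To see every capture of every group, walk the groups explicitly.
for (int i = 0; i < m.Groups.Count; i++)
{
    foreach (Capture c in m.Groups[i].Captures)
    {
        Console.WriteLine($"group {i}: \"{c.Value}\"");
    }
}
```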

Answer 2

This can be explained with a simple example.

Matching 3:10pm against the regular expression ((\d)+):((\d)+)(am|pm), using Mono interactive csharp:

csharp> Regex.Match("3:10pm", @"((\d)+):((\d)+)(am|pm)").
      > Groups.Cast<Group>().
      > Zip(Enumerable.Range(0, int.MaxValue), (g, n) => "[" + n + "] " + g);
{ "[0] 3:10pm", "[1] 3", "[2] 3", "[3] 10", "[4] 0", "[5] pm" }

So where's the 1?

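Since the fourth group matches more than one digit, we only "get" the last match if we reference the group directly (via its implicit ToString(), that is). To expose the intermediate matches, we need to go deeper and reference the Captures property of the group in question: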

csharp> Regex.Match("3:10pm", @"((\d)+):((\d)+)(am|pm)").
      > Groups.Cast<Group>().
      > Skip(4).First().Captures.Cast<Capture>().
      > Zip(Enumerable.Range(0, int.MaxValue), (c, n) => "["+n+"] " + c);
{ "[0] 1", "[1] 0" }
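For anyone not using the csharp REPL, roughly the same thing as a compiled program. This is a sketch assuming .NET 5+ top-level statements; minutesDigit is only an illustrative variable name:

```csharp
using System;
using System.Text.RegularExpressions;

Match m = Regex.Match("3:10pm", @"((\d)+):((\d)+)(am|pm)");

// Group 4 is the inner (\d) of the minutes part; it ran twice ("1", then "0").
Group minutesDigit = m.Groups[4];
Console.WriteLine(minutesDigit.Value); // 0 (last iteration only)

for (int i = 0; i < minutesDigit.Captures.Count; i++)
{
    Console.WriteLine($"[{i}] {minutesDigit.Captures[i].Value}");
}
```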

Courtesy of this article.

Answer 3

Imagine you have the text input dogcatcatcat and a pattern like dog(cat(catcat)).

In this case you have 3 groups. The first one (the main group) corresponds to the match.

Match == dogcatcatcat and Group0 == dogcatcatcat

Group1 == catcatcat

Group2 == catcat
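A quick way to confirm those values, as a sketch assuming .NET 5+ top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

Match m = Regex.Match("dogcatcatcat", @"dog(cat(catcat))");

Console.WriteLine(m.Value);           // dogcatcatcat
Console.WriteLine(m.Groups[0].Value); // dogcatcatcat (same as the match)
Console.WriteLine(m.Groups[1].Value); // catcatcat
Console.WriteLine(m.Groups[2].Value); // catcat
Console.WriteLine(m.Groups.Count);    // 3
```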

So what is this all about?

Let's consider a small example written in C# (.NET) using the Regex class.

using System;
using System.Text.RegularExpressions;

int matchIndex = 0;
int groupIndex = 0;
int captureIndex = 0;

foreach (Match match in Regex.Matches(
        "dogcatabcdefghidogcatkjlmnopqr", // input
        @"(dog(cat(...)(...)(...)))") // pattern
)
{
    Console.Out.WriteLine($"match{matchIndex++} = {match}");

    foreach (Group @group in match.Groups)
    {
        Console.Out.WriteLine($"\tgroup{groupIndex++} = {@group}");

        foreach (Capture capture in @group.Captures)
        {
            Console.Out.WriteLine($"\t\tcapture{captureIndex++} = {capture}");
        }

        // Reset the capture counter for the next group.
        captureIndex = 0;
    }

    // Reset the group counter for the next match.
    groupIndex = 0;
    Console.Out.WriteLine();
}

Output:

match0 = dogcatabcdefghi
    group0 = dogcatabcdefghi
        capture0 = dogcatabcdefghi
    group1 = dogcatabcdefghi
        capture0 = dogcatabcdefghi
    group2 = catabcdefghi
        capture0 = catabcdefghi
    group3 = abc
        capture0 = abc
    group4 = def
        capture0 = def
    group5 = ghi
        capture0 = ghi

match1 = dogcatkjlmnopqr
    group0 = dogcatkjlmnopqr
        capture0 = dogcatkjlmnopqr
    group1 = dogcatkjlmnopqr
        capture0 = dogcatkjlmnopqr
    group2 = catkjlmnopqr
        capture0 = catkjlmnopqr
    group3 = kjl
        capture0 = kjl
    group4 = mno
        capture0 = mno
    group5 = pqr
        capture0 = pqr

Let's analyze the first match (match0).

As you can see, there are three inner groups: group3, group4 and group5.

    group3 = abc
        capture0 = abc
    group4 = def
        capture0 = def
    group5 = ghi
        capture0 = ghi

These groups (3-5) were created because of the subpattern (...)(...)(...) of the main pattern (dog(cat(...)(...)(...))).

The value of group3 corresponds to its only capture (capture0), and the same holds for group4 and group5. That is because there is no group repetition such as (...){3}.
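That one-capture-per-group property can be checked directly with LINQ. This sketch assumes .NET 5+ and only looks at the first match (dogcatabcdefghi) from the input above:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

Match m = Regex.Match("dogcatabcdefghi", @"(dog(cat(...)(...)(...)))");

// With no quantifier on any group, every group has exactly one capture,
// and that capture's value equals the group's value.
bool oneCaptureEach = m.Groups.Cast<Group>()
    .All(g => g.Captures.Count == 1 && g.Captures[0].Value == g.Value);

Console.WriteLine(m.Groups.Count); // 6
Console.WriteLine(oneCaptureEach); // True
```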

OK, let's consider another example where there is group repetition.

If we change the regular expression pattern (in the code shown above) from (dog(cat(...)(...)(...))) to (dog(cat(...){3})), you'll notice the following group repetition: (...){3}.

Now the output changes:

match0 = dogcatabcdefghi
    group0 = dogcatabcdefghi
        capture0 = dogcatabcdefghi
    group1 = dogcatabcdefghi
        capture0 = dogcatabcdefghi
    group2 = catabcdefghi
        capture0 = catabcdefghi
    group3 = ghi
        capture0 = abc
        capture1 = def
        capture2 = ghi

match1 = dogcatkjlmnopqr
    group0 = dogcatkjlmnopqr
        capture0 = dogcatkjlmnopqr
    group1 = dogcatkjlmnopqr
        capture0 = dogcatkjlmnopqr
    group2 = catkjlmnopqr
        capture0 = catkjlmnopqr
    group3 = pqr
        capture0 = kjl
        capture1 = mno
        capture2 = pqr

Again, let's analyze the first match (match0).

There are no longer separate inner groups group4 and group5. Because of the (...){3} repetition ({n} where n >= 2), they have been merged into a single group, group3.

In this case, the value of group3 corresponds to capture2 (in other words, the last capture).

So if you need all 3 inner captures (capture0, capture1, capture2), you have to loop through the group's Captures collection.
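A sketch of that loop, collecting the inner chunks of every match into a list (assumes .NET 5+ top-level statements; the list name inner is arbitrary):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var inner = new List<string>();

foreach (Match match in Regex.Matches("dogcatabcdefghidogcatkjlmnopqr", @"(dog(cat(...){3}))"))
{
    Group g3 = match.Groups[3];

    // g3.Value is only the last chunk; Captures holds all three.
    foreach (Capture c in g3.Captures)
    {
        inner.Add(c.Value);
    }
}

Console.WriteLine(string.Join(",", inner)); // abc,def,ghi,kjl,mno,pqr
```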

Conclusion: be careful how you design your pattern's groups. Think up front about the behavior a given group specification produces, e.g. (...)(...), (...){2} or (.{3}){2}.
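To make the difference concrete, here is a sketch that runs those three group specifications against a six-character input. The helper Describe and the input "abcdef" are hypothetical, chosen only for illustration (assumes .NET 5+ top-level statements and C# 7.3+ tuple equality):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns (number of groups including group 0,
// value of group 1, number of captures of group 1).
(int Groups, string Group1, int Group1Captures) Describe(string pattern)
{
    Match m = Regex.Match("abcdef", pattern);
    return (m.Groups.Count, m.Groups[1].Value, m.Groups[1].Captures.Count);
}

var twoGroups = Describe(@"(...)(...)");
var repeated = Describe(@"(...){2}");
var repeatedAlt = Describe(@"(.{3}){2}");

Console.WriteLine(twoGroups);   // (3, abc, 1)
Console.WriteLine(repeated);    // (2, def, 2)
Console.WriteLine(repeatedAlt); // (2, def, 2)
```

(...){2} and (.{3}){2} behave the same way: one group whose Value is the last chunk and whose Captures holds both chunks, while (...)(...) yields two separate groups with one capture each.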

I hope this also helps clarify the differences between Captures, Groups and Matches.
