Question

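How can I efficiently determine the position of a capture group within the searched string? Getting the position of the entire match is easy, but I see no obvious way to get at capture groups beyond the first.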

Here is a simplified example. Let's assume that "a*" and "b*" are complicated regular expressions that are expensive to run.

#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main()   
{
    regex matcher("a*(needle)b*");
    smatch findings;
    string haystack("aaaaaaaaneedlebbbbbbbbbbbbbb");

    if( regex_match(haystack, findings, matcher) )
    {
        // What do I put here to get the offset of "needle" in the
        // string haystack?

        // This is the position of the entire match, which is always 0
        // with regex_match but can be nonzero with regex_search.
        cout << "smatch::position - " << findings.position() << endl;

        // Is this just a string or what? Are there member functions
        // that can be called?
        cout << "Needle - " << findings[1] << endl;
    }

    return 0;
}

In case it helps, I built this question in Coliru: http://coliru.stacked-crooked.com/a/885a6b694d32d9b5

Answer 1

I will not mark this as answered until 72 hours have passed and no better answer has appeared.

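Before asking this, I assumed smatch::position took no arguments I cared about, because when I read the cppreference page it was not obvious that the "sub" parameter is an index into the container of matches. I thought it had something to do with "sub"strings and the offset of the whole match.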

So my answer is:

cout << "Needle Position- " << findings.position(1) << endl;

Any explanation of this design, or of other problems my line of thinking may cause, would be appreciated.
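To make the index-based overload concrete, here is a minimal sketch that wraps findings.position(n) in a helper. The name group_offset and the -1 "not found" convention are assumptions made for this illustration and are not part of <regex>. The helper checks matched first because a group that did not take part in the match has iterators pointing at the end of the string, so position(n) would presumably report the string length rather than signal a failure.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not part of <regex>): returns the offset of capture
// group `group` inside `haystack`, or -1 if the pattern is not found, the
// index is out of range, or the group did not take part in the match.
std::ptrdiff_t group_offset(const std::string& haystack,
                            const std::regex& matcher,
                            std::size_t group)
{
    std::smatch findings;
    if (!std::regex_search(haystack, findings, matcher))
        return -1;
    if (group >= findings.size() || !findings[group].matched)
        return -1;

    // position(n) is the distance from the start of the searched string
    // to findings[n].first; length(n) is the length of that group.
    const std::ptrdiff_t offset = findings.position(group);

    // Sanity check: the captured text lies entirely inside the haystack.
    assert(offset + findings.length(group)
           <= static_cast<std::ptrdiff_t>(haystack.size()));
    return offset;
}
```

With the regex and haystack from the question, group_offset(haystack, matcher, 1) should give the same value as haystack.find("needle").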

Answer 2

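According to the documentation, you can access iterators pointing to the beginning and end of the captured text via match[n].first and match[n].second. To get the start and end indices, just do iterator arithmetic with haystack.begin().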

if (findings[1].matched) {
    cout << "[" << findings[1].first - haystack.begin() << "-"
                << findings[1].second - haystack.begin() << "] "
                << findings[1] << endl;
}

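Except for the main match (index 0), a capture group may or may not capture anything. When it captures nothing, first and second both point to the end of the string.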

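I also demonstrate the matched member of sub_match objects. It is not necessary in this case, but in general, if you want to print the indices of a capture group, you have to check first whether that group matched anything.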
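The case where a group captures nothing does not occur with the regex in the question, so here is a rough sketch of it using an alternation. The GroupSpan struct, the group_spans function, and the -1 placeholder for unmatched groups are all assumptions made for this example and are not part of the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical record type for this sketch.
struct GroupSpan {
    bool matched;          // did the group take part in the match?
    std::ptrdiff_t begin;  // offset of the first captured character, or -1
    std::ptrdiff_t end;    // offset one past the last captured character, or -1
};

// Hypothetical helper: one GroupSpan per sub-match (index 0 is the whole
// match). Returns an empty vector if the pattern is not found at all.
std::vector<GroupSpan> group_spans(const std::string& haystack,
                                   const std::regex& matcher)
{
    std::vector<GroupSpan> spans;
    std::smatch findings;
    if (!std::regex_search(haystack, findings, matcher))
        return spans;

    for (std::size_t i = 0; i < findings.size(); ++i) {
        const std::ssub_match& group = findings[i];
        if (group.matched) {
            // Iterator arithmetic against haystack.begin() yields indices.
            const std::ptrdiff_t begin = group.first - haystack.begin();
            const std::ptrdiff_t end = group.second - haystack.begin();
            assert(begin <= end);
            spans.push_back({true, begin, end});
        } else {
            // first and second are not meaningful offsets here.
            spans.push_back({false, -1, -1});
        }
    }
    return spans;
}
```

For example, searching "xxbbb" with the pattern "(a+)|(b+)" should leave group 1 unmatched while group 2 covers the run of b characters.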

Determining the position of a C++11 regex match
