How do I use std::regex_search to find all occurrences of a pattern?

Asked: 2015-09-29 13:27:48

Tags: c++ regex c++11

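I have the following code to parse a bunch of part numbers (basically serial numbers of product components) from an arbitrarily formatted file.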

auto buildPartNumberRegexString( bool preFlash ) -> std::string
{
    std::ostringstream patternBuilder;

    // The original, raw literal as tested on https://regex101.com/ is:
    //
    // @@PART_NUMBER_POST_FLASH\<\s*(\S+)\s*\,\s*(\d+)\s*\>@@
    //
    // In C++, each backslash needs to be doubled. Alternatively, we could use a raw string literal, e.g. R"(\w)".

    patternBuilder << "@@PART_NUMBER_" << ( preFlash ? "PRE" : "POST" )
        << "_FLASH\\<\\s*(\\S+)\\s*\\,\\s*(\\d+)\\s*\\>@@";

    return patternBuilder.str();
}

auto parsePartNumberAddresses( const std::string& templateFileContent, bool preFlash ) -> ParamAddressContainer
{
    const std::regex regEx( buildPartNumberRegexString( preFlash ) );
    std::smatch match;

    if ( std::regex_search( templateFileContent, match, regEx ) )
    {
        assert( match.size() > 1 );
        const std::size_t capturedGroups = match.size() - 1;

        assert( capturedGroups % 2 == 0 );
        const std::size_t partNumberAddressesFound = capturedGroups / 2;

        ParamAddressContainer results;
        results.reserve( partNumberAddressesFound );

        std::cerr << "DEBUG: capturedGroups = " << capturedGroups << ", partNumberAddressesFound = " << partNumberAddressesFound
            << "\n";

        for ( std::size_t i = 0; i < partNumberAddressesFound; ++i )
        {
            const std::size_t paramIdMatchIndex = i * 2 + 1;
            const std::string paramIdString = match.str( paramIdMatchIndex );
            const std::string paramIndexString = match.str( paramIdMatchIndex + 1 );

            results.emplace_back( util::string_funcs::fromString< ParamId_t > ( paramIdString ),
                util::string_funcs::fromString< ParamIndex_t > ( paramIndexString ) );
        }

        std::cerr << "DEBUG: Going to read the following part numbers (" << ( preFlash ? "pre" : "post" ) << "-flash):\n\n";

        for ( const auto& paramAddress : results )
        {
            std::cerr << "\t" << std::hex << std::noshowbase << paramAddress.paramId << std::dec << "<" << paramAddress.paramIndex
                << ">\n";
        }

        return results;
    }

    return ParamAddressContainer();
}

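I put the "clean" regular expression (i.e. without the doubled backslashes needed to escape the actual backslashes) in the comment inside the buildPartNumberRegexString function.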

A sample file that I run this regular expression against might look like this:

Component alpha;@@PART_NUMBER_POST_FLASH<F12C,0>@@
Component beta;@@PART_NUMBER_POST_FLASH<F12C,1>@@

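I tested my regular expression on https://regex101.com/ with the same sample file and it works perfectly there: it matches both occurrences and extracts the desired capture groups. The problem is that when I try to do the same thing through std::regex, it only finds the first match. On https://regex101.com/ I have to enable the g modifier (global: all matches, don't return on the first match) for the regular expression to find every match. I assumed (hoped) that a similar flag would be available for std::regex_search, but the description of the available flags (http://en.cppreference.com/w/cpp/regex/match_flag_type) does not seem to list one that does what I need. Surely there is a way to find more than one occurrence of a pattern, right? Does anyone have an idea?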

1 answer:

Answer 0: (score: 1)

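Sorry if anyone feels this does not warrant posting an answer to my own question, but I thought I would post my update and working solution for others who are looking for one. Following Cubbi's suggestion, I decided to use std::regex_iterator, which I prefer over calling std::regex_search repeatedly. Below is my modified parsePartNumberAddresses function: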

auto parsePartNumberAddresses( const std::string& templateFileContent, bool preFlash ) -> ParamAddressContainer
{
    const std::regex regEx( buildPartNumberRegexString( preFlash ) );

    const auto begin_iterator = std::sregex_iterator( templateFileContent.cbegin(), templateFileContent.cend(), regEx );
    const auto end_iterator = std::sregex_iterator();

    ParamAddressContainer results;

    for ( std::sregex_iterator it = begin_iterator; it != end_iterator; ++it )
    {
        const std::smatch& match = *it;
        assert( match.size() == 3 );

        const std::string paramIdString = match.str( 1 );
        const std::string paramIndexString = match.str( 2 );

        results.emplace_back( util::string_funcs::fromString< ParamId_t > ( paramIdString ),
            util::string_funcs::fromString< ParamIndex_t > ( paramIndexString ) );
    }

    std::cerr << "DEBUG: Going to read the following part numbers (" << ( preFlash ? "pre" : "post" ) << "-flash):\n\n";

    for ( const auto& paramAddress : results )
    {
        std::cerr << "\t" << std::hex << std::noshowbase << paramAddress.paramId << std::dec << "<" << paramAddress.paramIndex
            << ">\n";
    }

    return results;
}

This works exactly as expected and yields every match in the subject string. :)
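For readers who want to try the approach without the surrounding project types, here is a minimal, self-contained sketch of the same iteration. It is not the code from the answer: `PartNumberList` and `findPostFlashPartNumbers` are hypothetical names standing in for `ParamAddressContainer` and `parsePartNumberAddresses`, the captures are kept as plain strings instead of being converted, and only the POST_FLASH variant is handled. The pattern is written as a raw string literal, and the backslashes before `<`, `,` and `>` are dropped because those characters have no special meaning in the default ECMAScript grammar.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ParamAddressContainer:
// each entry is (parameter id as text, parameter index as text).
using PartNumberList = std::vector<std::pair<std::string, std::string>>;

// Collects every @@PART_NUMBER_POST_FLASH<id,index>@@ occurrence in `text`.
PartNumberList findPostFlashPartNumbers(const std::string& text)
{
    // Raw string literal: backslashes do not need to be doubled.
    static const std::regex pattern(
        R"(@@PART_NUMBER_POST_FLASH<\s*(\S+)\s*,\s*(\d+)\s*>@@)");

    PartNumberList found;

    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(text.cbegin(), text.cend(), pattern), end;
         it != end; ++it)
    {
        const std::smatch& match = *it;
        assert(match.size() == 3); // whole match plus two capture groups
        found.emplace_back(match.str(1), match.str(2));
    }

    return found;
}
```

Called on the two-line sample file from the question, this should return two entries, one per `@@PART_NUMBER_POST_FLASH<...>@@` marker.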

UPDATE: As rici suggested, I removed the redundant std::distance() and results.reserve() calls so that the regular expression is not evaluated twice.
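The earlier revision that used std::distance() is not shown here, so the following is only a guess at what such a pre-count looks like, using a hypothetical helper named `countMatches`. It illustrates why the call was worth removing: a std::sregex_iterator performs a regex search each time it is incremented, so measuring the distance to the end iterator already scans the whole input once, and the loop that extracts the matches then scans it a second time.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts matches by walking the whole iterator range.
// Every increment runs a regex search, so this is a full pass over `text`.
std::size_t countMatches(const std::string& text, const std::regex& pattern)
{
    const std::sregex_iterator first(text.cbegin(), text.cend(), pattern);
    const std::sregex_iterator last; // end-of-sequence iterator
    return static_cast<std::size_t>(std::distance(first, last));
}
```

Using the result only to call reserve() trades a possible vector reallocation for a complete extra regex pass, which is usually the more expensive of the two.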