Question

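After learning the basic rules of C++, I focused on std::regex and wrote two console apps: 1. renrem and 2. bfind.
I then decided to write some convenience functions that make regex in C++ as simple as possible to use, built on top of the std functions; I named the set RFC (= regex functions collection).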

Several odd things keep surprising me, but this one has ruined all my attempts and both console apps.

One of the important functions is count_match, which counts the number of matches inside a string. Here is the full code:

#include <regex>
#include <string>

unsigned int count_match( const std::string& user_string, const std::string& user_pattern, const std::string& flags = "o" ){

    const bool flags_has_i = flags.find( "i" ) < flags.size();
    const bool flags_has_g = flags.find( "g" ) < flags.size();

    std::regex::flag_type regex_flag                  = flags_has_i ? std::regex_constants::icase         : std::regex_constants::ECMAScript;
//    std::regex_constants::match_flag_type search_flag = flags_has_g ? std::regex_constants::match_default : std::regex_constants::format_first_only;
    std::regex rx( user_pattern, regex_flag );
    std::match_results< std::string::const_iterator > mr;

    unsigned int counter = 0;
    std::string temp = user_string;
    while( std::regex_search( temp, mr, rx ) ){
        temp = mr.suffix().str();
        ++counter;
    }

    if( flags_has_g ){
        return counter;
    } else {
        if( counter >= 1 ) return 1;
        else               return 0;
    }

}

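First, as you can see, the search_flag line is commented out because std::regex_search ignores it, and I don't know why, since std::regex_replace accepts exactly the same flag. So std::regex_search ignores format_first_only while std::regex_replace honors it. Let's leave that aside.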

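The main problem is that the icase flag is also ignored when the pattern is a character class -> []. More precisely, when the pattern covers only capital letters or only small letters: [A-Z] or [a-z]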

Suppose this string: s = "ONE TWO THREE four five six seven"

The output with C++ std is:

std::cout << count_match( s, "[A-Z]+" ) << '\n';          // 1 => First match
std::cout << count_match( s, "[A-Z]+", "g" ) << '\n';     // 3 => Global match
std::cout << count_match( s, "[A-Z]+", "gi" ) << '\n';    // 3 => Global match plus insensitive

whereas exactly the same thing in Perl, in the D language, and in C++ with Boost gives:

std::cout << count_match( s, "[A-Z]+" ) << '\n';          // 1 => First match
std::cout << count_match( s, "[A-Z]+", "g" ) << '\n';     // 3 => Global match
std::cout << count_match( s, "[A-Z]+", "gi" ) << '\n';    // 7 => Global match plus insensitive

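I know about the regex flavors, PCRE and the ECMAScript 262 flavor that C++ uses, but I have no idea why a simple flag is ignored by the only search function C++ has, given that std::regex_iterator and std::regex_token_iterator also use this function internally.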

In short, I cannot use those two apps or RFC with the std library because of this!

So if someone knows which rule would make this valid behavior under ECMAScript 262, or if I am wrong anywhere, please tell me. Thanks.

Tested with:

gcc version 6.3.0 20170519 (Ubuntu/Linaro 6.3.0-18ubuntu2~16.04)
clang version 3.8.0-2ubuntu4

Perl code:

perl -le '++$c while $ARGV[0] =~ m/[A-Z]+/g; print $c ;' "ONE TWO THREE four five six seven" // 3
perl -le '++$c while $ARGV[0] =~ m/[A-Z]+/gi; print $c ;' "ONE TWO THREE four five six seven" // 7

D code:

uint count_match( ref const (char[]) user_string, const (char[]) user_pattern, const (char[]) flags ){

    const bool flag_has_g = flags.indexOf( "g" ) != -1;

    Regex!( char ) rx = regex( user_pattern, flags );
    uint counter = 0;
    foreach( mr; matchAll( user_string, rx ) ){
        ++counter;
    }

    if( flag_has_g ){
        return counter;
    } else {
        if( counter >= 1 ) return 1;
        else               return 0;
    }
}

Output:

writeln( count_match( s, "[A-Z]+", "g" ) );  // 3
writeln( count_match( s, "[A-Z]+", "gi" ) ); // 7

JS code:

var s = "ONE TWO THREE four five six seven";

var rx1 = new RegExp( "[A-Z]+" , "g" );
var rx2 = new RegExp( "[A-Z]+" , "gi" );

var counter = 0;
while( rx1.exec( s ) ){
   ++counter;
}
document.write( counter + "<br>" ); // 3

counter = 0;
while( rx2.exec( s ) ){
   ++counter;
}
document.write( counter ); // 7

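OK. After testing with gcc 7.1.0, it turns out that with version 6.3.0 and below the output is 1 3 3, but with 7.1.0 the output is 1 3 7 (here is the link).

With that version of clang the output is also correct. Here is the link. Thanks to user igor-tandetnik.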

Answer 1

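At first I thought this might be a rule of ECMAScript, but after testing the JS code and seeing Igor Tandetnik's comment, I tested the code with gcc 7.1.0 and it outputs the correct result.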

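To test the regex library, I used:

std::cout << ( ( rx.flags() & std::regex_constants::icase ) == std::regex_constants::icase ? "yes" : "no" ) << '\n';

The condition is true when icase is set and false otherwise, so I think the library stores the flag correctly. Here is the test with gcc 7.1.0.
So all versions below gcc 7.1.0 produce the wrong output.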

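For clang I have no idea, because I have clang 3.8.0 and the output is incorrect. But with online compilers even clang 3.7.1 gives the correct output.

Screenshot of this code with clang 3.8.0:

[screenshot]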

Using online compilers, the output is incorrect for clang 3.2 and below, but later versions output the correct result.

Please correct me if I am wrong.

Answer 2

> First, as you can see, the search_flag line is commented out because std::regex_search ignores it, and I don't know why, since std::regex_replace accepts exactly the same flag.

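The flag in question is format_first_only. That flag is only meaningful for a "replace" operation. In regex_replace the default is "replace all", but if you pass this flag it becomes "replace only the first".

In regex_match and regex_search there is no replacement at all; both functions find only the first match (and in the case of regex_match, the match must consume the entire string). Since the flag is meaningless in that context, I would expect an implementation to ignore it; but I would not fault an implementation that chose to be noisy about it and threw an exception.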

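> The main problem is that the icase flag is also ignored when the pattern is a character class -> []. More precisely, when the pattern covers only capital letters or only small letters: [A-Z] or [a-z]

icase misbehaving for character classes is definitely a bug in the vendor's library.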

Looks like libstdc++ fixed the bug between GCC 6.3 (December 2016) and GCC 7.1 (May 2017).
Looks like libc++ fixed the bug between Clang 3.2 (December 2012) and Clang 3.3 (June 2013).

std::regex and ignoring flags
