Question

Here is the code:

#include <string>
#include <regex>
#include <iostream>

int main()
{
    std::string pattern("[^c]ei");
    pattern = "[[:alpha:]]*" + pattern + "[[:alpha:]]*";
    std::regex r(pattern); 
    std::smatch results;   
    std::string test_str = "cei";

    if (std::regex_search(test_str, results, r)) 
        std::cout << results.str() << std::endl;      

    return 0;
}

Output:

cei

The compiler is GCC 4.9.1.

I am new to regular expressions. I expected no output at all, because "cei" does not match the pattern here. Am I right? What is going wrong?

Update

This has been reported and confirmed as a bug. Details: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63497

Answer 1

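This is a bug in the implementation. Not only do several other tools I tried agree that your pattern does not match your input, but I also tried this: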

#include <string>
#include <regex>
#include <iostream>

int main()
{
  std::string pattern("([a-z]*)([a-z])(e)(i)([a-z]*)");
  std::regex r(pattern);
  std::smatch results;
  std::string test_str = "cei";

  if (std::regex_search(test_str, results, r))
  {
    std::cout << results.str() << std::endl;

    for (size_t i = 0; i < results.size(); ++i) {
      std::ssub_match sub_match = results[i];
      std::string sub_match_str = sub_match.str();
      std::cout << i << ": " << sub_match_str << '\n';
    }
  }
}

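This is basically the same as what you had, but for simplicity I replaced [:alpha:] with [a-z], and I also temporarily replaced [^c] with [a-z], because that seems to make it work correctly. Here is what it prints (GCC 4.9.0 on Linux x86-64):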

cei
0: cei
1:
2: c
3: e
4: i
5:

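If, in the place where you had [^c], I put a literal f instead of [a-z], it correctly reports that the pattern does not match. But if I use [^c] as you did: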

std::string pattern("([a-z]*)([^c])(e)(i)([a-z]*)");

Then I get this output:

cei
0: cei
1: cei
terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::_S_create
Aborted (core dumped)

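So it claims the match succeeded, and results[0] is "cei", which is expected. Then results[1] is also "cei", which I suppose might be OK. But results[2] crashes, because it tries to construct a std::string of length 18446744073709551614 with begin = nullptr. That huge number is exactly 2^64 - 2 (on my system).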

So I think there is an off-by-one error somewhere, and its impact may go beyond a spurious regex match: it can crash at runtime.

Answer 2

The regular expression is correct and does not match the string "cei".

Regular expressions are best tested and explained in Perl:

 my $regex = qr{                 # start regular expression
                 [[:alpha:]]*    # 0 or any number of alpha chars
                 [^c]            # followed by NOT-c character
                 ei              # followed by e and i characters
                 [[:alpha:]]*    # followed by 0 or any number of alpha chars    
               }x;               # end + declare 'x' mode (ignore whitespace)

 print "xei" =~ /$regex/ ? "match\n" : "no match\n";
 print "cei" =~ /$regex/ ? "match\n" : "no match\n";

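The regex first consumes all characters up to the end of the string ([[:alpha:]]*), then backtracks to find a non-c character ([^c]), and then goes on to match e and i (backtracking once more).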

Result:

 "xei"  -->  match
 "cei"  -->  no match

The reason is obvious. Any differences between the various C++ libraries and test tools are implementation issues on their side, imho.

Bug in std::regex?