Question

I use various regular expressions to parse a C source file line by line. First, I read the entire contents of the file into a string:

ifstream file_stream("commented.cpp",ifstream::binary);

std::string txt((std::istreambuf_iterator<char>(file_stream)),
std::istreambuf_iterator<char>());

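Then I use a set of regular expressions that should be applied one after another until a match is found. Here I show just one as an example: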

vector<regex> rules = { regex("^//[^\n]*$") };

char * search =(char*)txt.c_str();

int position = 0, length = 0;

for (int i = 0; i < rules.size(); i++) {
  cmatch match;

  if (regex_search(search + position, match, rules[i],regex_constants::match_not_bol | regex_constants::match_not_eol)) 
  {
     position += ( match.position() + match.length() );        
  }

}

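But it doesn't work. It matches a comment that is not on the current line, because it searches the whole string for the first match. I expected regex_constants::match_not_bol and regex_constants::match_not_eol to make regex_search treat ^ and $ as the start/end of a line only, not the start/end of the whole block. So this is my file:

commented.cpp: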

#include <stdio.h>
//comment

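The code should fail. My logic is that, with those regex_search options, the match should fail because it should look for the pattern only in the first line: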

#include <stdio.h>

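But it searches the whole string and inevitably finds //comment. I need help making regex_search match only within the current line. The options match_not_bol and match_not_eol do not help me. Of course I could read the file line by line into a vector and then match every rule against each string in the vector, but that is slow. I have already done that, and parsing a large file takes a long time. That is why I want the regular expression to handle newlines and to use a position counter.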

Answer 1

If this is not what you want, please leave a comment so I can delete the answer.

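What you are doing is not the right way to use a regex library. So here is my advice for anyone who wants to use the std::regex library:

1. It only supports ECMAScript, which is somehow poorer than every modern regex library.
2. It has as many bugs as you like (going only by the ones I found).
3. In some cases (I tested specifically with std::match_results) it is 200 times slower than std.regex in the D language.
4. Its match flags are very confusing and almost do not work (at least for me).

Conclusion: do not use it at all.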
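On the flag confusion mentioned above, here is a minimal sketch of what match_not_bol and match_not_eol are specified to do: they tell the matcher that the first or last position of the searched range is not a line boundary, so ^ or $ will not match there. That is the opposite of what the question expects from them. The names FlagDemo and try_flags are hypothetical, introduced only for this illustration; the pattern is the one from the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper type: the result of one search per flag setting.
struct FlagDemo {
    bool plain;    // default flags
    bool not_bol;  // with match_not_bol
    bool not_eol;  // with match_not_eol
};

// Run the question's pattern against a single line under three flag settings.
FlagDemo try_flags(const std::string& line)
{
    namespace rc = std::regex_constants;
    const std::regex rule("^//[^\n]*$");
    return {
        std::regex_search(line, rule),                     // ^ and $ may match at the range ends
        std::regex_search(line, rule, rc::match_not_bol),  // ^ may not match at the first position
        std::regex_search(line, rule, rc::match_not_eol),  // $ may not match at the last position
    };
}
```

For the single line //comment, only the search with default flags should succeed. Whether ^ also matches after a newline embedded in a longer string appears to depend on the standard library implementation, so restricting the searched range to one line is the more predictable route.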

But if someone still needs to use C++, then you can:

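- Use boost::regex (about the Boost library) because:
  1. It supports PCRE
  2. It has fewer bugs (I have not seen any)
  3. It produces a smaller binary (I mean the executable file after compiling)
  4. It is faster than std::regex
- Use gcc version 7.1.0 or later, NOT anything below it. The last bug I found was in version 6.3.0
- Use clang version 3 or above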

If you can be tempted (= persuaded) NOT to use C++, then you can use:

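- Use the D regular expression library, std.regex, for large tasks. Why:
  1. Fast (see "Faster Command Line Tools in D")
  2. Simple
  3. Flexible
- Use native pcre or pcre2, which are written in C
  - Extremely fast but a little complicated
- Use perl for simple tasks, especially Perl one-liners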

Answer 2

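#include <stdio.h>
//comment

The code should fail. My logic is that, with those regex_search options, the match should fail because it should look for the pattern only in the first line:

#include <stdio.h>

But it searches the whole string and keeps finding //comment. I need help making regex_search match only within the current line.

Are you trying to match all the // comments in the source file, or only the one on the first line?

The former can be done like this: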

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main()
{
  auto input = std::ifstream{"stream_union.h"};
  // Compile the pattern once, outside the loop; constructing a std::regex is costly.
  const auto pattern = std::regex(R"(//)");

  for(auto line = std::string{}; std::getline(input, line); )
  {
    // Each search sees a single line, so a match can never come from another line.
    auto submatch = std::smatch{};
    if(!std::regex_search(line, submatch, pattern)) continue;

    std::cout << line << std::endl;
  }
  std::cout << std::endl;

  return EXIT_SUCCESS;
}

The latter can be done like this:

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main()
{
  auto input = std::ifstream{"stream_union.h"};
  auto line = std::string{};
  // Read only the first line; fail if the file is missing or empty.
  if(!std::getline(input, line)) { return EXIT_FAILURE; }

  auto submatch = std::smatch{};
  const auto pattern = std::regex(R"(//)");
  // regex_search reports whether a match was found, so test its result directly.
  if(!std::regex_search(line, submatch, pattern)) { return EXIT_FAILURE; }

  std::cout << line << std::endl;

  return EXIT_SUCCESS;
}

If for any reason you want to get the position of the match, tellg() will do that for you.
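As a complementary sketch for the case in the question, where the whole file is already in one string: one option is to hand regex_search only the iterator range of the current line, so ^ and $ refer to that line and nothing is copied. The position of a match in the buffer is then the line's offset plus match.position(). The names LineMatch and match_per_line are hypothetical, and the "first matching rule wins" policy is an assumption based on the question's description.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical record: one entry per line that matched some rule.
struct LineMatch {
    std::size_t offset;  // offset of the match from the start of the buffer
    std::string text;    // the matched text
};

// Walk the buffer line by line without copying. Each regex_search call is
// given only the iterator range of the current line.
std::vector<LineMatch> match_per_line(const std::string& txt,
                                      const std::vector<std::regex>& rules)
{
    std::vector<LineMatch> found;
    auto line_begin = txt.cbegin();
    while (line_begin != txt.cend()) {
        // The current line ends at the next '\n' or at the end of the buffer.
        const auto line_end = std::find(line_begin, txt.cend(), '\n');
        for (const auto& rule : rules) {
            std::smatch m;
            if (std::regex_search(line_begin, line_end, m, rule)) {
                const auto offset = static_cast<std::size_t>(
                    (line_begin - txt.cbegin()) + m.position(0));
                found.push_back({offset, m.str(0)});
                break;  // assumed policy: the first matching rule wins
            }
        }
        if (line_end == txt.cend()) break;
        line_begin = line_end + 1;  // step over the '\n'
    }
    return found;
}
```

Called with the two-line file from the question and the single rule "^//[^\n]*$", this should report one match, //comment, at offset 19, and nothing for the #include line. If the file uses \r\n line endings and is read in binary mode, each line range will still end in '\r', which the pattern would need to allow for.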

std regex_search: match only the current line
