提取两个其他模式之间的匹配线

时间:2014-01-08 12:15:29

标签: c++ regex perl c++11

我正在尝试在C ++中使用正则表达式来提取与某个单词相匹配的行 - 来自由两个其他模式限定的文件中的区域内。我还想打印每场比赛的行号。

我目前正在使用perl运行popen命令,但我想用C ++执行此操作:

perl -ne 'if ((/START/ .. /END/) && /test/) {print "line$.:$_"}' file

此命令在STARTEND之间找到区域,然后从包含单词test的提取行中找到区域。

如何使用C ++中的正则表达式执行此操作?

1 个答案:

答案 0 :(得分:3)

semantics of Perl’s ..很微妙。以下代码模拟..while (<>) { ... }隐含的-n切换为perl

#include <fstream>
#include <iostream>
#include <regex>
#include <vector>

// emulate Perl's .. operator
void flipflop(bool& inside, const std::regex& start, const std::regex& end, const std::string& str)
{
  if (!inside && std::regex_match(str, start))
    inside = true;
  else if (inside && std::regex_match(str, end))
    inside = false;
}

int main(int argc, char *argv[])
{
  // extra .* wrappers to use regex_match in order to work around
  // problems with regex_search in GNU libstdc++
  std::regex start(".*START.*"), end(".*END.*"), match(".*test.*");

  for (const auto& path : std::vector<std::string>(argv + 1, argv + argc)) {
    std::ifstream in(path);
    std::string str;
    bool inside = false;
    int line = 0;
    while (std::getline(in, str)) {
      ++line;
      flipflop(inside, start, end, str);
      if (inside && std::regex_match(str, match))
        std::cout << path << ':' << line << ": " << str << '\n';

      // Perl's .. becomes false AFTER the rhs goes false,
      // so keep this last to allow match to succeed on the
      // same line as end
      flipflop(inside, start, end, str);
    }
  }

  return 0;
}

例如,请考虑以下输入。

test ERROR 1
START
test
END
test ERROR 2
START
foo ERROR 3
bar ERROR 4
test 1
baz ERROR 5
END
test ERROR 6
START sldkfjsdflkjsdflk
test 2
END
lksdjfdslkfj
START
dslfkjs
sdflksj
test 3
END dslkfjdsf

样品运行:

$ ./extract.exe file
file:3: test
file:9: test 1
file:14: test 2
file:20: test 3

$ ./extract.exe file file
file:3: test
file:9: test 1
file:14: test 2
file:20: test 3
file:3: test
file:9: test 1
file:14: test 2
file:20: test 3