Question

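I'm trying to learn some C++ from scratch. I'm proficient in Python, Perl, and JavaScript, but have only encountered C++ briefly, in a classroom setting in the past. Please excuse the naivete of my question.

I would like to split a string using a regular expression, but have not had much luck finding a clear, definitive, efficient, and complete example of how to do this in C++.

In Perl this is a common operation, so it can be accomplished in a trivial manner: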

/home/me$ cat test.txt
this is  aXstringYwith, some problems
and anotherXY line with   similar issues

/home/me$ cat test.txt | perl -e'
> while(<>){
>   my @toks = split(/[\sXY,]+/);
>   print join(" ",@toks)."\n";
> }'
this is a string with some problems
and another line with similar issues

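I'd like to know how best to accomplish the equivalent in C++.

EDIT:
I think I found what I was looking for in the Boost library, as mentioned below.

boost regex-token-iterator (why don't underscores work?)

I guess I didn't know what to search for.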


#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;

int main(int argc, char* argv[])
{
  string s;
  do{
    if(argc == 1)
      {
        // Interactive mode: stop on "quit" or when input ends.
        cout << "Enter text to split (or \"quit\" to exit): ";
        if(!getline(cin, s) || s == "quit") break;
      }
    else
      s = "This is a string of tokens";

    // Submatch -1 iterates over the text between matches of re.
    boost::regex re("\\s+");
    boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
    boost::sregex_token_iterator j;

    unsigned count = 0;
    while(i != j)
      {
        cout << *i++ << endl;
        count++;
      }
    cout << "There were " << count << " tokens found." << endl;

  }while(argc == 1);
  return 0;
}

Answer 1

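The Boost libraries are usually a good choice, in this case Boost.Regex. There is even an example for splitting a string into tokens that already does what you want. Basically it boils down to something like this: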

boost::regex re("[\\sXY,]+");
std::string s;

while (std::getline(std::cin, s)) {
  boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
  boost::sregex_token_iterator j;
  while (i != j) {
     std::cout << *i++ << " ";
  }
  std::cout << std::endl;
}
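
For readers who want the tokens collected rather than printed, here is a minimal sketch of the same token-iterator loop wrapped in a function. It assumes a C++11 compiler and uses std::regex from <regex>, whose std::sregex_token_iterator takes the same arguments as the Boost class above. The helper name split_regex is hypothetical and not part of either library.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or the standard library):
// split `text` on every match of `delims` and return the pieces.
std::vector<std::string> split_regex(const std::string& text,
                                     const std::regex& delims)
{
  std::vector<std::string> tokens;
  // Submatch index -1 selects the text between matches, i.e. the tokens.
  std::sregex_token_iterator it(text.begin(), text.end(), delims, -1);
  std::sregex_token_iterator end;
  for (; it != end; ++it) {
    // A line that starts with a delimiter yields an empty first token; skip it.
    if (it->length() != 0) tokens.push_back(it->str());
  }
  return tokens;
}
```

Called with the pattern "[\\sXY,]+" on the first line of test.txt, this should return the seven words this, is, a, string, with, some, problems.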

Answer 2

Take a look at Boost.Regex. I think you can find your answer here:

C++: what regex library should I use?

Answer 3

If you want to minimize the use of iterators and keep the code compact, the following should work:

#include <string>
#include <iostream>
#include <boost/regex.hpp>

int main()
{
  const boost::regex re("[\\sXY,]+");

  for (std::string s; std::getline(std::cin, s); ) 
  {
    std::cout << regex_replace(s, re, " ") << std::endl;   
  }

}
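
Note that this approach rewrites each line rather than producing separate tokens. A rough sketch of the same idea as a reusable function, assuming a C++11 compiler and std::regex in place of Boost (the name collapse_delims is hypothetical):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: collapse every run of delimiter characters into
// one space, as the regex_replace call above does for each input line.
std::string collapse_delims(const std::string& line)
{
  static const std::regex re("[\\sXY,]+");
  return std::regex_replace(line, re, " ");
}
```

One consequence of replacing instead of splitting: a line that begins or ends with delimiters keeps a single leading or trailing space.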

Answer 4

Unlike in Perl, regular expressions are not "built in" to C++.

You need to use an external library, such as PCRE.

Answer 5

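Regular expressions are part of TR1, which is included in Visual C++ 2008 SP1 (including the Express editions) and G++ 4.3.

The header is <regex> and the namespace is std::tr1. It works well with the STL.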

Getting started with C++ TR1 regular expressions

Visual C++ Standard Library : TR1 Regular Expressions
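
The TR1 regex interface was later adopted into the C++11 standard library as std::regex, still in the <regex> header. As a rough sketch, assuming a C++11 compiler with a working <regex> (for GCC that means 4.9 or later), the split can also be done by matching the tokens themselves instead of the separators. The function name find_tokens and the hard-coded pattern are illustrative only.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every maximal run of non-delimiter
// characters, i.e. match the tokens instead of the separators.
std::vector<std::string> find_tokens(const std::string& text)
{
  static const std::regex token("[^\\sXY,]+");
  std::vector<std::string> out;
  for (std::sregex_iterator it(text.begin(), text.end(), token), end;
       it != end; ++it) {
    out.push_back(it->str());  // it->str() is the whole match
  }
  return out;
}
```

Because only non-empty matches are collected, leading or trailing delimiters never produce empty tokens with this variant.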

C++ tokenize a string using a regular expression
