How to get formatted input from ifstream

Question

I have a text file containing a set of names formatted in the following way:

"MARY","PATRICIA","LINDA","BARBARA","ELIZABETH"

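and so on. I want to open the file using ifstream and read the names into an array of strings (without the quotes and commas). I managed to do it by checking the input stream character by character. Is there an easier way to read this formatted input?

EDIT: I have heard that in C you can use something like fscanf(f, "\"%[a-zA-Z]\",", str); but does ifstream have a method like that?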

Answer 1

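That input should be parsable with std::getline or std::regex_token_iterator (though the latter is like shooting sparrows with artillery).

Examples:

regex

Quick and dirty, but heavyweight solution (uses Boost, so most compilers will accept it)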

#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main() {
    const std::string s = "\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\",\"ELIZABETH\"";

    boost::regex re("\"(.*?)\"");
    for (boost::sregex_token_iterator it(s.begin(), s.end(), re, 1), end; 
         it != end; ++it)
    {
        std::cout << *it << std::endl;
    }
}

Output:

MARY
PATRICIA
LINDA
BARBARA
ELIZABETH

Alternatively, you can use

boost::regex re(",");
for (boost::sregex_token_iterator it(s.begin(), s.end(), re, -1), end;

to split at the commas instead (note the -1), or use any other regex.
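Since C++11 the same token iterator also exists in the standard library, so Boost is optional here. The following is a minimal sketch, assuming a compiler with a working <regex>; the helper name extract_quoted is made up for illustration and does not come from the answer.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the text between each pair of double quotes.
std::vector<std::string> extract_quoted(const std::string& line) {
    // Capture group 1 is everything between two quotes.
    static const std::regex re("\"([^\"]*)\"");
    std::vector<std::string> names;
    for (std::sregex_token_iterator it(line.begin(), line.end(), re, 1), end;
         it != end; ++it) {
        names.push_back(it->str());
    }
    return names;
}
```

The regex object and the string being searched both have to outlive the iterators, which is why the regex is a static local and the line is taken by reference.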

getline

getline solution (whitespace allowed)

#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::stringstream ss;
    ss.str ("\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\",\"ELIZABETH\"");

    std::string curr;
    while (std::getline (ss, curr, ',')) {
        size_t from = 1 + curr.find_first_of ('"'),
               to   =     curr.find_last_of ('"');
        std::cout << curr.substr (from, to-from) << std::endl;
    }
}

Same output.

getline

getline solution (whitespace not allowed)

The loop becomes almost trivial:

    std::string curr;
    while (std::getline (ss, curr, ',')) {
        std::cout << curr.substr (1, curr.length()-2) << std::endl;
    }

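homebrew

The least wasteful approach with respect to performance (especially if you do not store the strings themselves, but iterators or indices)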

#include <iostream>
#include <string>

int main() {
    const std::string str ("\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\",\"ELIZABETH\"");        

    size_t i = 0;
    while (i != std::string::npos) {
        size_t begin  = str.find ('"', i) + 1, // one behind initial '"'
               end    = str.find ('"', begin),
               comma  = str.find (',', end);
        i = comma;

        std::cout << str.substr(begin, end-begin) << std::endl;
    }
}
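The remark about storing indices instead of strings can be sketched as follows. The helper quoted_spans is hypothetical; it records (offset, length) pairs and stops at the first missing quote, so malformed input ends the scan early instead of producing wrapped-around indices.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (offset, length) pairs for the text between
// each pair of double quotes, without copying any characters.
std::vector<std::pair<std::size_t, std::size_t>>
quoted_spans(const std::string& str) {
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    std::size_t pos = 0;
    while (true) {
        const std::size_t open = str.find('"', pos);
        if (open == std::string::npos) break;   // no further opening quote
        const std::size_t close = str.find('"', open + 1);
        if (close == std::string::npos) break;  // unterminated field
        spans.emplace_back(open + 1, close - open - 1);
        pos = close + 1;                        // continue after the closing quote
    }
    return spans;
}
```

Each pair can later be passed to str.substr(first, second) if a copy of the name is needed after all.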

Answer 2

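As far as I know, there is no tokenizer in the STL. If you are willing to use Boost, though, it has a very nice tokenizer class. Other than that, going character by character is the best C++ way to solve it (unless you are willing to go the C route and use strtok on a raw char* string).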
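For reference, the C route could look roughly like the sketch below. It uses std::strtok from <cstring> on a private copy of the line, because strtok overwrites delimiters in its buffer. The helper name split_with_strtok is hypothetical, and the sketch assumes there is no whitespace between fields.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits on quotes and commas with std::strtok.
std::vector<std::string> split_with_strtok(const std::string& line) {
    // strtok modifies its input, so work on a null-terminated copy.
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');

    std::vector<std::string> names;
    for (char* token = std::strtok(buffer.data(), "\",");
         token != nullptr;
         token = std::strtok(nullptr, "\",")) {
        names.push_back(token);
    }
    return names;
}
```

Keep in mind that std::strtok keeps hidden state between calls, so it is not safe to use from several threads or for nested tokenizing.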

Answer 3

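A simple tokenizer should do the trick; there is no need for anything as heavyweight as regular expressions. C++ does not have one built in, but it is easy enough to write. Here is one that I took from the internet so long ago that I no longer remember who wrote it, so apologies for the blatant plagiarism: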

#include <vector>
#include <string>

std::vector<std::string>
tokenize(const std::string & str, const std::string & delimiters)
{
  std::vector<std::string> tokens;

  // Skip delimiters at beginning.
  std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);

  // Find the first delimiter after the start of the token.
  std::string::size_type pos     = str.find_first_of(delimiters, lastPos);

  while (std::string::npos != pos || std::string::npos != lastPos)
  {
    // Found a token, add it to the vector.
    tokens.push_back(str.substr(lastPos, pos - lastPos));

    // Skip delimiters.  Note the "not_of"
    lastPos = str.find_first_not_of(delimiters, pos);

    // Find the next delimiter, which ends the next token.
    pos = str.find_first_of(delimiters, lastPos);
  }

  return tokens;
}

Usage: std::vector<std::string> words = tokenize(line, ",");
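Note that splitting on "," alone leaves the quotes attached to every token. Because the tokenizer treats each character of the delimiter string as a separator, passing ",\"" removes the quotes as well. The sketch below restates the same find_first_not_of / find_first_of loop in condensed form so that it compiles on its own; split_on_any is a hypothetical name, not the function from the answer.

```cpp
#include <string>
#include <vector>

// Hypothetical condensed variant of the tokenizer: any character in
// `delimiters` separates tokens, and runs of delimiters yield no empty tokens.
std::vector<std::string> split_on_any(const std::string& str,
                                      const std::string& delimiters) {
    std::vector<std::string> tokens;
    std::string::size_type start = str.find_first_not_of(delimiters);
    while (start != std::string::npos) {
        const std::string::size_type stop = str.find_first_of(delimiters, start);
        // If stop is npos, substr simply takes the rest of the string.
        tokens.push_back(str.substr(start, stop - start));
        start = str.find_first_not_of(delimiters, stop);
    }
    return tokens;
}
```

With the delimiter string ",\"" the input "MARY","LINDA" yields MARY and LINDA; with "," alone the tokens keep their surrounding quotes.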

Answer 4

Actually, because I was curious, I worked out how to do this with Boost.Spirit.Qi:

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>

using namespace boost::spirit::qi;

int main() {
  // our test-string
  std::string data("\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\"");
  // this is where we will store the names
  std::vector<std::string> names;
  // parse the string
  phrase_parse(data.begin(), data.end(), 
           ( lexeme['"' >> +(char_ - '"') >> '"'] % ',' ),
           space, names);
  // print what we have parsed
  std::copy(names.begin(), names.end(), 
            std::ostream_iterator<std::string>(std::cout, "\n"));
}

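To check whether an error occurred during parsing, store the begin and end iterators of the string in variables, pass those to phrase_parse, and compare them afterwards. If they are equal, the entire string was matched; otherwise the begin iterator points to the error site.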
