Question

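I want to read a text word by word, avoiding any non-alphanumeric characters, in a simple way. After "evolving" from a text with white spaces and '\n', I need to solve the same problem for text that also contains ',' and '.', for example. The first case was solved simply by using getline with the delimiter ' '. I was wondering whether there is a way to use getline with multiple delimiters, or even with some kind of regular expression (for example '.'|' '|','|'\n').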

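As far as I know, getline works by reading characters from the input stream until either '\n' or the delimiter character is reached. My first guess was that it would be quite simple to give it multiple delimiters, but I found out that it is not.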
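For illustration, here is a minimal sketch of the single-delimiter behavior described above. The helper name `split_on` is hypothetical and not part of the original post; it only wraps the standard three-argument `std::getline`, whose delimiter parameter is a single `char`.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on ONE delimiter character.
// std::getline's third parameter is a single char, so there is no
// direct way to hand it a set of delimiters.
std::vector<std::string> split_on(const std::string& text, char delim) {
    std::istringstream in(text);
    std::vector<std::string> fields;
    std::string field;
    while (std::getline(in, field, delim))
        fields.push_back(field);
    return fields;
}
```

Calling `split_on("word1, word2. word3,word4", ',')` yields three fields, and the middle one is still `" word2. word3"` with its space and period. One detail worth noting: when a delimiter is passed explicitly, it replaces '\n' rather than being added to it, so newlines end up inside the fields.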

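EDIT: Just as a clarification. Any C-style solution (strtok, for instance, which in my opinion is very ugly) or algorithmic type of solution is not what I'm looking for. It's quite easy to come up with a simple algorithm that solves the problem and to implement it. I'm looking for a more elegant solution, or at least an explanation of why we can't handle it with the getline function, since, unless I have completely misunderstood, it is supposed to be able to accept more than one delimiter somehow.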

Answer 1

There's good news and bad news. The good news is that you can do this.

The bad news is that doing it is fairly roundabout, and some people find it downright ugly and nasty.

To do it, you start by observing two facts:

1. The normal string extractor uses whitespace to delimit "words".
2. What constitutes whitespace is defined in the stream's locale.
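Both facts can be checked with a small sketch. The function names `extract_words` and `is_space_for` are hypothetical; the calls they wrap (`operator>>` on `std::string`, `std::isspace` with a locale argument, and `getloc`) are standard.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Fact 1: operator>> skips leading whitespace and stops at the next
// whitespace character, so each extraction yields one "word".
std::vector<std::string> extract_words(std::istream& in) {
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
        words.push_back(word);
    return words;
}

// Fact 2: whether a character counts as whitespace is a question
// answered by the locale currently imbued in the stream.
bool is_space_for(const std::istream& in, char ch) {
    return std::isspace(ch, in.getloc());
}
```

With the default locale, a stream holding `"one two\nthree,four"` gives the words `one`, `two` and `three,four`: the space and the newline separate words, while the comma does not, because that locale does not classify it as whitespace.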

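Putting those together, the answer becomes fairly obvious (if roundabout): to define multiple delimiters, we define a locale that lets us specify which characters should be treated as delimiters (i.e., whitespace):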

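#include <iostream>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

struct word_reader : std::ctype<char> {
    word_reader(std::string const &delims) : std::ctype<char>(get_table(delims)) {}
    static std::ctype_base::mask const* get_table(std::string const &delims) {
        // empty table: no character has any classification yet
        // (static, so it is shared by every word_reader)
        static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());

        // mark each delimiter as whitespace; iterate as unsigned char so a
        // char with a negative value cannot index out of bounds
        for (unsigned char ch : delims)
            rc[ch] = std::ctype_base::space;
        return &rc[0];
    }
};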

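Then we need to tell the stream to use that locale (strictly, a locale that contains that ctype facet), passing the characters we want to use as delimiters, and then extract words from the stream: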

int main() {
    std::istringstream in("word1, word2. word3,word4");

    // create a ctype facet specifying delimiters, and tell stream to use it:
    in.imbue(std::locale(std::locale(), new word_reader(" ,.\n")));
    std::string word;

    // read words from the stream. Note we just use `>>`, not `std::getline`:
    while (in >> word)
        std::cout << word << "\n";
}

The result is (I hope) what you wanted: each word is extracted without the punctuation that we declared to be "white space".

word1
word2
word3
word4
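One caveat with word_reader as written: its table is a function-local static, so every instance shares it, and delimiters set by an earlier construction stay set. A possible variant, sketched here as an assumption rather than as part of the original answer, stores the table per facet by placing it in a base class that is constructed before std::ctype<char>. The names `delimiter_table`, `delimiter_ctype` and `split_words` are hypothetical.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Holds one mask table per facet. It is listed as the FIRST base class
// below so that it is fully constructed before std::ctype<char> receives
// a pointer into it.
struct delimiter_table {
    std::vector<std::ctype_base::mask> masks;
    explicit delimiter_table(const std::string& delims)
        : masks(std::ctype<char>::table_size, std::ctype_base::mask()) {
        // Iterate as unsigned char so negative char values cannot
        // produce an out-of-range index.
        for (unsigned char ch : delims)
            masks[ch] = std::ctype_base::space;
    }
};

class delimiter_ctype : private delimiter_table, public std::ctype<char> {
public:
    explicit delimiter_ctype(const std::string& delims)
        : delimiter_table(delims), std::ctype<char>(masks.data()) {}
};

// Split `text` into words, treating every character in `delims`
// (and nothing else) as whitespace.
std::vector<std::string> split_words(const std::string& text,
                                     const std::string& delims) {
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it later.
    in.imbue(std::locale(std::locale(), new delimiter_ctype(delims)));
    std::vector<std::string> words;
    for (std::string word; in >> word; )
        words.push_back(word);
    return words;
}
```

Because each facet owns its table, `split_words("a;b c", ";")` treats only the semicolon as a separator and returns `a` and `b c`, regardless of which delimiters an earlier call used.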

Multiple delimiters for the getline function, C++
