Question

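The C++ standard library supports several ways to introduce custom delimiters for input streams. As far as I understand, the recommended way is to imbue the stream with a new locale that carries a custom ctype facet: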

First way (inheriting from a ctype specialization):

// Note: std::ctype<char> is table-driven and has no virtual do_is(),
// so this override only works for other character types such as wchar_t.
struct csv_whitespace : std::ctype<wchar_t>
{
    bool do_is(mask m, char_type c) const override
    {
        if ((m & space) && c == L' ') {
            return false; // space will NOT be classified as whitespace
        }
        if ((m & space) && c == L',') {
            return true; // comma will be classified as whitespace
        }
        return ctype::do_is(m, c); // leave the rest to the parent class
    }
};
//  for the wcin stream:
std::wcin.imbue(std::locale(std::wcin.getloc(), new csv_whitespace));

Second way (parameterizing the ctype<char> specialization with a custom table):

//  get the existing classification table of the ctype<char> specialization
const auto temp = std::ctype<char>::classic_table();
//  create a copy of the table in a vector container
std::vector<std::ctype<char>::mask> new_table_vector(temp, temp + std::ctype<char>::table_size);

//  add/remove stream separators using bitwise arithmetic.
//  use char-based indices because ASCII codes here are equal to indices
new_table_vector[' '] ^= std::ctype_base::space;
new_table_vector['\t'] &= ~(std::ctype_base::space | std::ctype_base::cntrl);
new_table_vector[':'] |= std::ctype_base::space;
//  A ctype initialized with new_table_vector would delimit on '\n' and ':' but not ' ' or '\t'.

//  ....
//  usage of the table above; new_table_vector must outlive the facet, which does not copy it.
std::cin.imbue(std::locale(std::cin.getloc(), new std::ctype<char>(new_table_vector.data())));
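
To see the effect of such a table in isolation, here is a minimal sketch that applies it to a std::istringstream instead of cin. The helper name split_with_custom_table is hypothetical, and it uses `&= ~space` rather than `^=` so the result does not depend on the previous state of the bit. Note that the delimiters are still discarded during extraction.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text on ':' and '\n', but not on ' ' or '\t'.
std::vector<std::string> split_with_custom_table(const std::string& text)
{
    // Copy the classic "C" table; the copy is declared before the stream
    // so that it outlives the facet that points at it.
    const auto* classic = std::ctype<char>::classic_table();
    std::vector<std::ctype<char>::mask> table(
        classic, classic + std::ctype<char>::table_size);

    table[static_cast<unsigned char>(' ')]  &= ~std::ctype_base::space;
    table[static_cast<unsigned char>('\t')] &= ~std::ctype_base::space;
    table[static_cast<unsigned char>(':')]  |= std::ctype_base::space;

    std::istringstream in(text);
    // The locale takes ownership of the facet; the facet does not own the table.
    in.imbue(std::locale(in.getloc(), new std::ctype<char>(table.data())));

    std::vector<std::string> tokens;
    for (std::string token; in >> token; ) {
        tokens.push_back(token); // operator>> has already dropped the delimiter
    }
    return tokens;
}
```

For the input "a b:c\td\ne" this yields the three tokens "a b", "c\td" and "e"; the ':' and '\n' characters themselves are gone.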

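But is there a way to include the delimiters in the resulting tokens? E.g.

AAA&BBB*CCC%DDD&EEE

where

&*%

are delimiters defined using one of the methods above. The resulting strings would be:

AAA

&BBB

*CCC

%DDD

&EEE

So you see, the delimiters are included in the resulting strings. The question is: how can an input stream be configured to do this (and is it possible at all)?

Thanks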

Answer 1

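The short answer is no: istreams do not provide an innate method for extracting and retaining delimiters. istream provides the following extraction methods: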

operator>> - discards the delimiter
get - does not extract the delimiter at all
getline - discards the delimiter
read - does not respect delimiters
readsome - does not respect delimiters
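
The difference between get and getline in the list above can be checked with a small sketch. The names Extraction, with_getline and with_get, as well as the fixed buffer size, are illustrative assumptions and not part of any library.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type: the extracted token plus the next unread character.
struct Extraction {
    std::string token;
    int next_char; // result of peek(); EOF if nothing is left
};

// std::getline extracts up to the delimiter and then discards the delimiter.
Extraction with_getline(const std::string& text, char delim)
{
    std::istringstream in(text);
    std::string token;
    std::getline(in, token, delim);
    return Extraction{token, in.peek()};
}

// istream::get stops in front of the delimiter and leaves it in the stream.
Extraction with_get(const std::string& text, char delim)
{
    std::istringstream in(text);
    char buffer[64] = {}; // assumed large enough for this illustration
    in.get(buffer, sizeof buffer, delim);
    return Extraction{buffer, in.peek()};
}
```

With the text "AAA&BBB" and the delimiter '&', both helpers extract "AAA", but the next character in the stream is 'B' after getline and '&' after get. Neither call hands back the delimiter as part of the token.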

However, let's assume you have slurped your istream into a string foo; you can then tokenize it with a regex like this:

((?:^|[&*%])[^&*%]*)

This can be used with a regex_token_iterator like this:

const regex re{ "((?:^|[&*%])[^&*%]*)" };
const vector<string> bar{ sregex_token_iterator(cbegin(foo), cend(foo), re, 1), sregex_token_iterator() };
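
Putting both steps together, a minimal self-contained sketch might look like the following. The function name tokens_with_delimiters and the use of std::istreambuf_iterator to slurp the stream are assumptions; the answer only requires that the stream contents end up in a string foo.

```cpp
#include <istream>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole stream and returns tokens that keep
// their leading delimiter (one of '&', '*' or '%').
std::vector<std::string> tokens_with_delimiters(std::istream& in)
{
    // Slurp the stream; streambuf iterators do not skip whitespace.
    const std::string foo((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());

    // Each match starts at the beginning of the input or at a delimiter,
    // followed by everything up to the next delimiter.
    static const std::regex re{ "((?:^|[&*%])[^&*%]*)" };

    // Submatch 1 is the whole token, delimiter included.
    std::vector<std::string> bar(
        std::sregex_token_iterator(foo.cbegin(), foo.cend(), re, 1),
        std::sregex_token_iterator());
    return bar;
}
```

Called on a std::istringstream holding AAA&BBB*CCC%DDD&EEE, this is expected to return AAA, &BBB, *CCC, %DDD and &EEE. The price is that the entire input has to be buffered before tokenizing.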

