Question

I would like to know how I can use 2 or more delimiters in the getline function. Here is my problem:

The program reads a text file... each line looks like this:

   New Your, Paris, 100
   CityA, CityB, 200

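I am using getline(file, line), but I get the whole line when what I want is CityA, then CityB, and then the number; and if I use ',' as the delimiter, I can't tell where the next line begins, so I'm trying to figure out a solution.

So, how can I use both the comma and \n as delimiters? By the way, I'm working with the string type, not char, so strtok is not an option :/

Some scratch code: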

string line;
ifstream file("text.txt");
if(file.is_open())
   while(!file.eof()){
     getline(file, line);
        // here I need to get each string before comma and \n
   }

Answer 1

You can read a line with std::getline, then pass that line to a std::stringstream and read the comma-separated values from it:

string line;
ifstream file("text.txt");
if(file.is_open()){
   while(getline(file, line)){   // get a whole line
       std::stringstream ss(line);
       while(getline(ss, line, ',')){
            // You now have separate entities here
       }
   }
}

Answer 2

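No, std::getline() only accepts a single character to override the default delimiter. std::getline() has no option for multiple alternative delimiters.

The correct way to parse this kind of input is to use the default std::getline() to read the entire line into a std::string, then construct a std::istringstream from it, and then parse that further into comma-separated values.

However, if you are really parsing comma-separated values, you should be using a proper CSV parser.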
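
As a rough, self-contained sketch of the two-step approach described above (whole line first, then comma-separated fields), something like the following could work. The names Route, strip and read_routes are made up for illustration, and the sketch assumes every valid line has exactly three fields with an integer in the last one; it is not a CSV parser and does not handle quoted fields.

```cpp
#include <exception>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type for one input line such as "CityA, CityB, 200".
struct Route {
    std::string from;
    std::string to;
    int distance;
};

// Illustrative helper: strip spaces, tabs and '\r' from both ends of a field.
inline std::string strip(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Read every line from `in`, then split each line on commas.
// Lines without exactly three fields, or with a non-numeric third field,
// are skipped (an assumption made for this sketch).
inline std::vector<Route> read_routes(std::istream& in) {
    std::vector<Route> routes;
    std::string line;
    while (std::getline(in, line)) {                 // outer delimiter: '\n'
        std::istringstream fields(line);
        std::string field;
        std::vector<std::string> parts;
        while (std::getline(fields, field, ',')) {   // inner delimiter: ','
            parts.push_back(strip(field));
        }
        if (parts.size() != 3) continue;
        try {
            routes.push_back({parts[0], parts[1], std::stoi(parts[2])});
        } catch (const std::exception&) {
            // Third field was not a usable integer: skip this line.
        }
    }
    return routes;
}
```

Since read_routes takes any std::istream, it can be handed the std::ifstream from the question or a std::istringstream when experimenting.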

Answer 3

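Often, it is more intuitive and efficient to parse character input in a hierarchical, tree-like manner, where you start by splitting the string into its major blocks, then go on to process each block, splitting it into smaller parts, and so on.

An alternative is to tokenize the way strtok does: start at the beginning of the input and handle one token at a time until the end of the input is reached. This may be preferred when parsing simple input, because it is straightforward to implement. This style can also be used when parsing input with nested structures, but that requires maintaining some kind of context information, which might grow too complex to maintain inside a single function or a limited region of code.

People who rely on the C++ standard library usually end up using std::stringstream together with std::getline to tokenize string input. But this only gives you one delimiter. They would never consider using strtok, because it is a non-reentrant piece of junk from the C runtime library. So they end up using streams, and with only one delimiter, one is obliged to use a hierarchical parsing style.

But zneak brought up std::string::find_first_of, which takes a set of characters and returns the position nearest the beginning of the string that contains a character from the set. And there are other member functions: find_last_of, find_first_not_of, and more, which seem to exist for the sole purpose of parsing strings. But std::string stops short of providing useful tokenization functions.

Another option is the <regex> library, which can do anything you want, but it is new and you will need to get used to its syntax.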
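
To give a feel for the <regex> route, here is a minimal sketch that splits text on commas or newlines with std::sregex_token_iterator. The function name split_on_comma_or_newline is hypothetical, and the pattern assumes that whitespace around a separator should be discarded; empty fields are dropped, which may or may not suit your data.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `text` wherever a comma or a newline occurs,
// swallowing any whitespace directly around the separator.
inline std::vector<std::string> split_on_comma_or_newline(const std::string& text) {
    // Note: "\n" inside the character class is a literal newline character.
    static const std::regex separator("\\s*[,\n]\\s*");

    std::vector<std::string> tokens;
    // Submatch index -1 asks for the text *between* separator matches.
    std::sregex_token_iterator it(text.begin(), text.end(), separator, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        if (it->length() > 0) {          // skip empty pieces
            tokens.push_back(it->str());
        }
    }
    return tokens;
}
```

Because the separator requires a comma or a newline, a space inside a field (as in a two-word city name) stays part of that field.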

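But, with very little effort, you can leverage the existing functions in std::string to perform tokenization tasks, without resorting to streams. Here is a simple example. get_to() is the tokenizing function, and tokenize demonstrates how it is used.

The code in this example will be slower than strtok, because it constantly erases characters from the beginning of the string being parsed, and it also copies and returns substrings. This makes the code easy to understand, but it does not mean more efficient tokenization is impossible. It wouldn't even be much more complicated than this: you would just keep track of your current position, use it as the start argument in the std::string member functions, and never alter the source string. No doubt even better techniques exist.

To understand the example's code, start at the bottom, where main() is and where you can see how the functions are used. The top of this code is dominated by basic utility functions and silly comments.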

#include <iostream>
#include <string>
#include <utility>

namespace string_parsing {
// in-place trim whitespace off ends of a std::string
inline void trim(std::string &str) {
    auto space_is_it = [] (char c) {
        // A few asks:
        // * Suppress criticism WRT localization concerns
        // * Avoid jumping to conclusions! And seeing monsters everywhere! 
        //   Things like...ah! Believing "thoughts" that assumptions were made
        //   regarding character encoding.
        // * If an obvious, portable alternative exists within the C++ Standard Library,
        //   you will see it in 2.0, so no new defect tickets, please.
        // * Go ahead and ignore the rumor that using lambdas just to get 
        //   local function definitions is "cheap" or "dumb" or "ignorant."
        //   That's the latest round of FUD from...*mumble*.
        return c > '\0' && c <= ' '; 
    };

    for(auto rit = str.rbegin(); rit != str.rend(); ++rit) {
        if(!space_is_it(*rit)) {
            if(rit != str.rbegin()) {
                str.erase(&*rit - &*str.begin() + 1);
            }
            for(auto fit=str.begin(); fit != str.end(); ++fit) {
                if(!space_is_it(*fit)) {
                    if(fit != str.begin()) {
                        str.erase(str.begin(), fit);
                    }
                    return;
    }   }   }   }
    str.clear();
}

// get_to(string, <delimiter set> [, delimiter])
// The input+output argument "string" is searched for the first occurrence of one 
// from a set of delimiters.  All characters to the left of, and the delimiter itself
// are deleted in-place, and the substring which was to the left of the delimiter is
// returned, with whitespace trimmed.
// <delimiter set> is forwarded to std::string::find_first_of, so its type may match
// whatever this function's overloads accept, but this is usually expressed
// as a string literal: ", \n" matches commas, spaces and linefeeds.
// The optional output argument "found_delimiter" receives the delimiter character just found.
template <typename D>
inline std::string get_to(std::string& str, D&& delimiters, char& found_delimiter) {
    const auto pos = str.find_first_of(std::forward<D>(delimiters));
    if(pos == std::string::npos) {
        // When none of the delimiters are present,
        // clear the string and return its last value.
        // This effectively makes the end of a string an
        // implied delimiter.
        // This behavior is convenient for parsers which
        // consume chunks of a string, looping until
        // the string is empty.
        // Without this feature, it would be possible to 
        // continue looping forever, when an iteration 
        // leaves the string unchanged, usually caused by
        // a syntax error in the source string.
        // So the implied end-of-string delimiter takes
        // away the caller's burden of anticipating and 
        // handling the range of possible errors.
        found_delimiter = '\0';
        std::string result;
        std::swap(result, str);
        trim(result);
        return result;
    }
    found_delimiter = str[pos];
    auto left = str.substr(0, pos);
    trim(left);
    str.erase(0, pos + 1);
    return left;
}

template <typename D>
inline std::string get_to(std::string& str, D&& delimiters) {
    char discarded_delimiter;
    return get_to(str, std::forward<D>(delimiters), discarded_delimiter);
}

inline std::string pad_right(const std::string&     str,
                             std::string::size_type min_length,
                             char                   pad_char=' ')
{
    if(str.length() >= min_length ) return str;
    return str + std::string(min_length - str.length(), pad_char);
}

inline void tokenize(std::string source) {
    std::cout << source << "\n\n";
    bool quote_opened = false;
    while(!source.empty()) {
        // If we just encountered an open-quote, only include the quote character
        // in the delimiter set, so that a quoted token may contain any of the
        // other delimiters.
        const char* delimiter_set = quote_opened ? "'" : ",'{}";
        char delimiter;
        auto token = get_to(source, delimiter_set, delimiter);
        quote_opened = delimiter == '\'' && !quote_opened;
        std::cout << "    " << pad_right('[' + token + ']', 16) 
            << "   " << delimiter << '\n';
    }
    std::cout << '\n';
}
}

int main() {
    string_parsing::tokenize("{1.5, null, 88, 'hi, {there}!'}");
}

Output:

{1.5, null, 88, 'hi, {there}!'}

    []                 {
    [1.5]              ,
    [null]             ,
    [88]               ,
    []                 '
    [hi, {there}!]     '
    []                 }
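
The more efficient variant mentioned in this answer (keep track of the current position, pass it as the start argument to find_first_of, and never alter the source string) could be sketched roughly as below. The name split_any is an assumption made for illustration; unlike get_to() above, this sketch does not trim whitespace, and it keeps empty fields.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `source` at every character found in
// `delimiters`. The source string is only read, never modified.
inline std::vector<std::string> split_any(const std::string& source,
                                          const std::string& delimiters) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;   // current position in `source`
    while (start <= source.size()) {
        // Search for the next delimiter at or after `start`.
        auto pos = source.find_first_of(delimiters, start);
        if (pos == std::string::npos) {
            pos = source.size();        // end of string acts as a delimiter
        }
        tokens.push_back(source.substr(start, pos - start));
        start = pos + 1;                // continue just past the delimiter
    }
    return tokens;
}
```

With the delimiter set ",\n", one call handles both separators from the question at once, at the cost of losing the information about which lines the fields came from.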

Answer 4

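I don't think that is how you should attack the problem (even if you could do it); instead:

1. Use what you already have to read in each line.
2. Then split that line at the commas to get the pieces you want.

If strtok will do the job for #2, you can always convert your string into a char array.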
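
A minimal sketch of that conversion, assuming a mutable, null-terminated copy of the line is acceptable: std::strtok writes into its input, so it must not be pointed at the string's own const data. The function name split_with_strtok is hypothetical.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenize one line with std::strtok.
inline std::vector<std::string> split_with_strtok(const std::string& line,
                                                  const char* delimiters) {
    // strtok overwrites delimiters with '\0', so work on a private copy.
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* token = std::strtok(buffer.data(), delimiters);
         token != nullptr;
         token = std::strtok(nullptr, delimiters)) {
        tokens.emplace_back(token);     // copy the token out of the buffer
    }
    return tokens;
}
```

Keep in mind that strtok treats a run of consecutive delimiters as one, so empty fields disappear, and that it keeps hidden state between calls, so it is not safe to interleave two tokenizations or to use it from several threads.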

Can I use 2 or more delimiters in the C++ function getline?

4 answers: