Question

If my file is filled with comma-separated values, for example:

"myComputer",5,192.168.1.0,25
"herComputer",6,192.168.1.1,26
"hisComputer",7,192.168.1.2,27

To pull the data into strings, I would do something like this:

#include <fstream>
#include <string>

std::string line;
std::ifstream myfile("myCSVFile.txt");

if (myfile.is_open())
{
    while (std::getline(myfile, line))
    {
        std::string tempString = line;  // one whole CSV row
        std::string delimiter = ",";
    }
}

To parse each value myself, I use something like the approach from "Parse (split) a string in C++ using string delimiter (standard C++)":

std::string s = "scott>=tiger>=mushroom";
std::string delimiter = ">=";

size_t pos = 0;
std::string token;
while ((pos = s.find(delimiter)) != std::string::npos) {
    token = s.substr(0, pos);
    std::cout << token << std::endl;
    s.erase(0, pos + delimiter.length());
}
std::cout << s << std::endl;

The problem is: what if I only want the first and third values? So if I wanted to take my CSV file from above and output only

"myComputer" 192.168.1.0
"herComputer" 192.168.1.1
"hisComputer" 192.168.1.2

Is there a way to achieve this with the method above, or should I use a completely different approach? Thanks,

Answer 1

It is much easier to use a dedicated library for this task. With Boost Tokenizer's Escaped List Separator, it becomes a breeze:

#include <vector>
#include <string>
#include <iostream>
#include <fstream>
#include <boost/tokenizer.hpp>

int main()
{
    std::ifstream myfile("myCSVFile.txt");

    if (myfile.is_open())
    {
        std::string line;
        while (std::getline(myfile, line))
        {
            typedef boost::escaped_list_separator<char> Separator;
            typedef boost::tokenizer<Separator> Tokenizer;

            std::vector<std::string> tokens;
            Tokenizer tokenizer(line);
            for (Tokenizer::iterator iter = tokenizer.begin(); iter != tokenizer.end(); ++iter)
            {
               tokens.push_back(*iter);
            }

            if (tokens.size() == 4)
            {
                std::cout << tokens[0] << "\t" << tokens[2] << "\n";
            }
            else
            {
                std::cerr << "illegal line\n";
            }
        }
    }
}

Note that in C++11 you can simplify the loop:

for (auto &token : tokenizer)
{
    tokens.push_back(token);
}

As you can see, the idea is to store all of a line's values in a std::vector and then output the ones you need.
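For comparison, the same store-then-pick idea can be sketched with only the standard library by swapping the Boost tokenizer for std::getline with ',' as the delimiter. This is a minimal sketch that assumes no field contains an embedded comma; unlike escaped_list_separator it does not understand quoting, so the double quotes stay in the output. The function names splitCsvLine and firstAndThird are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one CSV line on commas using std::getline with a ',' delimiter.
// Assumption: fields never contain embedded commas or escaped quotes,
// so surrounding double quotes are kept as part of the field.
std::vector<std::string> splitCsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ','))
    {
        fields.push_back(field);
    }
    return fields;
}

// Return "field0<TAB>field2", or an empty string if the line does not
// have exactly four fields (mirrors the size check in the Boost version).
std::string firstAndThird(const std::string& line)
{
    const std::vector<std::string> fields = splitCsvLine(line);
    if (fields.size() != 4)
    {
        return "";
    }
    return fields[0] + "\t" + fields[2];
}
```

Returning a string instead of printing keeps the picking logic separate from I/O; in the file loop you would write `std::cout << firstAndThird(line) << "\n";` for each line read.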

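If you are processing really large files, building a vector for every line may cause performance problems. In that case, use a counter together with the tokenizer: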

#include <vector>
#include <string>
#include <iostream>
#include <fstream>
#include <boost/tokenizer.hpp>

int main()
{
    std::ifstream myfile("myCSVFile.txt");

    if (myfile.is_open())
    {
        std::string line;
        while (std::getline(myfile, line))
        {
            typedef boost::escaped_list_separator<char> Separator;
            typedef boost::tokenizer<Separator> Tokenizer;

            Tokenizer tokenizer(line);
            int count = 0;
            for (Tokenizer::iterator iter = tokenizer.begin(); (iter != tokenizer.end()) && (count < 3); ++iter)
            {
                if ((count == 0) || (count == 2))
                {
                    std::cout << *iter;
                    if (count == 0)
                    {
                        std::cout << "\t";
                    }
                }
                ++count;
            }
            std::cout << "\n";
        }
    }
}
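The counter technique also works without Boost. A minimal sketch, again assuming plain comma-separated fields without quoting and using std::getline as the splitter; the function name firstAndThirdCounted is hypothetical. The loop stops once column 2 has been read, so the rest of the line is never split.

```cpp
#include <sstream>
#include <string>

// Walk the fields of one line with a counter and keep only columns 0 and 2.
// The loop stops after column 2, so later fields are never extracted.
// Assumption: plain comma-separated fields without quoting.
std::string firstAndThirdCounted(const std::string& line)
{
    std::istringstream stream(line);
    std::string field;
    std::string result;
    int count = 0;
    while (count < 3 && std::getline(stream, field, ','))
    {
        if (count == 0)
        {
            result += field + "\t";  // first column, then a separator
        }
        else if (count == 2)
        {
            result += field;         // third column
        }
        ++count;
    }
    return result;
}
```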

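You can use both techniques (a std::vector<std::string> with the output done afterwards, or a loop counter) even with a home-made string-splitting algorithm. The basic idea is the same: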

Using std::vector<std::string>:

std::vector<std::string> tokens;
while ((pos = s.find(delimiter)) != std::string::npos) {
    token = s.substr(0, pos);
    tokens.push_back(token);
    s.erase(0, pos + delimiter.length());
}
tokens.push_back(s); // the last field has no delimiter after it, so it is still in s

if (tokens.size() == 4)
{
    std::cout << tokens[0] << "\t" << tokens[2] << "\n";
}
else
{
    std::cerr << "illegal line\n";
}
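The erase-based loop consumes s as it goes and copies the remaining text on every erase. A minimal sketch of a non-destructive variant, which advances a start index instead and keeps the text after the last delimiter as the final token; the function name splitOn is hypothetical, and it assumes the delimiter is non-empty (an empty one would never advance).

```cpp
#include <string>
#include <vector>

// Split s on a (possibly multi-character) delimiter without modifying s.
// Instead of erasing the consumed prefix, advance a start index.
// Assumption: delimiter is non-empty.
std::vector<std::string> splitOn(const std::string& s, const std::string& delimiter)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = s.find(delimiter, start)) != std::string::npos)
    {
        tokens.push_back(s.substr(start, pos - start));
        start = pos + delimiter.length();
    }
    tokens.push_back(s.substr(start)); // the last field has no trailing delimiter
    return tokens;
}
```

With this helper, the size check and the `tokens[0]` / `tokens[2]` output stay exactly as in the vector version.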

With a counter:

int count = 0;
// Columns 0..2 are enough, so stop before splitting the rest of the line
while (count < 3 && (pos = s.find(delimiter)) != std::string::npos) {
    token = s.substr(0, pos);

    if ((count == 0) || (count == 2))
    {
        std::cout << token;
        if (count == 0)
        {
            std::cout << "\t";
        }
    }
    ++count;
    s.erase(0, pos + delimiter.length());
}
std::cout << "\n";

Answer 2

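As the comments above pointed out, the answer was simply to output only the columns I want. I did it by adding a counter to the actual printing loop. Instead of a while loop, I could just as easily have put the counter in a for loop, but I didn't.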

std::string s = "scott>=tiger>=mushroom";
std::string delimiter = ">=";

size_t pos = 0;
std::string token;

int counter = 0;
while ((pos = s.find(delimiter)) != std::string::npos)
{
    token = s.substr(0, pos);

    if (counter == 0 || counter == 2)
    {
        std::cout << token << std::endl;
    }

    ++counter; // without this, counter stays 0 and every token is printed
    s.erase(0, pos + delimiter.length());
}
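The for-loop variant mentioned above could look like the following minimal sketch, where the loop header owns the column counter and a start index replaces the erase. It returns the picked tokens instead of printing them so it is easy to check; the function name pickFirstAndThird is hypothetical, and a non-empty delimiter is assumed.

```cpp
#include <string>
#include <vector>

// For-loop variant of the counter idea: collect columns 0 and 2 and stop
// once column 2 has been seen. Assumption: delimiter is non-empty.
std::vector<std::string> pickFirstAndThird(const std::string& s, const std::string& delimiter)
{
    std::vector<std::string> picked;
    std::string::size_type start = 0;
    for (int counter = 0; counter < 3; ++counter)
    {
        const std::string::size_type pos = s.find(delimiter, start);
        // The final field has no delimiter after it, so take the rest of s.
        const std::string token = (pos == std::string::npos)
            ? s.substr(start)
            : s.substr(start, pos - start);
        if (counter == 0 || counter == 2)
        {
            picked.push_back(token);
        }
        if (pos == std::string::npos)
        {
            break; // no more fields
        }
        start = pos + delimiter.length();
    }
    return picked;
}
```

Because it also reads the field after the last delimiter, this version picks "mushroom" for the "scott>=tiger>=mushroom" example, which the erase-based while loop leaves unprinted in s.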

Oddly enough, I was thinking about this problem the wrong way, and the very simple comment "just print the ones you want" actually helped. Thanks

Using tokens to parse only specific columns from a CSV file