Question

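I am learning about templates and would like to solve the following task along the way: I want to read a CSV file whose columns have different types (int, string, etc.), store each column in a vector, and then access the vectors. Can someone point out how I can store the columns nicely?

For now, an example of a file the program might encounter looks like this: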

first_column,second_column
int,string
1, line1
2, line2

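The CSV file always has the column names in the first row and the data types in the second row, followed by the actual data. However, the number of columns is not limited, and neither is their order or their types. So another example might be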

first_column,second_column,third_colum
string, double, string
foo, -19.8, mario
bar, 20.1, anna

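From the second row, the program knows the data type of each column (and, from the first row, the total number of columns), so it can allocate the appropriate memory.

The header file of the class I imagine for this task looks like this: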

#include <fstream>
#include <string>
#include <vector>

class ColumnarCSV {
   public:
    ColumnarCSV(std::string filename) {read_data(filename);}
    std::vector<std::string> get_names() { return column_names; }
    std::vector<std::string> get_types() { return column_types; }
    // pseudocode
    template <typename T>
    std::vector<T> get_column(std::string column_name) {
        return column;
    }  //
   private:
    void read_data(std::string filename);
    std::vector<std::string> column_names;
    std::vector<std::string> column_types;
    // storage for the columns;
};

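The class ColumnarCSV is constructed from a string giving the location of the CSV file. Two public functions provide the column names and column types, each encoded as a vector<string>. The function get_column takes a column name and returns that column's data. Please note that I do not know how to write this function. The return type can be different if necessary. Does anyone know how to store the columns appropriately and fill them at runtime depending on the column type?

What I have tried so far:

Inheritance: I tried a base class BaseColumn that contains the column name and the data type. A derived class template <typename T> ActualColumn : public BaseColumn contains the actual data. I wanted to access the data through a virtual function, but learned that virtual template functions cannot be defined.
std::variant: I considered using std::variant and specifying all possible column types. However, I think there must be a way that does not resort to C++17 features.
Creating an empty vector<vector<T>> for every contingency: a brute-force idea would be to equip ColumnarCSV with a vector<vector<T>> member for every data type I can think of and fill them at runtime. This gets the job done, but the code is very convoluted.

Is there a better way to define the class ColumnarCSV?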

Answer 1

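I think you are overcomplicating the problem. When you always have an int and a string, you do not really need templates, and you definitely do not need inheritance or any kind of type erasure. If one line corresponds to one "entry" in the file, then all you need is a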

struct entry { 
    int id;
    std::string x;
};

and an input operator

std::istream& operator>>(std::istream& in, entry& e) {
    char comma;                  // consumes the ',' between the two fields
    in >> e.id >> comma >> e.x;
    return in;
}

Now reading entries is straightforward. To read a single line, you do

std::ifstream file("file.name");
entry x;
file >> x;
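As a fuller sketch of how this idea might be applied to the first sample file, the following reads every data row into a vector of entries. It assumes that the two header lines are simply skipped and that a comma separates the two fields. The helper name read_entries is hypothetical and is not part of the answer above.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

struct entry {
    int id;
    std::string x;
};

// Reads "id, text": the int, the separating comma, then one word.
std::istream& operator>>(std::istream& in, entry& e) {
    char comma = '\0';
    in >> e.id >> comma >> e.x;
    if (in && comma != ',') in.setstate(std::ios::failbit);
    return in;
}

// Hypothetical helper: skips the name and type lines, then reads all rows.
std::vector<entry> read_entries(std::istream& in) {
    const auto whole_line = std::numeric_limits<std::streamsize>::max();
    in.ignore(whole_line, '\n');  // first line: column names
    in.ignore(whole_line, '\n');  // second line: column types
    std::vector<entry> rows;
    entry e;
    while (in >> e) rows.push_back(e);
    return rows;
}
```

Because read_entries takes a std::istream&, the same function can be fed a std::ifstream for a real file or a std::istringstream for a quick check. Note that the string field is read as a single word, so text containing spaces would need std::getline instead.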

Answer 2

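I think you can store the data line by line, each line as a whole std::string.

Knowing the type of the data, you can easily convert a std::string into the real type (std::string, int, double, ...). For example, if a std::string actually holds a double, you can convert it with std::stod.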

I have written an example to make this clearer. Consider the following struct for handling the data:

typedef std::vector<std::string> StringVec;

struct FileData
{
    StringVec col_names;
    StringVec type_names;
    StringVec data_lines;

    bool loadData(const std::string & file_path);
    bool getColumn(const std::string & col_name, StringVec & result);
};

The typedef is only there to simplify the code and make it more readable.

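The method loadData() reads the file and stores its content in the structure.
col_names is the list of column names, type_names is the list of types, and data_lines is the list of lines read.

The method getColumn() writes into the result argument the content of the desired column, which is named by the col_name argument.

Both methods return a boolean that indicates whether the operation was performed successfully (true) or whether an error occurred (false).

loadData() may return false if the given file could not be opened or is corrupted.
getColumn() may return false if the given column name does not exist.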

A possible implementation of these methods could be:

#include <fstream>

// ========== ========== ========== ========== ==========

StringVec split(const std::string & s, char c)
{
    StringVec splitted;

    std::string word;
    for(char ch : s)
    {
        if(ch == c)
        {
            // Never append the delimiter itself; empty fields are skipped.
            if(!word.empty())
            {
                splitted.push_back(word);
                word.clear();
            }
        }
        else
            word += ch;
    }
    if(!word.empty())
        splitted.push_back(word);

    return splitted;
}

void removeExtraSpaces(std::string & word)
{
    while(!word.empty() && (word[0] == ' '))
        word.erase(word.begin());

    while(!word.empty() && (word[word.size()-1] == ' '))
        word.erase(word.end()-1);
}

// ========== ========== ========== ========== ==========

bool FileData::loadData(const std::string & file_path)
{
    bool success(false);

    std::ifstream in_s(file_path);
    if(in_s)
    {
        bool names_read(false);
        bool types_read(false);

        std::string line;
        while(getline(in_s, line))
        {
            if(!names_read) // first line
            {
                col_names = split(line, ',');

                if(col_names.empty())
                    return false; // FILE CORRUPTED

                for(std::string & word : col_names)
                    removeExtraSpaces(word);

                names_read = true;
            }
            else if(!types_read) // second line
            {
                type_names = split(line, ',');

                if(type_names.size() != col_names.size())
                {
                    col_names.clear();
                    type_names.clear();
                    return false; // FILE CORRUPTED
                }

                for(std::string & word : type_names)
                    removeExtraSpaces(word);

                types_read = true;
            }
            else // other lines
            {
                if(split(line, ',').size() != col_names.size())
                {
                    col_names.clear();
                    type_names.clear();
                    data_lines.clear();
                    return false; // FILE CORRUPTED
                }

                data_lines.push_back(line);
            }
        }

        in_s.close();
        success = true;
    }

    return success;
}

bool FileData::getColumn(const std::string & col_name, StringVec & result)
{
    bool success(false);

    bool contains(false);
    size_t index(0);
    while(!contains && (index < col_names.size()))
    {
        if(col_names[index] == col_name)
            contains = true;
        else
            ++index;
    }

    if(contains)
    {
        for(const std::string & line : data_lines)
        {
            std::string field(split(line, ',').at(index));
            removeExtraSpaces(field);
            result.push_back(field);
        }

        success = true;
    }

    return success;
}

// ========== ========== ========== ========== ==========

The functions split() and removeExtraSpaces() are defined to simplify the code (and to make this example more readable).

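From the user's point of view, it can be used as follows:

FileData df;
bool loadSuccessful = df.loadData("data.txt"); // if true, df now contains the content of the file.

StringVec col;
bool columnFound = df.getColumn("col_name", col); // if true, col now contains the content of the desired column.

As you can see, it is pretty easy to use :)
I know that at this point you have a vector of std::string, but since the structure contains the name of the real type of each column, you can convert what you got into the real type.
Perhaps you can add a templated convert() method to the structure so that the conversion is invisible to the user.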
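One way such a convert() helper might look is sketched below as a free function template. This is an assumption about its shape, not code from the answer: it parses every field with operator>>, so it works for any type that can be read from a stream, and it throws if a field cannot be parsed.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: parses each field of a string column as T using
// operator>>. For T = std::string only the first word of a field is kept.
template <typename T>
std::vector<T> convert(const std::vector<std::string>& column) {
    std::vector<T> out;
    out.reserve(column.size());
    for (const std::string& field : column) {
        std::istringstream in(field);
        T value{};
        if (!(in >> value)) {
            throw std::invalid_argument("cannot convert: " + field);
        }
        out.push_back(value);
    }
    return out;
}
```

The caller still has to choose T, for example by comparing the matching entry in type_names with "int" or "double" and then calling convert<int>(col) or convert<double>(col). The type is only known at runtime, so this dispatch cannot be hidden completely behind a single return type.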

I tested it with the following data files:

data.txt:

first_col, second_col, third_col
int, string, char
0, line1, a
5, line2, b

other_data.txt:

first_col, second_col
string, double
line1, 1.1
line2, -2.5
line3, 10.03

It worked successfully for both.

I don't know whether getting the data in the form of std::string is elegant enough for you, but I hope it helps.

How to (elegantly) read a file with different column types and store the columns appropriately?
