Question

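Basically, I have a data file with 8 columns, and I want to put every value from each column into an array variable. The problem is that some values are missing. For example: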

100 54201.10 49392 9379101 10381.1372
101 5823829        73929   83729.77

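Missing values show up as an extra tab (\t) or extra spaces. How can I read the values, skip the missing data, and store the correct values in the correct variables?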

I tried using:

infile >> network;
    string val = isNaN(network);
    if (count % 8 == 0) { ID[count / 8] = val; }
    if (count % 8 == 1) { time[count / 8] = val; }
    if (count % 8 == 2) { country_code[count / 8] = val; }
    if (count % 8 == 3) { sms_in[count / 8] = val; }
    if (count % 8 == 4) { sms_out[count / 8] = val; }
    if (count % 8 == 5) { call_in[count / 8] = val; }
    if (count % 8 == 6) { call_out[count / 8] = val; }
    if (count % 8 == 7) { internet[count / 8] = val; }
    count++;

Answer 1

A good way to do this in C++ is to use getline to read one line at a time.

#include <string>
#include <vector>
...
typedef struct {
    unsigned long id;
    unsigned long timestamp;
    ...
} Record;
std::vector<Record> records;
std::string s;  // holds the current line
while (std::getline(std::cin, s)) {
    ...

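Then use substr to fill the collection of records. Assuming your fields are tab-delimited and the numbers are left-aligned, you can handle default values like this.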

posTab = s.find_first_of('\t');
records[i].id = posTab == 0
    ? defaultID
    : std::atoi(s.substr(0, posTab).c_str());

The index i is the record index, starting at 0. For float and double fields you will need to replace std::atoi with the appropriate standard numeric parser.
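As a rough sketch of what "the appropriate parser" could look like, the helpers below wrap std::strtoul and std::strtod and fall back to a default when the field is empty or not numeric. The names parseULongOr and parseDoubleOr are made up for this illustration and are not part of the standard library.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: parse an unsigned integer field, or return
// `fallback` if the field is empty or does not start with a number.
unsigned long parseULongOr(const std::string& field, unsigned long fallback) {
    if (field.empty()) return fallback;
    char* end = nullptr;
    unsigned long value = std::strtoul(field.c_str(), &end, 10);
    return end == field.c_str() ? fallback : value;  // nothing consumed: not a number
}

// Hypothetical helper: same idea for floating-point fields.
double parseDoubleOr(const std::string& field, double fallback) {
    if (field.empty()) return fallback;
    char* end = nullptr;
    double value = std::strtod(field.c_str(), &end);
    return end == field.c_str() ? fallback : value;
}
```

With these, the assignment above could become something like records[i].id = parseULongOr(s.substr(0, posTab), defaultID), which also covers the empty-field case without the separate posTab == 0 test.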

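If the data is tab-delimited, then for each subsequent field of each record, use s.find_first_of('\t', posTab + 1) to find the next tab, searching from just past the previous one. You can save the previous position in posPreviousTab and use it, in place of zero, for the equality test and as the first substr argument (offset by one to skip the tab itself).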
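Putting the find_first_of / substr loop together, a minimal sketch might look like the following. It assumes every column boundary is exactly one tab, so two tabs in a row mean an empty field. The function name splitKeepEmpty and the variable fieldStart (which plays the role of posPreviousTab + 1) are invented for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a line on a delimiter, keeping empty fields so that a missing
// value still occupies its column position.
std::vector<std::string> splitKeepEmpty(const std::string& line, char delim = '\t') {
    std::vector<std::string> fields;
    std::size_t fieldStart = 0;  // index of the first character of the current field
    while (true) {
        std::size_t posTab = line.find_first_of(delim, fieldStart);
        if (posTab == std::string::npos) {
            // Last field: everything up to the end of the line.
            fields.push_back(line.substr(fieldStart));
            break;
        }
        // posTab == fieldStart yields an empty string, i.e. a missing value.
        fields.push_back(line.substr(fieldStart, posTab - fieldStart));
        fieldStart = posTab + 1;
    }
    return fields;
}
```

Each element of the result can then be checked with empty() and either parsed or replaced by a default before it is stored in the matching member of the record.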

Remarks

For large data sets, std::list can be faster than std::vector under certain conditions. You can write a test to compare the two options for your case.
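One possible shape for such a test is sketched below, timing only the cost of appending n records with std::chrono::steady_clock. The Record here is a cut-down stand-in with just the two fields shown earlier, and secondsToAppend is a name invented for this example. Results depend heavily on compiler flags, record size and access pattern, so treat any numbers as specific to your setup.

```cpp
#include <chrono>
#include <list>
#include <vector>

// Cut-down stand-in for the real record type (assumption: two fields only).
struct Record {
    unsigned long id;
    unsigned long timestamp;
};

// Time how long it takes to append n records to a container that
// supports push_back (for example std::vector<Record> or std::list<Record>).
template <typename Container>
double secondsToAppend(unsigned long n) {
    auto begin = std::chrono::steady_clock::now();
    Container c;
    for (unsigned long i = 0; i < n; ++i) {
        c.push_back(Record{i, i});
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - begin).count();
}
```

Calling secondsToAppend<std::vector<Record>>(n) and secondsToAppend<std::list<Record>>(n) with a realistic n, several times each, gives a first impression. A fair comparison would also time whatever you do with the records afterwards.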

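If you are processing very large data and need more speed, you can use a char[MAXSIZE] buffer with the equivalent algorithm in C and process each record on the fly, rather than storing every record in memory.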
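A minimal sketch of that streaming idea, written C-style but compiled as C++: each line is read into a fixed buffer with fgets, one column is extracted with strchr, and the value is consumed immediately. The MAXSIZE value, the function names, and the choice to sum a column are all assumptions made for illustration. Note that strtok would not work here, because it collapses consecutive tabs and so hides the missing fields.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

const std::size_t MAXSIZE = 4096;  // assumed upper bound on line length

// Extract column `col` (0-based) from one tab-delimited line.
// Returns true and writes *out only if the field is present and numeric.
bool fieldAsDouble(const char* line, int col, double* out) {
    const char* start = line;
    for (int i = 0; i < col; ++i) {
        const char* tab = std::strchr(start, '\t');
        if (tab == nullptr) return false;  // line has fewer columns
        start = tab + 1;
    }
    std::size_t len = std::strcspn(start, "\t\r\n");  // length of this field
    if (len == 0) return false;                        // empty field: value missing
    char* end = nullptr;
    double value = std::strtod(start, &end);
    // strtod skips leading whitespace (including tabs), so reject any
    // parse that consumed nothing or ran past the end of this field.
    if (end == start || end > start + len) return false;
    *out = value;
    return true;
}

// Sum one column of a stream without keeping any record in memory.
double sumColumn(std::FILE* in, int col, long* missing) {
    char buf[MAXSIZE];
    double total = 0.0;
    *missing = 0;
    while (std::fgets(buf, sizeof buf, in) != nullptr) {
        double value = 0.0;
        if (fieldAsDouble(buf, col, &value)) total += value;
        else ++*missing;
    }
    return total;
}
```

Lines longer than MAXSIZE - 1 characters would be split across several fgets calls, so the buffer size has to be chosen with the real data in mind.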

Reading data from a file with missing columns (C++)
