Question

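I have to read a text file into an array of structures. I have already written a program, but it takes too much time because the file contains about 13 lakh (1.3 million) structures. Please suggest the best and fastest way to do this in C++.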

Here is my code:

std::ifstream input_counter("D:\\cont.txt");

/**********************************************************/
int counter = 0;
while( getline(input_counter,line) )
{
    ReadCont( line,&contract[counter]); // function to read data to structure
    counter++;
    line.clear();
}
input_counter.close();

Answer 1

In this situation I would use Qt entirely.

struct MyStruct {
    int Col1;
    int Col2;
    int Col3;
    int Col4;
    // blabla ...
};

QByteArray Data;
QFile f("D:\\cont.txt");
if (f.open(QIODevice::ReadOnly)) {
    Data = f.readAll();
    f.close();
}

MyStruct* DataPointer = reinterpret_cast<MyStruct*>(Data.data());
// Accessing data
DataPointer[0] = ...
DataPointer[1] = ...

Now you have your data and can access it as an array.
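For readers without Qt, the same idea can be sketched with the C standard I/O functions. This is only an illustration under strong assumptions: the file must contain raw MyStruct records written on a machine with the same struct layout and byte order, and ReadRecords is a hypothetical helper name, not part of the answer. Copying into a properly typed buffer also avoids the alignment questions that come with casting a byte buffer.

```cpp
#include <cstddef>
#include <cstdio>
#include <type_traits>
#include <vector>

struct MyStruct {
    int Col1;
    int Col2;
    int Col3;
    int Col4;
};
static_assert(std::is_trivially_copyable<MyStruct>::value,
              "raw byte I/O is only valid for trivially copyable types");

// Hypothetical helper: reads every complete MyStruct stored in the file,
// starting from its beginning. Trailing bytes that do not form a whole
// record are ignored.
std::vector<MyStruct> ReadRecords(std::FILE *fp) {
    std::vector<MyStruct> records;
    MyStruct chunk[1024];                 // read up to 1024 records per call
    std::rewind(fp);
    std::size_t n;
    while ((n = std::fread(chunk, sizeof(MyStruct), 1024, fp)) > 0)
        records.insert(records.end(), chunk, chunk + n);
    return records;
}
```

A text file such as the one in the question cannot be read this way; it still has to be parsed field by field.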

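If your data is not a binary file and you have to parse it first, you will need a conversion routine. For example, if you read a CSV file with 4 columns: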

QVector<MyStruct> MyArray;
QString StringData(Data);
QStringList Lines = StringData.split("\n"); // or whatever new line character is
for (int i = 0; i < Lines.count(); i++) {
    QString Line = Lines.at(i);
    QStringList Parts = Line.split("\t"); // or whatever separator character is
    if (Parts.count() >= 4) {
        MyStruct t;
        t.Col1 = Parts.at(0).toInt();
        t.Col2 = Parts.at(1).toInt();
        t.Col3 = Parts.at(2).toInt();
        t.Col4 = Parts.at(3).toInt();
        MyArray.append(t);
    } else { 
        // Malformed input, do something
    }
}

Now your data is parsed and stored in the MyArray vector.

Answer 2

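Keep your 'parsing' as simple as possible: where you know the format of the fields, apply that knowledge. For instance, a call like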

ReadCont("|PE|1|0|0|0|0|1|1||2|0||2|0||3|0|....", ...)

should apply a fast char-to-integer conversion, something like

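void ReadCont(const char *line, Contract &c) {
   // the line starts with '|', so the "PE|" record tag sits at indices 1..3
   if (line[1] == 'P' && line[2] == 'E' && line[3] == '|') {
     line += 4; // skip past "|PE|"
     for (int field = 0; field < K_FIELDS_PE; ++field) {
       c.int_field[field] = *line++ - '0'; // single-digit field to int
       assert(*line == '|');
       ++line;
     }
   }
}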

Well, mind the details, but you get the idea...
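One of those details is that the sample line contains empty fields ("||") and real data may have multi-digit values. A minimal sketch that handles both is shown below. ParseIntFields is a hypothetical helper, not the answerer's code; it assumes the record tag has already been skipped, that values are non-negative and fit in an int (no overflow check), and it stores -1 for an empty field.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: parses '|'-separated non-negative integers from a
// NUL-terminated line. Empty fields (as in "||") are stored as -1.
// Returns false if a character other than a digit or '|' is found.
bool ParseIntFields(const char *line, std::vector<int> &out) {
    assert(line != nullptr);
    out.clear();
    if (*line == '|') ++line;                  // skip a leading separator
    while (*line != '\0') {
        int value = 0;
        bool has_digits = false;
        while (*line >= '0' && *line <= '9') { // accumulate digits by hand
            value = value * 10 + (*line - '0');
            has_digits = true;
            ++line;
        }
        out.push_back(has_digits ? value : -1);
        if (*line == '|') {
            ++line;                            // consume the field separator
        } else if (*line != '\0') {
            return false;                      // unexpected character
        }
    }
    return true;
}
```

Reusing the same std::vector across calls avoids a fresh allocation for every line, which matters when there are over a million of them.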

Answer 3

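As user2617519 said, multithreading can make this faster. I see that you are reading each line and parsing it. Put these lines into a queue, then have different threads pop them off the queue and parse the data into structures.

A simpler way (without the complexity of multithreading) is to split the input data file into multiple files and run an equal number of processes to parse them. The data can then be merged later.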
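A rough sketch of the threaded variant follows. Instead of a shared queue, it uses a simpler substitute: the lines are read first, then each thread parses one contiguous slice and writes into its own part of a preallocated result, so no locking is needed. Record, ParseLine and ParseParallel are hypothetical names, and the one-integer-per-line format is an assumption made only to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <thread>
#include <vector>

// Hypothetical record type standing in for the asker's contract structure.
struct Record {
    int value = 0;
};

// Hypothetical per-line parser: a line is assumed to hold one integer.
Record ParseLine(const std::string &line) {
    Record r;
    r.value = static_cast<int>(std::strtol(line.c_str(), nullptr, 10));
    return r;
}

// Parses lines[i] into result[i], giving each thread one contiguous slice.
// Threads write to disjoint elements, so no synchronization is required.
std::vector<Record> ParseParallel(const std::vector<std::string> &lines,
                                  unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<Record> result(lines.size());
    const std::size_t chunk = (lines.size() + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(lines.size(), t * chunk);
        const std::size_t end = std::min(lines.size(), begin + chunk);
        if (begin == end) break;          // nothing left for further threads
        workers.emplace_back([&lines, &result, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                result[i] = ParseLine(lines[i]);
        });
    }
    for (std::thread &w : workers) w.join();
    return result;
}
```

Whether this helps depends on where the time goes: if reading the file dominates, extra parsing threads gain little. Some toolchains also need a flag such as -pthread to build code that uses std::thread.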

Answer 4

QFile::readAll() may cause memory problems, and std::getline() is slow (as is ::fgets()).

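I ran into a similar problem where I needed to parse very large delimited text files in a QTableView. Using a custom model, I parsed the file to find the offset of the start of each line. Then, when data needs to be displayed in the table, I read the line and parse it on demand. This results in a lot of parsing, but it is fast enough that there is no noticeable lag in scrolling or update speed.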

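It also has the added benefit of low memory usage, because I never read the whole file contents into memory. With this strategy, files of almost any size are workable.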

Parsing code:

m_fp = ::fopen(path.c_str(), "rb"); // open in binary mode for faster parsing
if (m_fp != NULL)
{
  // read the file to get the row pointers
  char buf[BUF_SIZE+1];

  long pos = 0;
  m_data.push_back(RowData(pos));
  int nr = 0;
  while ((nr = ::fread(buf, 1, BUF_SIZE, m_fp)))
  {
    buf[nr] = 0; // null-terminate the last line of data
    // find new lines in the buffer
    char *c = buf;
    while ((c = ::strchr(c, '\n')) != NULL)
    {
      m_data.push_back(RowData(pos + c-buf+1));
      c++;
    }
    pos += nr;
  }

  // squeeze any extra memory not needed in the collection
  m_data.squeeze();
}

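RowData and m_data are specific to my implementation, but they are only used to cache information about a row in the file (such as the file position and the number of columns).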
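For anyone who wants to try the offset-index idea without Qt, here is a minimal sketch in standard C++. It is not the code above: BuildLineIndex and ReadLineAt are hypothetical names, a plain std::vector<long> stands in for m_data, and memchr replaces strchr so the scan does not depend on NUL-terminating the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Returns the byte offset at which each line of the file starts.
// The file should be opened in binary mode so offsets match fseek positions.
std::vector<long> BuildLineIndex(std::FILE *fp) {
    assert(fp != nullptr);
    std::vector<long> offsets;
    offsets.push_back(0);                 // the first line starts at byte 0
    char buf[4096];
    long pos = 0;                         // file offset of buf[0]
    std::rewind(fp);
    std::size_t nr;
    while ((nr = std::fread(buf, 1, sizeof buf, fp)) > 0) {
        const char *p = buf;
        const char *end = buf + nr;
        while ((p = static_cast<const char *>(
                    std::memchr(p, '\n', static_cast<std::size_t>(end - p)))) != nullptr) {
            ++p;                          // the next line starts after '\n'
            offsets.push_back(pos + static_cast<long>(p - buf));
        }
        pos += static_cast<long>(nr);
    }
    // Drop the last entry when no line starts at end-of-file
    // (an empty file, or a file that ends with a newline).
    if (offsets.back() == pos) offsets.pop_back();
    return offsets;
}

// Reads one line on demand, starting at a previously indexed offset.
std::string ReadLineAt(std::FILE *fp, long offset) {
    std::string line;
    if (std::fseek(fp, offset, SEEK_SET) != 0) return line;
    int ch;
    while ((ch = std::fgetc(fp)) != EOF && ch != '\n')
        line.push_back(static_cast<char>(ch));
    return line;
}
```

The index costs one long per line, so even a file with millions of lines needs only a few megabytes of memory, while the line contents stay on disk until a row is actually requested.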

Another performance strategy I used was to parse each line with QByteArray instead of QString. Unless you need Unicode data, this saves time and memory:

// optimized line reading procedure
QByteArray str;
char buf[BUF_SIZE+1];
::fseek(m_fp, rd.offset, SEEK_SET);
int nr = 0;
while ((nr = ::fread(buf, 1, BUF_SIZE, m_fp)))
{
  buf[nr] = 0; // null-terminate the string
  // find new lines in the buffer
  char *c = ::strchr(buf, '\n');
  if (c != NULL)
  {
    *c = 0;
    str += buf;
    break;
  }
  str += buf;
}

return str.split(',');

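If you need to split each line on something other than a single character, consider ::strtok(); note that it treats its argument as a set of single-character delimiters rather than as one multi-character separator.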
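When the separator really is a whole multi-character string, one option is a small loop around std::string::find. The sketch below is an illustration only: SplitByString is a hypothetical helper, it keeps empty fields, and it returns the line unsplit when the separator is empty.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on every occurrence of the whole
// string `sep`. Empty fields are kept; an empty separator yields the
// line unsplit.
std::vector<std::string> SplitByString(const std::string &line,
                                       const std::string &sep) {
    std::vector<std::string> parts;
    if (sep.empty()) {
        parts.push_back(line);
        return parts;
    }
    std::size_t start = 0;
    for (;;) {
        const std::size_t hit = line.find(sep, start);
        if (hit == std::string::npos) {
            parts.push_back(line.substr(start));   // last field
            break;
        }
        parts.push_back(line.substr(start, hit - start));
        start = hit + sep.size();                  // continue after separator
    }
    return parts;
}
```

Unlike ::strtok(), this does not modify the input and does not collapse consecutive separators, so column positions stay stable when a field is empty.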

Reading from a large text file into an array of structures in Qt?
