Question

I want to read a file of about 5 MB into memory. The file has this format (it is a text file):

ID 3:  0 itemId.1 0 itemId.2 0 itemId.5 1 itemId.7 ........................ 20 itemId.500
ID 50:  0 itemId.31 0 itemId.2 0 itemId.4 2 itemId.70 ........................ 20 itemId.2120
.....

How can I do this efficiently in C++?

Answer 1

Read the file line by line:

#include <fstream>
#include <string>

int main()
{
    std::ifstream fin("file.txt");
    std::string   myStr;

    while (std::getline(fin, myStr))  // Always put the read in the while condition.
    {                                 // Then you only enter the loop if there is data
        // use myStr data             // to process. Otherwise you need to read and then
    }                                 // test whether the read was OK.
                                      //
                                      // Note: Reading the last line stops at (but not
                                      //       past) the end of file, so the stream state
                                      //       is still OK and the loop body runs for it.
                                      //       Only the next read, which finds no data
                                      //       left, fails and ends the loop.
}
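For the question's format, the `use myStr data` placeholder could hand each line to a small parser like the sketch below. It assumes every line starts with `ID <number>:` followed by whitespace-separated pairs of an integer and an `itemId.<k>` token; `Record` and `parseLine` are hypothetical names, not part of the answer.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type: one "ID n:" line of the question's file.
struct Record {
    int id = 0;
    std::vector<std::pair<int, std::string>> items;  // (count, "itemId.k")
};

// Parses one line such as "ID 3:  0 itemId.1 0 itemId.2".
// Returns false if the line does not start with "ID <number>:".
bool parseLine(const std::string& line, Record& out) {
    std::istringstream in(line);
    std::string tag;
    char colon = '\0';
    if (!(in >> tag >> out.id >> colon) || tag != "ID" || colon != ':')
        return false;

    out.items.clear();
    int count = 0;
    std::string item;
    while (in >> count >> item)  // read "<count> itemId.<k>" pairs until one fails
        out.items.emplace_back(count, item);
    return true;
}
```

Inside the loop, `Record r; if (parseLine(myStr, r)) { /* use r.id and r.items */ }` would replace the placeholder. Parsing of a line stops at the first token that does not fit the count/item pattern.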

For why there is no explicit call to close(), see:
https://codereview.stackexchange.com/questions/540/my-c-code-involving-an-fstream-failed-review/544#544
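The usual argument for skipping `close()` is RAII: a file stream closes its file in its destructor, so leaving the enclosing scope is enough. A minimal sketch of that behaviour; `writeThenRead` is a hypothetical helper and the file name passed to it is arbitrary.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Writes `text` to `path`, then reads the first line back.
// Neither stream calls close(): each file is flushed and closed
// by the stream's destructor when its enclosing scope ends.
std::string writeThenRead(const std::string& path, const std::string& text) {
    {
        std::ofstream out(path);
        out << text << '\n';
    }   // `out` is destroyed here, which flushes and closes the file

    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    return line;
}   // `in` is destroyed (and its file closed) when the function returns
```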

If efficiency is your main goal (it probably isn't), then read the whole file into memory and parse it from there; see Thomas's answer below.

Answer 2

Read the entire file into memory, then process the contents in memory.

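File resources (such as hard drives) are most efficient when the motor keeps spinning. Thus one large read of data is more efficient than 5 reads of small quantities of data.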

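On most platforms, accessing memory is faster than accessing a file. Using this information, you can make a program more efficient by reading the data into memory and then processing the memory.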

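Combining these two techniques yields even better performance: read as much data as possible into memory in one transaction, then process the memory.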
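A minimal sketch of the two steps: load everything, then parse only the in-memory copy. It uses the `rdbuf()` copy idiom for loading, which does not promise a single OS-level read (the stream buffer decides how to chunk it); `slurp` and `splitLines` are hypothetical names.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Step 1 (hypothetical helper): copy the whole file into memory
// before doing any parsing.
std::string slurp(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copies everything the file buffer can supply
    return buffer.str();   // empty if the file could not be opened
}

// Step 2 (hypothetical helper): work only on the in-memory copy,
// here by splitting it into lines.
std::vector<std::string> splitLines(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}
```

Each string returned by `splitLines` can then be parsed the same way as a line read directly from the file.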

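Some people declare large arrays of char or unsigned char (for binary data). Others tell std::string or std::vector to reserve a large amount of memory and then read the data into the data structure (to read directly into its storage, the container must be resized, not merely reserved).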

Also, block reads (a.k.a. istream::read()) bypass most of the slow parts of the C++ stream facilities.
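A sketch of the sized-buffer approach combined with a single `istream::read()` call. It assumes the file fits in memory and that `tellg()` on a binary stream opened at its end reports the file size, which holds on common platforms; `readAll` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: loads a whole file with one istream::read() call
// into a buffer that is sized (not merely reserved) up front.
// Returns an empty vector if the file is empty or cannot be opened or read.
std::vector<char> readAll(const std::string& path) {
    // std::ios::ate opens the file positioned at its end.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        return {};

    const std::streamsize size = in.tellg();  // offset of the end == file size
    if (size <= 0)
        return {};                            // empty file, or tellg() failed
    in.seekg(0, std::ios::beg);

    std::vector<char> data(static_cast<std::size_t>(size));  // one allocation
    if (!in.read(data.data(), size))                          // one block read
        return {};
    return data;
}
```

`std::string text(data.begin(), data.end());` turns the buffer into a string if line-based parsing is needed afterwards.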

Answer 3

Use a file stream:

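#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main() {
    string line;
    ifstream myfile("example.txt");
    if (myfile.is_open())
    {
        // getline() returns the stream, which tests false once no line could be read
        while (getline(myfile, line))
            cout << line << '\n';   // '\n' avoids flushing on every line, unlike endl

        myfile.close();             // optional: the destructor would close it anyway
    }
    else
    {
        cerr << "Unable to open file\n";
        return 1;
    }

    return 0;
}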

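5 MB really isn't a big file. The stream will take care of reading it in chunks for you, but in practice almost any machine you would run this on can read 5 MB into memory without a problem.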
