C++ High Performance File Reading and Writing (C++14)

Time: 2016-07-28 20:23:31

Tags: c++, io

I’m writing a C++14 program to load text strings from a file, do some computation on them, and write them back to another file. I’m using Linux, and the files are relatively large (O(10^6 lines)). My typical approach is to use the old C getline and sscanf utilities to read and parse the input, and fprintf(FILE*, …) to write the output files. This works, but I’m wondering if there’s a better way, with the goals of high performance and the generally recommended approach for the modern C++ standard I’m using. I’ve heard that iostream is quite slow; if that’s true, is there a more recommended approach?

Update: To clarify the use case a bit: for each line of the input file, I'll be doing some text manipulation (data cleanup, etc.). Each line is independent, so loading the entire input file (or at least large chunks of it), processing it line by line, and then writing it out seems to make the most sense. The ideal abstraction for this would be an iterator over the read-in buffer, with each line being an entry. Is there a recommended way to do that with std::ifstream?
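
For illustration, something like this is roughly the shape I have in mind (just a sketch; an std::istringstream over the loaded buffer is one guess, not necessarily the fast or recommended way, and the file names are placeholders):

// Sketch only: slurp the file, then treat the buffer as a sequence of lines.
#include <fstream>
#include <sstream>
#include <string>

int main() {
    std::ifstream in("input.txt");       // placeholder input file
    std::ostringstream slurp;
    slurp << in.rdbuf();                 // read the whole file into memory
    std::istringstream lines(slurp.str());

    std::ofstream out("output.txt");     // placeholder output file
    std::string line;
    while (std::getline(lines, line)) {  // one entry per line
        // ... data cleanup / text manipulation on `line` ...
        out << line << '\n';
    }
}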

3 Answers:

Answer 0: (Score: 5)

If you have enough memory, the fastest option is to read the entire file into a buffer with a single read, process the buffer in memory, and then write it back out with a single write.

Read it all in:

#include <fstream>
#include <string>

std::string buffer;

std::ifstream f("file.txt", std::ios::binary);  // binary mode so tellg() matches the bytes read
f.seekg(0, std::ios::end);                      // seek to the end to find the size
buffer.resize(f.tellg());
f.seekg(0);                                     // back to the beginning
f.read(&buffer[0], buffer.size());              // std::string::data() is const in C++14, so use &buffer[0]

Then process it in memory.

Then write it all back out:

std::ofstream f("file.txt", std::ios::binary);  // overwrites this file; use another name for a separate output
f.write(buffer.data(), buffer.size());

Answer 1: (Score: 1)

I think you could read the file in parallel: create n threads, each using David's method with its own offset, pull the data into separate regions, and then map them into a single location. Look at ROMIO for ideas on how to maximize speed. ROMIO's ideas can be done in standard C++ without too much trouble.
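
A minimal sketch of that idea, assuming plain std::thread and one shared buffer split into disjoint slices (the file name, chunking, and thread count are illustrative, not tuned):

// Sketch: n threads, each with its own ifstream and offset, each filling a
// disjoint slice of one shared buffer.
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

int main() {
    const char* path = "file.txt";   // placeholder input file
    const unsigned n = std::max(1u, std::thread::hardware_concurrency());

    std::ifstream probe(path, std::ios::binary | std::ios::ate);
    const std::size_t size = static_cast<std::size_t>(probe.tellg());
    probe.close();

    std::string buffer(size, '\0');
    const std::size_t chunk = (size + n - 1) / n;    // bytes per thread, rounded up

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i) {
        workers.emplace_back([&buffer, path, chunk, size, i] {
            const std::size_t offset = i * chunk;
            if (offset >= size) return;              // nothing left for this thread
            const std::size_t count = std::min(chunk, size - offset);
            std::ifstream f(path, std::ios::binary); // each thread gets its own stream
            f.seekg(offset);
            f.read(&buffer[offset], count);          // fill a disjoint slice
        });
    }
    for (auto& t : workers) t.join();
    // buffer now holds the whole file in order; process it here.
}

Because each thread reads into a non-overlapping range of the buffer, no locking is needed and the chunks land in order automatically.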

Answer 2: (Score: 0)

If you have C++17 (std::filesystem), there is also this approach (get the size of the file via std::filesystem::file_size instead of seekg and tellg). I think this lets you avoid going through the file twice.

It is shown in this answer.
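
For illustration, a rough sketch of that C++17 variant might look like this (note it needs C++17, not C++14, and the file name is a placeholder):

// Sketch: size the buffer with std::filesystem::file_size, then do a single read.
#include <filesystem>
#include <fstream>
#include <string>

int main() {
    const std::filesystem::path path = "file.txt";   // placeholder input file
    std::string buffer(std::filesystem::file_size(path), '\0');

    std::ifstream f(path, std::ios::binary);
    f.read(buffer.data(), buffer.size());   // std::string::data() is non-const in C++17
    // process buffer, then write it out as before
}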