Is memory-mapped I/O worthwhile for sequential processing?

Consider a program that is going to process an entire input file sequentially in one pass. Is there any advantage to mapping the file into memory versus reading it into a buffer for processing?

  • I understand that if you were going to access only portions of the file, then memory-mapped I/O can save disk accesses for the portions of the file you don't need. But I'm interested in one sequential pass over the entire file.

  • If you were going to read the file (or at least portions of it) multiple times, it might be faster to let the virtual memory system figure out which parts to keep in cache. But, again, one sequential pass over the entire file won't benefit from this.

  • I know that high-level I/O (e.g., C++ iostreams or C functions like fscanf) introduces layers of buffering and abstraction on top of the OS's fundamental read operation. Let's set the language's standard library aside and focus on the OS call (i.e., ReadFile on Windows or read() on Linux).
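For the Linux side of the comparison, the raw read() approach described in the last bullet might look like the minimal sketch below. Each read() call copies bytes from the kernel's page cache into the user buffer. The helper name read_whole_file, the 64 KiB chunk size, and the posix_fadvise(POSIX_FADV_SEQUENTIAL) read-ahead hint are illustrative assumptions, not part of the question; on Windows the analogous hint is the FILE_FLAG_SEQUENTIAL_SCAN flag passed to CreateFile.

```cpp
#include <fcntl.h>    // open, posix_fadvise, O_RDONLY
#include <unistd.h>   // read, close, ssize_t
#include <cerrno>     // errno, EINTR
#include <cstddef>    // std::size_t
#include <string>
#include <vector>

// Hypothetical helper: reads an entire file in one sequential pass using the
// raw POSIX read() system call, appending each chunk to `out`.
// Returns false if the file cannot be opened or a read fails.
bool read_whole_file(const char* path, std::string& out) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return false;

    // Optional hint that access will be sequential, so the kernel may read
    // ahead more aggressively. It is only advice, so the result is ignored.
    (void)::posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    std::vector<char> buf(64 * 1024);  // assumed chunk size, not tuned
    out.clear();
    for (;;) {
        ssize_t n = ::read(fd, buf.data(), buf.size());
        if (n > 0) {
            out.append(buf.data(), static_cast<std::size_t>(n));
        } else if (n == 0) {
            break;                        // end of file
        } else if (errno != EINTR) {      // real error; retry on EINTR
            ::close(fd);
            return false;
        }
    }
    ::close(fd);
    return true;
}
```

A caller would do something like std::string text; if (read_whole_file("input.txt", text)) { /* split into words */ }. In a true streaming design you would process each chunk as it arrives instead of accumulating the whole file.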

It seems to me the bottleneck (reading the data from the disk) is the same with either approach, yet I hear people claim that memory mapping has less overhead, even in the case of one sequential pass over the entire file.

I'll concede that if two programs are trying to read the same file via memory mapping, then the second can map the same physical pages into its own address space, avoiding the actual disk reads. Are there any other advantages?
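For contrast, a memory-mapped pass on Linux maps the file's page-cache pages directly into the process's address space. This avoids copying into a user buffer (and read-only mappings of the same file in different processes can share those physical pages), but the first touch of a page can trigger a page fault instead of a read() call. The sketch below uses a hypothetical helper, count_lines_mmap, which simply counts newlines as a stand-in for real processing. It also applies madvise(MADV_SEQUENTIAL) as an optional hint. On Windows the counterpart would use CreateFileMapping and MapViewOfFile.

```cpp
#include <fcntl.h>     // open, O_RDONLY
#include <sys/mman.h>  // mmap, munmap, madvise
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close
#include <cstddef>     // std::size_t

// Hypothetical helper: maps `path` read-only and makes one sequential pass,
// counting '\n' characters as a placeholder for real work.
// Returns -1 on error.
long long count_lines_mmap(const char* path) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (::fstat(fd, &st) != 0) { ::close(fd); return -1; }
    const std::size_t len = static_cast<std::size_t>(st.st_size);
    if (len == 0) { ::close(fd); return 0; }  // mmap rejects a zero length

    void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping remains valid after the descriptor is closed
    if (p == MAP_FAILED) return -1;

    // Advise the kernel that pages will be touched in order, so it may read
    // ahead aggressively and free pages soon after use. Only advice.
    (void)::madvise(p, len, MADV_SEQUENTIAL);

    const char* data = static_cast<const char*>(p);
    long long lines = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == '\n') ++lines;  // touching a new page may page-fault
    }

    ::munmap(p, len);
    return lines;
}
```

Whether the avoided copy outweighs the page-fault and mapping setup costs depends on the OS and workload, which is exactly what a measurement has to settle.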

I'm interested primarily in Windows, but bonus points if you can also point out any significant differences with respect to Linux.

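I experimented on Windows by modifying a program that reads a text file and puts the words into a trie. To focus on I/O performance, I commented out the actual trie operations, so the program only reads the text and breaks it into words.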
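The word-splitting work that remains once the trie calls are commented out might look roughly like the sketch below. The original splitting rule is not shown, so treating a word as a maximal run of ASCII letters, and the name split_words, are assumptions made for illustration.

```cpp
#include <cctype>       // std::isalpha
#include <cstddef>      // std::size_t
#include <string_view>
#include <vector>

// Hypothetical helper: splits `text` into words, assuming a word is a maximal
// run of ASCII letters. Returns views into `text`, so no characters are
// copied; `text` must outlive the returned views.
std::vector<std::string_view> split_words(std::string_view text) {
    std::vector<std::string_view> words;
    const std::size_t n = text.size();
    std::size_t i = 0;
    while (i < n) {
        // Skip separators (anything that is not a letter).
        while (i < n && !std::isalpha(static_cast<unsigned char>(text[i]))) ++i;
        const std::size_t start = i;
        // Consume the letters of one word.
        while (i < n && std::isalpha(static_cast<unsigned char>(text[i]))) ++i;
        if (i > start) words.push_back(text.substr(start, i - start));
    }
    return words;
}
```

Because the scan works on a std::string_view, the same function can run over a std::string filled by ReadFile or read(), or directly over a memory-mapped view, which keeps the comparison focused on how the bytes get into memory.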


C++ iostreams    228 ms (σ =  6)
Win32 ReadFile   115 ms (σ =  8)
memory mapped    136 ms (σ = 14)


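The results confirm my suspicion: for a single sequential pass, memory mapping offers no substantial advantage over Win32 ReadFile. In fact, there may be a small penalty (more system calls?) and more variance.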


Unsurprisingly, the extra layer of buffering in the C++ iostreams library makes it the slowest approach.





Written in C++ and compiled into a 64-bit executable with MSVC 2019 using /EHsc /O2 /std:c++latest. Run on an Intel-based desktop with an SSD.


C++ iostreams approach:

The file is opened in binary mode, so no effort is spent converting CR+LF to '\n'. We read each file into a std::string in one go:

auto file = std::ifstream(file_name, std::ios::binary);
std::string text{std::istreambuf_iterator(file), {}};
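A self-contained version of the two lines above, wrapped in a hypothetical helper named read_with_iostreams, might look like this. It spells out std::istreambuf_iterator<char> explicitly instead of relying on C++17 class template argument deduction. It also returns an empty string when the file cannot be opened, which is a simplification; real code should report the error.

```cpp
#include <fstream>   // std::ifstream
#include <ios>       // std::ios::binary
#include <iterator>  // std::istreambuf_iterator
#include <string>

// Hypothetical helper: reads a whole file into a std::string via iostreams.
// Binary mode means CR+LF is left untranslated. The second iterator, {},
// is a default-constructed end-of-stream iterator.
std::string read_with_iostreams(const std::string& file_name) {
    auto file = std::ifstream(file_name, std::ios::binary);
    return std::string{std::istreambuf_iterator<char>(file), {}};
}
```

istreambuf_iterator reads through the stream buffer one character at a time, which is one plausible reason this path is slower than a single large ReadFile call.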

