Question

Consider a program that is going to process an entire input file sequentially in one pass. Is there any advantage to mapping the file into memory versus reading it into a buffer for processing?

I understand that if you were going to access only portions of the file, then memory mapped i/o can save disk accesses for the portions of the file not needed. But I'm interested in one sequential pass of the entire file.
If you were going to read the file (or at least portions of it) multiple times, it might be faster to let the virtual memory system figure out which parts to keep in cache. But, again, one sequential pass over the entire file won't benefit from this.
I know that high-level I/O (e.g., C++ I/O streams or C functions like fscanf) introduces layers of buffering and abstraction on top of the OS's fundamental read operation. Let's avoid the language's standard library and focus on the OS call (i.e., ReadFile on Windows or read() on Linux).

It seems to me the bottleneck (reading the data from the disk) is the same with either approach, yet I hear people claim that memory mapping has less overhead, even in the case of one sequential pass over the entire file.

I'll concede that if two programs are trying to read the same file via memory mapping, then the second can map the same physical pages into its own address space, avoiding the actual disk reads. Are there any other advantages?

I'm interested primarily in Windows, but bonus points if you can also point out any significant differences with respect to Linux.

Answer 1

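Possibly. With a memory-mapped file (MMF), you don't have to issue as many system calls to read the file, which may give a small performance gain.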

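Based on your comment about sequential processing, I assume your program is single-threaded. If you are going to do CPU-intensive processing, you can ask the kernel to prefetch the file in the background (using PrefetchVirtualMemory) while you start processing the beginning of the file. This performs better than processing one chunk at a time and calling ReadFile in a loop, because you don't have to wait for each ReadFile to return, nor wait for the whole file to be read before you can start processing. I suppose you could hack together something similar with asynchronous I/O, but why reinvent the wheel when the OS can do it for you?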
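A minimal sketch of the prefetch hint described above, assuming the file has already been mapped into memory (for example with MapViewOfFile). The helper name PrefetchMappedRange is hypothetical. The non-Windows branch uses madvise with MADV_WILLNEED as a rough Linux analogue; that substitution is an assumption, not something the answer proposes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#ifdef _WIN32
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0602  // PrefetchVirtualMemory requires Windows 8 or later
#endif
#include <windows.h>
#else
#include <sys/mman.h>
#include <unistd.h>
#endif

// Hypothetical helper: ask the OS to start reading [addr, addr + len) of a
// mapped file ahead of use. It is only a hint, so a false return just means
// the hint was rejected; the caller can keep processing either way.
bool PrefetchMappedRange(const void* addr, std::size_t len) {
#ifdef _WIN32
    WIN32_MEMORY_RANGE_ENTRY range;
    range.VirtualAddress = const_cast<void*>(addr);
    range.NumberOfBytes = len;
    // Flags must be 0.
    return PrefetchVirtualMemory(GetCurrentProcess(), 1, &range, 0) != 0;
#else
    // Assumed Linux analogue: madvise needs a page-aligned start, so round down
    // and extend the length by the same amount.
    const auto page = static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));
    const auto start = reinterpret_cast<std::uintptr_t>(addr);
    const auto aligned = start & ~(page - 1);
    return madvise(reinterpret_cast<void*>(aligned), len + (start - aligned),
                   MADV_WILLNEED) == 0;
#endif
}
```

A caller would map the file, call PrefetchMappedRange(view, size) once, and then start processing from the beginning of the view; how much of the read-ahead actually overlaps with processing is up to the OS.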

Answer 2

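I experimented on Windows by modifying a program that reads text files and puts the words into a trie. To focus on I/O performance, I commented out the actual trie operations, so the program only reads the text and splits it into words.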

Results

Method           Mean of 7 runs (σ)
C++ iostreams    228 ms (σ = 6)
Win32 ReadFile   115 ms (σ = 8)
memory mapped    136 ms (σ = 14)

Conclusions

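The results confirm my suspicion that, for a single sequential pass, memory mapping has no substantial advantage over Win32 ReadFile. In fact, there may be a small penalty (more system calls?) and more variance.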

To be clear, this is a test on a single Windows machine. I have heard plausible explanations of why mmap might be faster on Linux.

Not surprisingly, the extra layer of buffering in the C++ iostreams library makes it the slowest method.

Methods

As input, I used 18 books from Project Gutenberg, totaling 10,558,803 bytes. The books are mostly ASCII, but some contain a few non-ASCII characters encoded as UTF-8.

The main loop opens a file, reads (or maps) the entire file into memory at once, tokenizes it, and then closes (or unmaps) the file.

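The tokenizer is a hand-written state machine that builds a std::string_view for each word. It reads each byte exactly once, in order. I kept the tokenization in to ensure an apples-to-apples comparison between the file-reading and memory-mapping solutions; without it, the memory-mapped solution might never actually bring the data into memory.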
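The tokenizer itself is not shown, so the following is only a sketch of the kind of hand-written state machine described: two states (inside a word, between words), one forward pass over the bytes, and one std::string_view per word pointing into the original buffer. The word rule (ASCII letters, apostrophes, and any byte >= 0x80 so UTF-8 sequences stay intact) and the IsWordByte helper are assumptions. Because it touches every byte, it also forces every page of a memory-mapped file to be faulted in, which is why keeping it in the benchmark matters.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Assumed word rule: ASCII letters, apostrophes, or any byte >= 0x80, so the
// bytes of a UTF-8 multi-byte character stay inside their word.
inline bool IsWordByte(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '\'' || c >= 0x80;
}

// Two-state machine that visits each byte exactly once, front to back, and
// emits a std::string_view into the caller's buffer for every word.
std::vector<std::string_view> Tokenize(std::string_view text) {
    std::vector<std::string_view> words;
    bool in_word = false;
    std::size_t start = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const bool word_byte = IsWordByte(static_cast<unsigned char>(text[i]));
        if (word_byte && !in_word) {         // between words -> in a word
            start = i;
            in_word = true;
        } else if (!word_byte && in_word) {  // in a word -> between words
            words.push_back(text.substr(start, i - start));
            in_word = false;
        }
    }
    if (in_word) words.push_back(text.substr(start));  // word that runs to the end
    return words;
}
```

The views stay valid only as long as the underlying buffer (the std::string or the mapped view) does.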

Written in C++ and compiled with MSVC 2019 as a 64-bit executable, using /EHsc /O2 /std:c++latest. Run on an Intel-based desktop with an SSD.

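Each experiment was run seven times with a warm cache. Times were recorded with std::chrono::high_resolution_clock and are reported in milliseconds. Regardless of the method, the tokenizer reported the same number of bytes read and the same number of words found in every run.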
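A sketch of a timing harness consistent with this description. It assumes one untimed warm-up run is how the cache gets warmed and that σ is the sample standard deviation; the answer states neither. The names Stats, Summarize, and TimeRuns are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <functional>
#include <vector>

struct Stats {
    double mean_ms;
    double sigma_ms;
};

// Mean and sample standard deviation (n - 1 denominator) of per-run times.
// Assumes at least one sample.
Stats Summarize(const std::vector<double>& ms) {
    double sum = 0.0;
    for (double x : ms) sum += x;
    const double mean = sum / ms.size();
    double sq = 0.0;
    for (double x : ms) sq += (x - mean) * (x - mean);
    const double sigma = ms.size() > 1 ? std::sqrt(sq / (ms.size() - 1)) : 0.0;
    return {mean, sigma};
}

// Run `experiment` once untimed (assumed warm-up), then `runs` timed times
// (seven in the answer), measuring each with high_resolution_clock.
Stats TimeRuns(const std::function<void()>& experiment, int runs = 7) {
    using Clock = std::chrono::high_resolution_clock;
    experiment();  // untimed warm-up so the input files sit in the OS cache
    std::vector<double> samples;
    for (int i = 0; i < runs; ++i) {
        const auto t0 = Clock::now();
        experiment();
        const auto t1 = Clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return Summarize(samples);
}
```

Here `experiment` would be the main loop over all 18 files for one method.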

C++ iostreams approach:

The file is opened in binary mode, so no effort is spent converting CR+LF to '\n'. We read each file into a std::string in one go.

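```cpp
// Binary mode: no CR+LF -> '\n' translation.
auto file = std::ifstream(file_name, std::ios::binary);
// Read the whole file into a std::string in one go.
std::string text{std::istreambuf_iterator<char>(file), {}};
file.close();
Tokenize(text);  // the author's hand-written tokenizer (not shown)
```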

Win32 ReadFile approach:

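Note that we open the file with the FILE_FLAG_SEQUENTIAL_SCAN hint and that there is no error checking. We use a single ReadFile call per file, reading the data into a std::string that is pre-allocated to the file size and zero-initialized.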
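The ReadFile code itself is not shown here. The following is a hedged reconstruction from the description above, not the author's code: one CreateFileA call with FILE_FLAG_SEQUENTIAL_SCAN, a std::string pre-sized to the file size, and a single ReadFile. Unlike the benchmark, it checks errors. The non-Windows branch uses open/fstat/read (the Linux read() the question mentions) and treats posix_fadvise with POSIX_FADV_SEQUENTIAL as a rough analogue of the sequential-scan flag; both are assumptions. The function name ReadWholeFile is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

// Read an entire file into a std::string pre-sized (zero-filled) to the file size.
std::string ReadWholeFile(const char* path) {
#ifdef _WIN32
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                           OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (h == INVALID_HANDLE_VALUE) throw std::runtime_error("CreateFileA failed");
    LARGE_INTEGER size;
    if (!GetFileSizeEx(h, &size)) {
        CloseHandle(h);
        throw std::runtime_error("GetFileSizeEx failed");
    }
    std::string text(static_cast<std::size_t>(size.QuadPart), '\0');
    // A single ReadFile call; its DWORD byte count limits this sketch to < 4 GiB.
    DWORD got = 0;
    const BOOL ok = ReadFile(h, text.data(), static_cast<DWORD>(text.size()), &got, nullptr);
    CloseHandle(h);
    if (!ok) throw std::runtime_error("ReadFile failed");
    text.resize(got);
    return text;
#else
    const int fd = open(path, O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed");
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        throw std::runtime_error("fstat failed");
    }
#ifdef POSIX_FADV_SEQUENTIAL
    // Advisory only (assumed analogue of FILE_FLAG_SEQUENTIAL_SCAN); result ignored.
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
#endif
    std::string text(static_cast<std::size_t>(st.st_size), '\0');
    std::size_t done = 0;
    // read() may return fewer bytes than requested, so loop until full or EOF.
    while (done < text.size()) {
        const ssize_t n = read(fd, text.data() + done, text.size() - done);
        if (n < 0 && errno == EINTR) continue;  // interrupted by a signal: retry
        if (n < 0) {
            close(fd);
            throw std::runtime_error("read failed");
        }
        if (n == 0) break;  // file shrank after fstat
        done += static_cast<std::size_t>(n);
    }
    close(fd);
    text.resize(done);
    return text;
#endif
}
```

On Windows the whole file arrives with one system call after the open; on Linux the loop usually finishes in one read() for regular files, but the loop keeps the sketch correct when it does not.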

Memory-mapped approach:

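We open the file with the same options as in the ReadFile approach. Compared with the ReadFile approach, there are additional system calls (CreateFileMapping, MapViewOfFile, UnmapViewOfFile, and an extra CloseHandle).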
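A hedged sketch of the memory-mapped variant, showing where the extra calls (CreateFileMapping, MapViewOfFile, UnmapViewOfFile, and the second CloseHandle) come in. It is not the author's code; the function name WithMappedFile is hypothetical, and the POSIX branch (mmap/munmap) is an assumed Linux equivalent.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>

#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

// Map a whole file read-only, pass it to `process` as a std::string_view, then
// unmap it. Empty files are handled separately because neither
// CreateFileMapping nor mmap accepts a zero-length mapping.
template <typename F>
void WithMappedFile(const char* path, F&& process) {
#ifdef _WIN32
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (file == INVALID_HANDLE_VALUE) throw std::runtime_error("CreateFileA failed");
    LARGE_INTEGER size;
    if (!GetFileSizeEx(file, &size)) {
        CloseHandle(file);
        throw std::runtime_error("GetFileSizeEx failed");
    }
    if (size.QuadPart == 0) {
        CloseHandle(file);
        process(std::string_view{});
        return;
    }
    HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    CloseHandle(file);  // the mapping object keeps its own reference to the file
    if (mapping == nullptr) throw std::runtime_error("CreateFileMappingA failed");
    const void* view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
    CloseHandle(mapping);  // the view keeps the mapping object alive
    if (view == nullptr) throw std::runtime_error("MapViewOfFile failed");
    struct Unmap { const void* v; ~Unmap() { UnmapViewOfFile(v); } } guard{view};
    process(std::string_view(static_cast<const char*>(view),
                             static_cast<std::size_t>(size.QuadPart)));
#else
    const int fd = open(path, O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed");
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        throw std::runtime_error("fstat failed");
    }
    const auto len = static_cast<std::size_t>(st.st_size);
    if (len == 0) {
        close(fd);
        process(std::string_view{});
        return;
    }
    void* view = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (view == MAP_FAILED) throw std::runtime_error("mmap failed");
    struct Unmap { void* v; std::size_t n; ~Unmap() { munmap(v, n); } } guard{view, len};
    process(std::string_view(static_cast<const char*>(view), len));
#endif
}
```

For example, WithMappedFile(path, [](std::string_view text) { Tokenize(text); }) would hand the mapped bytes straight to a tokenizer without copying them; the local Unmap guard releases the view even if `process` throws.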

Is memory mapped i/o worthwhile for sequential processing?
