Question

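I'm trying to figure out the best way to read a large text file (at least 5 MB) in C++, with speed and efficiency in mind. Are there any preferred classes or functions, and why?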

By the way, I'm running specifically in a UNIX environment.

Answer 1

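The stream classes (ifstream) actually do a good job. Unless you are restricted otherwise, make sure to turn off sync_with_stdio (in ios_base::). You can use getline() to read directly into std::strings, though from a performance perspective a fixed buffer as a char* (a vector of chars or an old-school char[]) may be faster, at the cost of higher risk and complexity.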
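A minimal sketch of both variants mentioned above, assuming the goal is simply to count lines. The function names and the 64 KiB buffer size are my own choices for illustration, not anything prescribed by the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Variant 1: read line by line into a std::string with getline().
// Returns 0 if the file cannot be opened.
std::size_t count_lines_getline(const char* path) {
    assert(path != nullptr);
    std::ifstream in(path);
    std::size_t lines = 0;
    std::string line;  // reused across iterations, so its capacity is recycled
    while (std::getline(in, line)) {
        ++lines;
    }
    return lines;
}

// Variant 2: read fixed-size chunks into a vector<char> and scan them.
// The buffer size is an arbitrary assumption; tune it by measuring.
std::size_t count_newlines_buffered(const char* path) {
    assert(path != nullptr);
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(64 * 1024);
    std::size_t newlines = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();  // may be short on the last chunk
        newlines += static_cast<std::size_t>(
            std::count(buf.data(), buf.data() + got, '\n'));
    }
    return newlines;
}
```

Note that std::ios_base::sync_with_stdio(false) affects the standard streams (std::cin, std::cout and friends), so it matters mostly when the file arrives on stdin or when results are printed with std::cout. If you use it, call it once at the top of main() before any I/O happens.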

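If you're willing to play games with page-size calculations and the like, you can go the mmap route. I'd probably build it with the stream classes first and see whether that is good enough.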
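For comparison, here is a rough sketch of the mmap route on a POSIX system. It maps the whole file from offset 0, which sidesteps the page-size arithmetic; that arithmetic only comes in if you map the file in windows, because the offset passed to mmap must be a multiple of the page size. The function name and the choice to count newlines are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: counts '\n' bytes by mapping the file read-only.
// Returns -1 on any error.
long count_newlines_mmap(const char* path) {
    assert(path != nullptr);
    const int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {  // mmap rejects a zero-length mapping
        close(fd);
        return 0;
    }

    const std::size_t len = static_cast<std::size_t>(st.st_size);
    void* mapping = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (mapping == MAP_FAILED) return -1;

    // The mapped bytes are NOT NUL-terminated, so scan by length.
    const char* data = static_cast<const char*>(mapping);
    const long newlines = static_cast<long>(std::count(data, data + len, '\n'));

    munmap(mapping, len);
    return newlines;
}
```

Whether this beats the stream version depends on the system and the access pattern, so it is worth timing both on your actual files.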

Depending on what you do with each line of data, you may find that your processing routines are the place to optimize, not the I/O.

Answer 2

Use old-style file I/O.

fopen the file for binary read
fseek to the end of the file
ftell to find out how many bytes are in the file.
fseek back to the start of the file.
malloc a chunk of memory to hold all of the bytes + 1
set the extra byte at the end of the buffer to NUL.
fread the entire file into memory.
create a vector of const char *
push_back the address of the first byte into the vector.
repeatedly 
    strstr - search the memory block for the line-ending character(s), e.g. "\n" or "\r\n".
    put a NUL at the found position
    move past the line-ending characters
    push_back that address into the vector
until all of the text in the buffer has been processed.

----------------
use the vector to find the strings,
and process as needed.
when done, free the memory block (it came from malloc, so use free, not delete)
and the vector should self-destruct.
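
The steps above could be sketched roughly as follows. This assumes lines end in "\n" only and that the file contains no embedded NUL bytes (strstr would stop at one). LoadedFile, load_lines and free_lines are names invented for the example. One small deviation from the list: a pointer is pushed only while text remains, so a file ending in a newline does not produce a trailing empty entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical holder for the buffer and the line pointers into it.
struct LoadedFile {
    char* buffer = nullptr;          // owned, allocated with malloc
    std::vector<const char*> lines;  // each entry points into buffer
};

// Reads the whole file and splits it in place on '\n'. Returns false on error.
bool load_lines(const char* path, LoadedFile& out) {
    assert(path != nullptr);
    std::FILE* fp = std::fopen(path, "rb");
    if (!fp) return false;

    // Find the size: seek to the end, ask for the position, seek back.
    if (std::fseek(fp, 0, SEEK_END) != 0) { std::fclose(fp); return false; }
    const long size = std::ftell(fp);
    if (size < 0) { std::fclose(fp); return false; }
    std::rewind(fp);

    // One extra byte for the terminating NUL.
    char* buf = static_cast<char*>(std::malloc(static_cast<std::size_t>(size) + 1));
    if (!buf) { std::fclose(fp); return false; }
    const std::size_t got = std::fread(buf, 1, static_cast<std::size_t>(size), fp);
    std::fclose(fp);
    buf[got] = '\0';

    out.buffer = buf;
    out.lines.clear();
    char* cur = buf;
    while (*cur != '\0') {
        out.lines.push_back(cur);
        char* eol = std::strstr(cur, "\n");
        if (!eol) break;  // last line has no terminator
        *eol = '\0';      // cut the line here
        cur = eol + 1;    // move past the line ending
    }
    return true;
}

// Releases the buffer; the pointers in lines become invalid, so clear them.
void free_lines(LoadedFile& f) {
    std::free(f.buffer);
    f.buffer = nullptr;
    f.lines.clear();
}
```

After load_lines succeeds, each element of lines is an ordinary C string, and free_lines must be called once the strings are no longer needed.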

Answer 3

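If you're working with text files that store integers, floats and small strings, my experience is that FILE, fopen and fscanf are already fast enough, and you get the numbers directly as well. I think memory mapping is the fastest, but it requires you to write your own code to parse the file, which is extra work.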
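
As an illustration of the fscanf approach, here is a small sketch. The record layout (an int, a double and a short name per line) and the names Record and read_records are assumptions made up for the example; adjust the format string to your actual file.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical record layout: "<int id> <double value> <short name>" per line.
struct Record {
    int id;
    double value;
    char name[32];
};

// Appends every well-formed record to out. Returns false if the file
// cannot be opened; parsing stops at the first malformed record or at EOF.
bool read_records(const char* path, std::vector<Record>& out) {
    assert(path != nullptr);
    std::FILE* fp = std::fopen(path, "r");
    if (!fp) return false;

    Record r;
    // %31s limits the string to 31 characters plus the NUL, matching name[32].
    while (std::fscanf(fp, "%d %lf %31s", &r.id, &r.value, r.name) == 3) {
        out.push_back(r);
    }
    std::fclose(fp);
    return true;
}
```

The numbers come out already converted, which is the convenience mentioned above; with a memory-mapped file you would have to do that conversion yourself.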
