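I'm trying to work out the best way to read a large text file (at least 5 MB) in C++, with speed and efficiency in mind. Are there any preferred classes or functions, and why?
By the way, I'm running specifically in a UNIX environment.
Answer 0: (score: 0)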
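The stream classes (ifstream) actually do a good job. Unless you're otherwise restricted, make sure to turn off sync_with_stdio (in ios_base::); it governs the standard streams such as std::cin and std::cout, so it matters most if you also read or write through those. You can use getline() to read directly into std::strings, but from a performance standpoint a fixed buffer as a char* (a vector of chars or an old-school char[]) may be faster, at the cost of higher risk and complexity.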
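A minimal sketch of the two approaches mentioned above, for comparison. The function names count_lines_getline and count_newlines_chunked and the 64 KiB buffer size are assumptions made for illustration, not something from the answer; the per-line or per-chunk work is reduced to counting so the I/O pattern stays visible.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the file line by line into a std::string.
// Returns 0 if the file cannot be opened.
std::size_t count_lines_getline(const std::string& path) {
    std::ifstream in(path);
    if (!in) return 0;
    std::string line;   // reused on every iteration, so its capacity is kept
    std::size_t n = 0;
    while (std::getline(in, line)) {
        ++n;            // real per-line processing would go here
    }
    return n;
}

// Hypothetical helper: reads the file in fixed-size chunks and counts '\n' bytes.
// Returns 0 if the file cannot be opened.
std::size_t count_newlines_chunked(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return 0;
    std::vector<char> buf(64 * 1024);   // assumed buffer size, tune as needed
    std::size_t n = 0;
    // read() fails on the final short chunk, but gcount() still reports its size.
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) ||
           in.gcount() > 0) {
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        for (std::size_t i = 0; i < got; ++i) {
            if (buf[i] == '\n') ++n;
        }
    }
    return n;
}
```

If the program also uses std::cin or std::cout, std::ios_base::sync_with_stdio(false) would be called once at the top of main(), before any I/O takes place. Note that the chunked version counts newline bytes, so a final line without a terminator is not counted, and lines that straddle a chunk boundary need extra handling if you process their contents.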
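If you're willing to play games with page-size calculations and the like, you can go the mmap route. I would probably build it with the stream classes first and see whether that is good enough.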
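For reference, the mmap route could look roughly like the sketch below on a POSIX system. The helper name count_newlines_mmap is hypothetical. Mapping the whole file from offset 0 sidesteps the page-size arithmetic; that arithmetic comes in when you map a window that starts partway into the file, because the offset passed to mmap must be a multiple of the page size.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: maps a file read-only and counts its '\n' bytes.
// Returns -1 on any error.
long count_newlines_mmap(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }   // mmap rejects a zero length

    const std::size_t len = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   // the mapping remains valid after the descriptor is closed
    if (p == MAP_FAILED) return -1;

    const char* data = static_cast<const char*>(p);
    long n = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == '\n') ++n;   // real parsing would go here
    }

    munmap(p, len);
    return n;
}
```

The mapped bytes are not NUL-terminated, so string functions that expect a terminator should not be used on them directly.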
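Depending on what you do with each line of data, you may start finding that your processing routines are the point to optimize, rather than the I/O.
Answer 1: (score: 0)
Use old-style file I/O.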
fopen the file for binary read
fseek to the end of the file
ftell to find out how many bytes are in the file.
malloc a chunk of memory to hold all of the bytes + 1
set the extra byte at the end of the buffer to NUL.
fseek back to the start of the file.
fread the entire file into memory.
create a vector of const char *
push_back the address of the first byte into the vector.
repeatedly
strstr - search the memory block for the line terminator character(s) ("\n" on UNIX, "\r\n" in files that came from Windows).
put a NUL at the found position
move past the line terminator characters
push_back that address into the vector
until all of the text in the buffer has been processed.
----------------
use the vector to find the strings,
and process as needed.
when done, free the memory block (it came from malloc, so use free rather than delete)
and the vector should self-destruct.
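The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the answerer's code: LineIndex and load_lines are hypothetical names, a std::vector<char> stands in for the malloc'd block so nothing has to be freed by hand, memchr replaces strstr, and lines are assumed to end in a single '\n' as on UNIX.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical container: owns the file bytes and a pointer to each line.
struct LineIndex {
    std::vector<char> buffer;         // file contents plus one trailing NUL
    std::vector<const char*> lines;   // each entry points into buffer
};

// Reads the whole file and splits it in place on '\n'.
// Returns false if the file cannot be opened or fully read.
bool load_lines(const char* path, LineIndex& out) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    if (std::fseek(f, 0, SEEK_END) != 0) { std::fclose(f); return false; }
    const long size = std::ftell(f);
    if (size < 0) { std::fclose(f); return false; }
    std::rewind(f);   // go back to the start before reading

    const std::size_t n = static_cast<std::size_t>(size);
    out.buffer.assign(n + 1, '\0');   // the extra byte stays NUL
    const std::size_t got = std::fread(out.buffer.data(), 1, n, f);
    std::fclose(f);
    if (got != n) return false;

    out.lines.clear();
    char* p = out.buffer.data();
    char* const end = p + n;
    while (p < end) {
        out.lines.push_back(p);   // start of the current line
        void* hit = std::memchr(p, '\n', static_cast<std::size_t>(end - p));
        if (!hit) break;          // last line has no terminator
        char* nl = static_cast<char*>(hit);
        *nl = '\0';               // terminate the line in place
        p = nl + 1;               // move past the terminator
    }
    return true;
}
```

The pointers in lines stay valid only while buffer is neither resized nor destroyed, which is why both live in the same struct.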
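Answer 2: (score: 0)
If you're working with text files that store integers, floats, and small strings, my experience is that FILE, fopen, and fscanf are already fast enough, and they also hand you the numbers directly. I think memory mapping is the fastest, but it requires you to write code to parse the file, which is extra work.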
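As a small illustration of getting numbers directly from fscanf, here is a sketch that sums whitespace-separated numbers in a file. The helper name sum_numbers and the assumption that the file holds only numbers are mine, not the answerer's.

```cpp
#include <cstdio>

// Hypothetical helper: sums whitespace-separated numbers in a text file.
// Reading stops at end of file or at the first token that is not a number.
// Returns false if the file cannot be opened.
bool sum_numbers(const char* path, double& total, long& count) {
    std::FILE* f = std::fopen(path, "r");
    if (!f) return false;

    total = 0.0;
    count = 0;
    double value = 0.0;
    // fscanf returns the number of items converted; 1 means a number was read.
    while (std::fscanf(f, "%lf", &value) == 1) {
        total += value;
        ++count;
    }

    std::fclose(f);
    return true;
}
```

The conversion to double happens inside fscanf, so no separate parsing step is needed; the trade-off is less control over malformed input.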