Question

What is the most efficient way to read a large text file backwards, line by line, using Windows API functions? For example, if the file is:

line 1
...
line 108777
line 108778

The output should be:

line 108778
line 108777
...
line 1

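I want to write a C program for this. You don't need to write the code (though feel free to if you like); I'm just interested in how to go about it, because the file is huge and I want the program to run as fast as possible.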

I'm also interested in which Windows API functions to use.

Answer 1

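A smarter solution is to open the file, set the file offset to (end of file - buffersize), and read (buffersize) bytes. You can then parse the data in the buffer from back to front to find newlines and do whatever you want with each line, then step back by another buffersize and repeat until you reach the start of the file.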
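The chunked approach above could be sketched roughly as follows. This is a minimal sketch, not a tuned implementation: it uses portable C stdio (fseek/fread) rather than the Windows API so that it compiles anywhere, and the names reverse_lines_chunked and emit_line are made up for illustration. On Windows the same loop could presumably be driven by SetFilePointerEx and ReadFile. It assumes the file is opened in binary mode ("rb"), that offsets fit in a long, and that lines end in '\n' (a '\r' before it is simply carried along as part of the line).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write one line made of two pieces (either may be empty), then '\n'. */
static void emit_line(FILE *out, const char *a, size_t na,
                      const char *b, size_t nb)
{
    if (na) fwrite(a, 1, na, out);
    if (nb) fwrite(b, 1, nb, out);
    fputc('\n', out);
}

/* Hypothetical helper: writes the lines of `in` to `out`, last line first,
   reading `chunk` bytes at a time from the end. Returns 0 on success. */
int reverse_lines_chunked(FILE *in, FILE *out, size_t chunk)
{
    char *buf, *carry = NULL;   /* carry: tail of a line whose start is unseen */
    size_t carry_len = 0;
    int first = 1;              /* still looking at the piece that ends at EOF */
    int rc = -1;
    long pos;

    assert(in && out && chunk > 0);
    if (fseek(in, 0, SEEK_END) != 0 || (pos = ftell(in)) < 0) return -1;
    if ((buf = malloc(chunk)) == NULL) return -1;

    while (pos > 0) {
        size_t n = (size_t)pos < chunk ? (size_t)pos : chunk;
        size_t end = n;         /* one past the last unconsumed byte in buf */
        pos -= (long)n;
        if (fseek(in, pos, SEEK_SET) != 0 || fread(buf, 1, n, in) != n)
            goto done;

        /* Scan the chunk from back to front looking for newlines. */
        for (size_t i = n; i-- > 0; ) {
            if (buf[i] != '\n') continue;
            size_t len = end - (i + 1);
            /* Skip the empty piece after a trailing newline at EOF. */
            if (!(first && len + carry_len == 0))
                emit_line(out, buf + i + 1, len, carry, carry_len);
            first = 0;
            carry_len = 0;
            end = i;
        }
        /* buf[0..end) belongs to a line that starts in an earlier chunk. */
        if (end > 0) {
            char *p = realloc(carry, carry_len + end);
            if (!p) goto done;
            memmove(p + end, p, carry_len);
            memcpy(p, buf, end);
            carry = p;
            carry_len += end;
        }
    }
    /* Whatever is left is the first line of the file. */
    if (!(first && carry_len == 0))
        emit_line(out, NULL, 0, carry, carry_len);
    rc = ferror(out) ? -1 : 0;
done:
    free(buf);
    free(carry);
    return rc;
}
```

A call such as reverse_lines_chunked(in, stdout, 1 << 16) would use 64 KiB reads; the chunk size is a tuning knob, and the carry buffer is only re-copied when a line straddles a chunk boundary.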

Answer 2

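If performance matters more than memory usage, I would simply read the entire text file into memory with buffered reads and then parse it in whatever order you like.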

Also take a look at memory mapped files; some of their advantages are discussed here.
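For the "read everything into memory" route, a minimal sketch might look like this. It again uses portable C stdio instead of the Windows API, and read_whole_file is a hypothetical helper name. It assumes the file fits in memory and that its size fits in a long.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a malloc'd,
   NUL-terminated buffer. Stores the byte count in *len_out.
   Returns NULL on any error; the caller frees the result. */
char *read_whole_file(const char *path, size_t *len_out)
{
    char *data = NULL;
    long size;
    FILE *f;

    assert(path && len_out);
    f = fopen(path, "rb");          /* binary mode: no CRLF translation */
    if (!f) return NULL;

    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0)
        goto fail;

    data = malloc((size_t)size + 1);
    if (!data) goto fail;
    if (fread(data, 1, (size_t)size, f) != (size_t)size) goto fail;

    data[size] = '\0';
    fclose(f);
    *len_out = (size_t)size;
    return data;

fail:
    free(data);
    fclose(f);
    return NULL;
}
```

Once the buffer is in memory, the lines can be walked from the last byte towards the first without any further I/O.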

Answer 3

Memory-map the file. It will be buffered for you automatically; just read it as if it were memory, starting from the end and looking for CR / LF / CRLF.
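Here is one way this might look in C. The scanning function works on any in-memory range and treats CR, LF, and CRLF as line terminators; the Windows-only part, guarded by `#ifdef _WIN32`, maps the file with CreateFileMappingA and MapViewOfFile and hands the view to the scanner. The names write_lines_reversed and reverse_file_mapped are invented for this sketch, and output lines are normalized to end in '\n', which is an assumption rather than something the answer specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the lines in data[0..len) to `out`,
   last line first. CR, LF and CRLF all count as terminators. */
void write_lines_reversed(const char *data, size_t len, FILE *out)
{
    size_t end = len;

    assert(out && (data || len == 0));
    if (len == 0) return;

    /* Ignore one trailing terminator so it does not yield an empty line. */
    if (end > 0 && data[end - 1] == '\n') end--;
    if (end > 0 && data[end - 1] == '\r') end--;

    for (;;) {
        size_t start = end;
        /* Walk back to the byte just after the previous terminator. */
        while (start > 0 && data[start - 1] != '\n' && data[start - 1] != '\r')
            start--;
        if (end > start) fwrite(data + start, 1, end - start, out);
        fputc('\n', out);
        if (start == 0) break;
        /* Step over the terminator in front of this line. */
        end = start - 1;
        if (data[end] == '\n' && end > 0 && data[end - 1] == '\r') end--;
    }
}

#ifdef _WIN32
#include <windows.h>

/* Sketch: map `path` read-only and print its lines in reverse.
   Returns 0 on success. Mapping the whole file at once can fail when
   the file is larger than the available address space. */
int reverse_file_mapped(const char *path, FILE *out)
{
    int rc = -1;
    LARGE_INTEGER size;
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return -1;

    if (!GetFileSizeEx(file, &size)) {
        /* leave rc at -1 */
    } else if (size.QuadPart == 0) {
        rc = 0;                     /* empty files cannot be mapped */
    } else {
        HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY,
                                            0, 0, NULL);
        if (mapping != NULL) {
            const char *view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
            if (view != NULL) {
                write_lines_reversed(view, (size_t)size.QuadPart, out);
                UnmapViewOfFile(view);
                rc = 0;
            }
            CloseHandle(mapping);
        }
    }
    CloseHandle(file);
    return rc;
}
#endif
```

Keeping the scanner separate from the mapping code means the same function also works on a buffer that was filled by an ordinary read.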

Answer 4

If the file is larger than the available address space, memory-mapping it will fail (or at least become very tricky). Instead, try this:

input = input file
block_prefix = unique temporary file
block_index = 0

while (!eof (input))
{
   line = input.readline ();
   push line onto a stack

   if (stack > 100 entries) // doesn't have to be 100
   {
      output = block_prefix + block_index++

      while (stack has entries)
      {
        pop line off stack
        write to output
      }
   }
}

if (stack has entries)
{
  output = block_prefix + block_index++

  while (stack has entries)
  {
    pop line off stack
    write to output
  }
}

output = output file

while (block_index)
{
   read entire contents of block file (block_prefix + --block_index)
   write contents to output
   delete block file
}

Answer 5

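One approach is to build a container of file offsets, one for the beginning of each line. After parsing the file, process the container in reverse order. See fgetc, fgets and fseek.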
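A rough sketch of the offset-container idea, using fgetc and fseek as the answer suggests. The function name reverse_lines_by_offsets and the growable array of long offsets are illustrative choices, not part of the answer. The sketch assumes the stream is opened in binary mode (so a byte counter equals the file offset) and that offsets fit in a long.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: records the offset of every line start in a first
   pass, then seeks to those offsets in reverse order and copies each line
   to `out`. Returns 0 on success, -1 on error. */
int reverse_lines_by_offsets(FILE *in, FILE *out)
{
    long *starts = NULL;        /* container of line-start offsets */
    size_t count = 0, cap = 0;
    long pos = 0;
    int c, at_line_start = 1, rc = -1;

    assert(in && out);
    rewind(in);

    /* Pass 1: remember where each line begins. */
    while ((c = fgetc(in)) != EOF) {
        if (at_line_start) {
            if (count == cap) {
                size_t ncap = cap ? cap * 2 : 1024;
                long *p = realloc(starts, ncap * sizeof *p);
                if (!p) goto done;
                starts = p;
                cap = ncap;
            }
            starts[count++] = pos;
        }
        at_line_start = (c == '\n');
        pos++;
    }

    /* Pass 2: visit the offsets from last to first. */
    while (count > 0) {
        if (fseek(in, starts[--count], SEEK_SET) != 0) goto done;
        while ((c = fgetc(in)) != EOF && c != '\n')
            fputc(c, out);
        fputc('\n', out);
    }
    rc = (ferror(in) || ferror(out)) ? -1 : 0;
done:
    free(starts);
    return rc;
}
```

This reads the file twice and seeks once per line, so it is likely slower than the chunked or memory-mapped approaches, but memory use is bounded by one offset per line rather than by the file size.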

What is the most efficient way to read a large text file backwards?
