Optimizing sequential I/O operations on large file sizes

Time: 2013-04-19 08:02:23

Tags: winapi optimization fstream large-files

Compiler:  Microsoft C++ 2005
Hardware: AMD 64-bit (16 GB)


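Sequential, read-only access from an 18 GB file, with the following timing, file access, and file structure characteristics:

18,184,359,164 (file length)
11,240,476,672 (NTFS compressed file length)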

Time    File         Method                                 Disk
14:33?  compressed   fstream                                fixed disk
14:06   normal       fstream                                fixed disk
12:22   normal       winapi                                 fixed disk
11:47   compressed   winapi                                 fixed disk
11:29   compressed   fstream                                ram disk
10:37   compressed   winapi                                 ram disk
 7:18   compressed   7z stored decompression to ntfs 12gb   ram disk
 6:37   normal       copy to same volume                    fixed disk



fstream constructor and access:

#define BUFFERSIZE 524288
    unsigned int mbytes = BUFFERSIZE;
    char * databuffer0; databuffer0 = (char*) malloc (mbytes);
    datafile.open("drv:/file.ext", ios::in | ios::binary );
    datafile.read (databuffer0, mbytes);


winapi constructor and access:

#define BUFFERSIZE 524288
    unsigned int mbytes = BUFFERSIZE;
    const TCHAR* const filex = _T("drv:/file.ext");
    char   ReadBuffer[BUFFERSIZE] = {0};
    hFile = CreateFile(filex, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if( FALSE == ReadFile(hFile, ReadBuffer, BUFFERSIZE-1, &dwBytesRead, NULL))
    { ...

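For the fstream method, increasing the buffer size up to 16 MB does not reduce processing time. For the winapi method, all buffer sizes above 0.5 MB fail. What methods would optimize this implementation with respect to processing time?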

2 answers:

Answer 0 (score: 0)

Have you tried memory-mapped files? In my tests, this has always been the fastest way to read large files.

Update: Here is an old but still accurate description of memory-mapped files: http://msdn.microsoft.com/en-us/library/ms810613.aspx
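
As a rough illustration of the memory-mapping suggestion, the sketch below maps a file read-only and walks it byte by byte. It is not taken from the linked article. The names `sum_range` and `sum_bytes_mapped` are made up for this example, it assumes a C++11 compiler (newer than the Microsoft C++ 2005 in the question) and a 64-bit process so that one view can span the whole file, and the non-Windows branch exists only so the sketch can be built elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

// Stand-in for the real per-byte processing: adds up n bytes starting at data.
static std::uint64_t sum_range(const void* data, std::uint64_t n) {
    assert(data != NULL || n == 0);
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint64_t s = 0;
    for (std::uint64_t i = 0; i < n; ++i) s += p[i];
    return s;
}

// Maps the whole file read-only and sums its bytes.
// Returns false if the file cannot be opened or mapped, or if it is empty.
bool sum_bytes_mapped(const char* path, std::uint64_t& sum) {
    sum = 0;
    bool ok = false;
#ifdef _WIN32
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    if (file == INVALID_HANDLE_VALUE) return false;
    LARGE_INTEGER size;
    size.QuadPart = 0;
    if (GetFileSizeEx(file, &size) && size.QuadPart > 0) {
        // Size 0/0 means "map the entire file".
        HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
        if (mapping != NULL) {
            void* view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
            if (view != NULL) {
                sum = sum_range(view, static_cast<std::uint64_t>(size.QuadPart));
                UnmapViewOfFile(view);
                ok = true;
            }
            CloseHandle(mapping);
        }
    }
    CloseHandle(file);
#else
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        const std::size_t len = static_cast<std::size_t>(st.st_size);
        void* view = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (view != MAP_FAILED) {
            sum = sum_range(view, len);
            munmap(view, len);
            ok = true;
        }
    }
    close(fd);
#endif
    return ok;
}
```

For a file as large as the one in the question, a 32-bit process could not map everything in one view; it would have to map smaller views one after another, with each view offset a multiple of the system allocation granularity.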

Answer 1 (score: 0)

Try this.

hf = CreateFile(..... FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED ...)

Then the read loop. Details omitted since I am typing this on an iPad...

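const DWORD bufsize = 4*1024*1024;   // with FILE_FLAG_NO_BUFFERING: must be a multiple of the sector size
CEvent e1;
CEvent e2;
CEvent e3;
CEvent e4;
// FILE_FLAG_NO_BUFFERING also needs sector-aligned buffers; 4096 is an assumed alignment.
// _aligned_malloc is declared in <malloc.h>; release with _aligned_free.
unsigned char* pbuffer1 = (unsigned char*) _aligned_malloc(bufsize, 4096);
unsigned char* pbuffer2 = (unsigned char*) _aligned_malloc(bufsize, 4096);
unsigned char* pbuffer3 = (unsigned char*) _aligned_malloc(bufsize, 4096);
unsigned char* pbuffer4 = (unsigned char*) _aligned_malloc(bufsize, 4096);
ULONGLONG CurOffset = 0;             // 64-bit: the file is larger than 4 GB

do {
   OVERLAPPED r1;
   memset(&r1, 0, sizeof(OVERLAPPED));
   r1.Offset     = (DWORD)(CurOffset & 0xFFFFFFFF);
   r1.OffsetHigh = (DWORD)(CurOffset >> 32);
   CurOffset += bufsize;
   r1.hEvent = e1;
   // lpNumberOfBytesRead is NULL for overlapped reads
   if (! ReadFile(hf, pbuffer1, bufsize, NULL, &r1)) {
       // ERROR_IO_PENDING is expected here;
       // check for other errors AND ERROR_HANDLE_EOF (important)
   }

   OVERLAPPED r2;
   memset(&r2, 0, sizeof(OVERLAPPED));
   r2.Offset     = (DWORD)(CurOffset & 0xFFFFFFFF);
   r2.OffsetHigh = (DWORD)(CurOffset >> 32);
   CurOffset += bufsize;
   r2.hEvent = e2;
   if (! ReadFile(hf, pbuffer2, bufsize, NULL, &r2)) {
       // ERROR_IO_PENDING is expected here;
       // check for other errors AND ERROR_HANDLE_EOF (important)
   }

   OVERLAPPED r3;
   memset(&r3, 0, sizeof(OVERLAPPED));
   r3.Offset     = (DWORD)(CurOffset & 0xFFFFFFFF);
   r3.OffsetHigh = (DWORD)(CurOffset >> 32);
   CurOffset += bufsize;
   r3.hEvent = e3;
   if (! ReadFile(hf, pbuffer3, bufsize, NULL, &r3)) {
       // ERROR_IO_PENDING is expected here;
       // check for other errors AND ERROR_HANDLE_EOF (important)
   }

   OVERLAPPED r4;
   memset(&r4, 0, sizeof(OVERLAPPED));
   r4.Offset     = (DWORD)(CurOffset & 0xFFFFFFFF);
   r4.OffsetHigh = (DWORD)(CurOffset >> 32);
   CurOffset += bufsize;
   r4.hEvent = e4;
   // fourth read goes into its own buffer, not pbuffer1
   if (! ReadFile(hf, pbuffer4, bufsize, NULL, &r4)) {
       // ERROR_IO_PENDING is expected here;
       // check for other errors AND ERROR_HANDLE_EOF (important)
   }

   // wait for events to indicate data present
   // send data to consuming threads
   // allocate new buffer
} while ( not eof, etc )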

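The above is the bare bones of what you need. We use this and achieve high I/O throughput, but you may need to refine it a little to get the best performance. We found that 4 outstanding I/Os worked best for us, but this will vary by platform. Reading less than 1 MB per I/O hurt performance. Once a buffer has been filled, do not process it in the read loop; post it to another thread and allocate another buffer (but take buffers from a reuse queue, do not keep calling malloc). The overall intent of the above is to keep 4 I/Os outstanding against the disk; as soon as you do not have that, overall performance drops.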
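
The reuse queue mentioned above could look roughly like the following sketch. It is not the answerer's production code: `BufferPool`, `try_acquire`, `release`, and `free_count` are hypothetical names, the sketch assumes a C++11 compiler, and it uses plain `std::vector` storage, so the sector alignment that unbuffered I/O requires is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// A fixed set of equally sized buffers that are recycled between the read
// loop and the consuming threads instead of being allocated for every read.
class BufferPool {
public:
    BufferPool(std::size_t count, std::size_t buffer_size)
        : storage_(count, std::vector<unsigned char>(buffer_size)) {
        // storage_ is never resized afterwards, so these pointers stay valid.
        for (std::size_t i = 0; i < count; ++i) free_.push_back(&storage_[i]);
    }

    // Returns a free buffer, or nullptr if every buffer is currently in use.
    std::vector<unsigned char>* try_acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty()) return nullptr;
        std::vector<unsigned char>* buf = free_.back();
        free_.pop_back();
        return buf;
    }

    // Called by a consumer once it has finished with the data in buf.
    void release(std::vector<unsigned char>* buf) {
        assert(buf != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(buf);
    }

    // Number of buffers currently available for new reads.
    std::size_t free_count() {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    std::vector<std::vector<unsigned char> > storage_;   // owns the memory
    std::vector<std::vector<unsigned char>*> free_;      // buffers not in use
    std::mutex mutex_;
};
```

A pool holding somewhat more than 4 buffers would let the read loop keep 4 reads in flight while consumers still hold earlier buffers. A blocking variant would add a condition variable so the read loop waits when the pool is empty.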

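Also, this works best on a disk that is only being used to read your file. If you start reading or writing different files on the same disk at the same time, performance drops off quickly, unless you have SSD disks!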
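
One detail worth spelling out for the read loop in this answer: the file in the question is larger than 4 GB, so a read offset does not fit in one 32-bit field, and the Win32 `OVERLAPPED` structure carries it in two parts, `Offset` and `OffsetHigh`. The sketch below shows only that arithmetic in portable C++11; `OffsetParts`, `split_offset`, and `chunk_count` are hypothetical helpers, not Windows APIs.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the two offset fields of the Win32 OVERLAPPED structure.
struct OffsetParts {
    std::uint32_t low;    // value for OVERLAPPED::Offset
    std::uint32_t high;   // value for OVERLAPPED::OffsetHigh
};

// Splits a 64-bit file offset into its low and high 32-bit halves.
OffsetParts split_offset(std::uint64_t file_offset) {
    OffsetParts parts;
    parts.low  = static_cast<std::uint32_t>(file_offset & 0xFFFFFFFFu);
    parts.high = static_cast<std::uint32_t>(file_offset >> 32);
    return parts;
}

// Number of reads of `chunk` bytes needed to cover `file_size` bytes,
// counting a final partial chunk as one read.
std::uint64_t chunk_count(std::uint64_t file_size, std::uint64_t chunk) {
    assert(chunk > 0);
    return (file_size + chunk - 1) / chunk;
}
```

With the 18,184,359,164-byte file from the question and 4 MB reads, `chunk_count` gives 4336 reads, and any offset at or beyond 4 GB has a non-zero high part.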

Not sure why your ReadFile was failing with 0.5 MB buffers; I just double-checked, and our live production code uses 4 MB buffers.
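
One possible cause, offered as a guess rather than a diagnosis: the question declares `char ReadBuffer[BUFFERSIZE]` as a local array, and a thread's stack is 1 MB by default with the Microsoft linker, so a much larger local array can overflow the stack before any read happens. The sketch below keeps the buffer on the heap instead. It uses standard C++11 streams so it can be built anywhere; `read_all` is a hypothetical name, and the same `buffer.data()` pointer could be handed to `ReadFile` in the winapi version.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <vector>

// Reads a file sequentially in chunks of buffer_size bytes and returns the
// total number of bytes read. Returns 0 if the file cannot be opened.
std::uint64_t read_all(const char* path, std::size_t buffer_size) {
    if (buffer_size == 0) return 0;
    std::vector<char> buffer(buffer_size);   // heap storage, not a stack array
    std::ifstream in(path, std::ios::in | std::ios::binary);
    std::uint64_t total = 0;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        // gcount() is smaller than the buffer on the final, partial chunk.
        total += static_cast<std::uint64_t>(in.gcount());
    }
    return total;
}
```

With heap storage the buffer size is limited by available memory rather than by the stack, so 4 MB or 16 MB buffers can be tried without changing linker settings.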