Large file (100GB): opening and reading it chunk by chunk with memory mapping

Time: 2012-04-23 09:02:05

Tags: c linux mapping

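I want to open a 100GB file and read it chunk by chunk, mapping one chunk at a time with file mapping. When the offset is larger than 2GB, the mapping always maps nothing. I suspected that the functions I was using did not support 64-bit addressing, so I added large file support (the large file support defines, the large file open options, and compiling with -D_FILE_OFFSET_BITS=64 -D_LARGE_FILE). However, the same problem still occurs. Here is the simplified code: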

#define _LARGEFILE_SOURCE
#define _LARGEFILE64_SOURCE
#define _FILE_OFFSET_BITS 64
#include <math.h>
#include <time.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include<fcntl.h>
#include<unistd.h>
#include<sys/stat.h>
#include<sys/time.h>
#include<sys/mman.h>
#include<sys/types.h>

#define PERMS 0600

int total_piece, PAGE, buffer, share, offset, count, chunk;

void get_size(char * strFileName)   
{  
    struct stat temp;  
    stat(strFileName, &temp);

    PAGE = getpagesize();             
    total_piece = temp.st_size/PAGE;
    chunk = 1024*1024*1024*0.4/PAGE; 

    if (temp.st_size%PAGE!=0)     
    total_piece++;
}

char *
mmaping (char *source)
{
  int src;
  char *sm;
  struct stat statbuf;

  if ((src = open (source, O_RDONLY)) < 0)  //I thought error comes from this line. So I tried to use large file support as following. But still the same. 
    {
      perror (" open source ");
      exit (EXIT_FAILURE);
    }
/*
  if ((src = open64(source, O_RDONLY|O_LARGEFILE, 0644))<0)  
    {
      perror (" open source ");
      exit (EXIT_FAILURE);
    }
*/
  if (fstat (src, &statbuf) < 0)
    {
      perror (" fstat source ");
      exit (EXIT_FAILURE);
    }

  printf("share->%d PAGES per node\n",share);

  if (share>=chunk)
  buffer = chunk;
  else
  buffer = share;

  printf("total pieces->%d\n",total_piece);
  printf("data left->%d\n",share);
  printf("buffer size->%d\n",buffer);
  printf("PAGE size->%d\n",PAGE);

  sm = mmap (0,buffer*PAGE, PROT_READ, MAP_SHARED | MAP_NORESERVE,src, offset*PAGE); 

  if (MAP_FAILED == sm)
    {
      perror (" mmap source ");
      exit (EXIT_FAILURE);
    }

  return sm;
}

main(int argc, char**argv){

   get_size(argv[1]);

   share = total_piece;

   offset = 0;

   while (share>0)
   {

      char *x = mmaping(argv[1]);

      printf("data->%0.30s\n",x); //bus error will occur when offset reaches 2GiB, which proves my thought: it maps nothing.

      munmap(x,buffer*PAGE);  

      share-=buffer;

      offset+=buffer;

   }

   return 0;
}

Can anyone help me?

1 answer:

Answer 0 (score: 5):

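Of course: variables of type "int", which is 32 bits on Linux, are not large enough to hold the byte size of a 100 GB file. For file sizes and offsets you need to use the type "off_t" instead (which, once LFS support is enabled as you have already done, is an alias for off64_t, a signed 64-bit integer).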

Likewise, the "length" argument of mmap is of type size_t, not int.
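As a minimal sketch of the two points above: in the question's code the products `offset*PAGE` and `buffer*PAGE` are evaluated in `int`, so changing only the type of the variable that receives the result would not be enough; one operand has to be converted before the multiplication. The helper names `page_to_byte_offset` and `pages_to_length` are hypothetical and exist only for illustration.

```c
#define _FILE_OFFSET_BITS 64   /* must come before any system header */
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: byte offset of a page index, computed in off_t. */
static off_t page_to_byte_offset(long page_index, long page_size)
{
    /* Convert first, so the multiplication itself is done in off_t. */
    return (off_t)page_index * (off_t)page_size;
}

/* Hypothetical helper: byte length of n_pages pages, as size_t,
 * which is the type of mmap's length argument. */
static size_t pages_to_length(size_t n_pages, long page_size)
{
    return n_pages * (size_t)page_size;
}
```

With 4096-byte pages, page index 524288 corresponds to byte offset 2147483648 (2 GiB), which no longer fits in a 32-bit `int` but is represented exactly in a 64-bit `off_t`.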

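To make the code portable to both 32-bit and 64-bit targets, with or without LFS, you need to pay attention to which integer types should be used where.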
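One possible way to apply this advice to the chunked mapping loop is sketched below. It is an illustration under assumptions, not the asker's final code: the function name `walk_file_in_chunks` is hypothetical, the chunk size is assumed to be a multiple of the page size (mmap requires a page-aligned offset), and the per-chunk processing is left as a placeholder. Offsets are kept in bytes as `off_t`, and lengths are `size_t`.

```c
#define _FILE_OFFSET_BITS 64   /* before any system header: 64-bit off_t on 32-bit targets */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only in chunks of at most
 * `chunk_bytes` (assumed to be a multiple of the page size).
 * Returns the number of bytes visited, or -1 on error. */
static off_t walk_file_in_chunks(const char *path, size_t chunk_bytes)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return -1; }

    off_t offset = 0;   /* byte offset into the file */
    while (offset < st.st_size) {
        off_t remaining = st.st_size - offset;
        /* Map one chunk, or whatever is left at the end of the file. */
        size_t len = remaining < (off_t)chunk_bytes ? (size_t)remaining
                                                    : chunk_bytes;

        char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, offset);
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return -1; }

        /* ... process p[0] .. p[len - 1] here ... */

        munmap(p, len);
        offset += (off_t)len;
    }

    close(fd);
    return offset;
}
```

A caller could pass, for example, `(size_t)sysconf(_SC_PAGESIZE) * 1024` as `chunk_bytes`. On a 32-bit target the chunk size is still limited by the available address space, even though the file offset is 64 bits wide.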