malloc problem with OpenCL code - huge size passed to mach_vm_map on OS X

Date: 2015-04-21 16:08:30

Tags: c++ macos malloc out-of-memory opencl

I have a problem porting OpenCL code from Linux (where it works) to Mac OS X 10.9.5.

In the part of my code that uses malloc, I get the following error when I launch the executable:

OpenCLSimu(13400,0x7fff7da7c310) malloc: *** mach_vm_map(size=1556840295209897984) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

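As you can see, the requested amount of memory is huge (1556840295209897984 bytes), so the allocation fails.

Here is the routine that contains the allocations (in my case, numBodies is 30720):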

int OpenCLSimu::setup()
{
  // Use at least groupSize bodies (this clamps; it does not round up to a multiple of groupSize)
  numBodies = (cl_int)(((size_t) getNumParticles() 
        < groupSize) ? groupSize : getNumParticles());

  initPos = (cl_double*) malloc(numBodies * sizeof(cl_double4));
  CHECK_ALLOCATION(initPos, "Failed to allocate host memory. (initPos)");

  initVel = (cl_double*) malloc(numBodies * sizeof(cl_double4));
  CHECK_ALLOCATION(initVel, "Failed to allocate host memory. (initVel)");

  pos = (cl_double*) malloc(numBodies * sizeof(cl_double4));
  CHECK_ALLOCATION(pos, "Failed to allocate host memory. (pos)");

  vel= (cl_double*) malloc(numBodies * sizeof(cl_double4));
  CHECK_ALLOCATION(vel, "Failed to allocate host memory. (vel)");

  return NBODY_SUCCESS;
}

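I don't know whether it is related, but I found at https://bugs.openjdk.java.net/browse/JDK-8043507 (which concerns Java) that on OS X the size has to be given as a uint32_t.

Maybe the problem comes from the clang compiler I use to build: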

CC            = /usr/bin/clang
CXX           = /usr/bin/clang++
DEFINES       = -DQT_NO_DEBUG -DQT_OPENGL_LIB -DQT_GUI_LIB -DQT_CORE_LIB -DQT_SHARED
CFLAGS        = -pipe -O2 -arch x86_64 -Xarch_x86_64 -mmacosx-version-min=10.9 -Wall -W $(DEFINES)
CXXFLAGS      = -pipe -O2 -arch x86_64 -Xarch_x86_64 -mmacosx-version-min=10.9 -Wall -W $(DEFINES)

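I also tried setting numBodies to 3072 to see how the huge size passed to mach_vm_map changes, and I get:

malloc: *** mach_vm_map(size=868306322687266816) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

I noticed that these sizes change from one run to the next.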

Finally, in the Linux version of the routine above, the pos and vel arrays are allocated with:

pos = (cl_double*)memalign(16, numBodies * sizeof(cl_double4));

vel = (cl_double*)memalign(16, numBodies * sizeof(cl_double4));

instead of the malloc calls used in the OS X version:

  pos = (cl_double*) malloc(numBodies * sizeof(cl_double4));

  vel= (cl_double*) malloc(numBodies * sizeof(cl_double4));

I have read that on OS X, allocations are aligned on a 16-byte boundary by default, which is why I replaced memalign with malloc in the Mac OS version.
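For what it's worth, `posix_memalign` is available on both Linux and OS X, so a single allocation routine could serve both platforms instead of switching between `memalign` and `malloc`. The following is only a minimal sketch: the helper names `alloc_aligned_doubles` and `is_aligned` are hypothetical, and plain `double` stands in for `cl_double` so that it compiles without the OpenCL headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical helper: allocates room for numBodies * 4 doubles (the size of
// numBodies cl_double4 values), aligned to `alignment` bytes.
// Returns NULL on failure.
static double* alloc_aligned_doubles(size_t numBodies, size_t alignment)
{
    // posix_memalign requires a power of two that is a multiple of sizeof(void*).
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);

    void* p = NULL;
    const size_t bytes = numBodies * 4 * sizeof(double);
    if (posix_memalign(&p, alignment, bytes) != 0)
        return NULL;
    return static_cast<double*>(p);
}

// Hypothetical helper: true when p is aligned to `alignment` bytes.
static bool is_aligned(const void* p, size_t alignment)
{
    return reinterpret_cast<uintptr_t>(p) % alignment == 0;
}
```

With this, `pos = alloc_aligned_doubles(numBodies, 16);` would stand in for both variants, and the buffer is released with `free`.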

If anyone has a clue, that would be great.

Thanks in advance.

Update:

The error occurs between `cout << "size of source =" << sourceSize << endl` and `cout << "status =" << status << endl`, so it is the clCreateProgramWithSource call that fails:

// create a CL program using the kernel source                                         
  const char *kernelName = "Simu_Kernels.cl";                                           
  FILE *fp = fopen(kernelName, "r");                                                     
  if (!fp) {                                                                             
    fprintf(stderr, "Failed to load kernel.\n");                                         
    exit(1);                                                                             
  } 
  char *source = (char*)malloc(10000);                                                   
  int sourceSize = fread( source, 1, 10000, fp);                                         
  fclose(fp);                                                                            

  cout << "size of source =" << sourceSize << endl;                                      

  // Create a program from the kernel source                                             
  program = clCreateProgramWithSource(context, 1, (const char **)&source, (const size_t *)&sourceSize, &status);
  //program = clCreateProgramWithSource(context, 1, (const char **)&source, NULL, &status);

  cout << "status =" << status << endl;
  cout << "current_device =" << current_device<< endl;   

When I run it, I get:

Selected Platform Vendor : Apple
Device 0 : Iris Pro Device ID is 0x1024500
Device 1 : GeForce GT 750M Device ID is 0x1022700
size of source =2026
OpenCLSimu(15802,0x7fff7da7c310) malloc: *** mach_vm_map(size=59606861803950080) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
status =-6

status = -6 corresponds to CL_OUT_OF_HOST_MEMORY.
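To make such status values easier to read in the log, a small lookup can be printed alongside the number. This is just a sketch: the function name `cl_status_name` is hypothetical, the numeric values are the ones defined in the OpenCL header `cl.h`, and the list is deliberately incomplete.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a few OpenCL status codes to their names.
// Only a handful of codes are listed; anything else falls through to "unknown".
static std::string cl_status_name(int status)
{
    switch (status) {
        case 0:   return "CL_SUCCESS";
        case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -30: return "CL_INVALID_VALUE";
        default:  return "unknown (see cl.h)";
    }
}
```

For example, `cout << "status =" << status << " (" << cl_status_name(status) << ")" << endl;` would show the name next to the raw code.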

Note that my MacBook has two GPUs (Iris Pro and GeForce GT 750M). I get the same error with both devices.

1 answer:

Answer 0: (score: 0)

Try creating the program as follows:

static char* Read_Source_File(
        const char  *filename,
        size_t      *file_len)
{
    long int
        size = 0,
        res  = 0;

    char *src = NULL;

    FILE *file = fopen(filename, "rb");

    if (!file)  return NULL;

    if (fseek(file, 0, SEEK_END))
    {
        fclose(file);
        return NULL;
    }

    size = ftell(file);
    if (size <= 0) /* ftell returns -1 on error; an empty file is also treated as a failure */
    {
        fclose(file);
        return NULL;
    }
    *file_len = (size_t)size;

    rewind(file);

    src = (char *)calloc(size + 1, sizeof(char));
    if (!src)
    {
        src = NULL;
        fclose(file);
        return src;
    }

    res = fread(src, 1, sizeof(char) * size, file);
    if (res != sizeof(char) * size)
    {
        fclose(file);
        free(src);

        return (char*)NULL;
    }

    src[size] = '\0'; /* NULL terminated */
    fclose(file);

    return src;
}

size_t file_len;
char *source = Read_Source_File("path_to_kernel.cl", &file_len);
if(source){
    /* source is NUL-terminated, so the lengths argument can be NULL */
    program = clCreateProgramWithSource(context, 1, (const char **)&source, NULL, &status);
}
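
A likely explanation for the huge sizes that vary between runs (an inference from the code posted in the question, not something confirmed in this thread): `sourceSize` is declared as `int`, while `clCreateProgramWithSource` takes its lengths as `const size_t *`. On 64-bit OS X, `size_t` is 8 bytes and `int` is 4, so the cast `(const size_t *)&sourceSize` makes the runtime read 4 extra bytes of whatever happens to sit next to the variable. Passing `NULL` together with a NUL-terminated string, as above, or keeping the length in a real `size_t` avoids this. A minimal sketch of the second option follows; `take_lengths` is a hypothetical stand-in for the OpenCL call so that the snippet compiles without OpenCL.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for clCreateProgramWithSource: like the real call, it
// receives the source length through a pointer to size_t and reads a whole
// size_t from that address.
static size_t take_lengths(const char** strings, const size_t* lengths)
{
    assert(strings != NULL && lengths != NULL);
    return lengths[0];
}

// Correct pattern: the byte count lives in a size_t, so its address can be
// passed without any cast and the callee reads exactly the intended value.
static size_t pass_length_correctly(const char* source, size_t sourceSize)
{
    return take_lengths(&source, &sourceSize);
}
```

Applied to the code in the question, this amounts to declaring `size_t sourceSize = fread(source, 1, 10000, fp);` and dropping the `(const size_t *)` cast in the call.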