Question

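I am trying to allocate about 2.75G of memory on the GPU. The allocation succeeds when the size is "static" (known at compile time), but fails when the size is "dynamic" (computed at run time).

I am on a box with CentOS 7.1, CUDA 7.5, 2 x Titan X cards, an Intel 4790K and 32GB of RAM.

The code: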

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int item_count = 21217344;
    int dim = 128;

    unsigned char * data_dev;
    size_t mem_size = item_count * dim * sizeof(unsigned char);
    printf("memory to alloc %u\n", mem_size);
    int r = cudaMalloc((void **)&data_dev, mem_size);
    if(r) {
        printf("memory alloc failed!\n");
    }

    size_t mem_size_static = 2715820032; // 21217344 * 128 = 2715820032;
    r = cudaMalloc((void **)&data_dev, mem_size_static);
    if(!r) {
        printf("memory alloc succeeded!\n");
    }

}

Save it to 'test_mem.cu', then compile it:

 /usr/local/cuda/bin/nvcc test_mem.cu

Run it:

[root@localhost test]# ./a.out 
memory to alloc 2715820032
memory alloc failed!
memory alloc succeeded!

Any ideas about this?

Answer 1

int item_count = 21217344;
int dim = 128;

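Those are ints, and their product, 2715820032, does not fit in a 32-bit int. Signed overflow is formally undefined behavior; in practice the result wraps to -1579147264. When that negative value is converted to size_t it becomes an enormous request, so cudaMalloc fails. The printf in the question still shows 2715820032 only because %u prints the low 32 bits of the value.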
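The arithmetic can be checked at compile time. The following is a minimal sketch, not part of the original answer: the helper names (exact_product, wrap_to_int32, as_unsigned_64) are hypothetical, and the wraparound is emulated with well-defined unsigned arithmetic because real signed overflow is undefined behavior. It assumes the usual 32-bit int and 64-bit size_t.

```cpp
#include <cstdint>

// Mathematically exact product, computed in 64 bits so it cannot overflow here.
constexpr std::int64_t exact_product(std::int64_t a, std::int64_t b) {
    return a * b;
}

// Emulates what a two's-complement 32-bit int typically holds after wraparound:
// keep the low 32 bits, then reinterpret them as a signed value.
constexpr std::int64_t wrap_to_int32(std::int64_t v) {
    return static_cast<std::uint32_t>(v) >= 0x80000000u
               ? static_cast<std::int64_t>(static_cast<std::uint32_t>(v)) - 4294967296LL
               : static_cast<std::int64_t>(static_cast<std::uint32_t>(v));
}

// Converting a negative value to a 64-bit unsigned type (like size_t on x86-64)
// is well-defined and yields value + 2^64.
constexpr std::uint64_t as_unsigned_64(std::int64_t v) {
    return static_cast<std::uint64_t>(v);
}

static_assert(exact_product(21217344, 128) == 2715820032LL,
              "the intended size");
static_assert(wrap_to_int32(2715820032LL) == -1579147264LL,
              "what the int multiplication typically produces");
static_assert(as_unsigned_64(-1579147264LL) == 18446744072130404352ULL,
              "the size actually passed to cudaMalloc");
static_assert(static_cast<std::uint32_t>(18446744072130404352ULL) == 2715820032u,
              "the low 32 bits, which is what %u displayed");
```

Under these assumptions the allocation request was roughly 16 EiB, not 2.7 GB, which no GPU can satisfy.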

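What you want is either to declare those variables with a wider type (e.g. std::size_t), or to cast the operands to a wider type before multiplying. Then everything will work.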
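A minimal sketch of the second option, casting before multiplying. The helper name byte_count is hypothetical and chosen only for illustration; the cudaMalloc call is shown in a comment so the snippet compiles without the CUDA toolkit.

```cpp
#include <cstddef>

// Widen each operand to std::size_t BEFORE multiplying, so the product is
// computed in the wide type and never passes through a 32-bit int.
constexpr std::size_t byte_count(int item_count, int dim) {
    return static_cast<std::size_t>(item_count)
         * static_cast<std::size_t>(dim)
         * sizeof(unsigned char);
}

// With the values from the question, the result is the intended size.
static_assert(byte_count(21217344, 128) == 2715820032ULL,
              "size is computed without overflow");

// In the question's program this would be used as:
//   size_t mem_size = byte_count(item_count, dim);
//   cudaError_t r = cudaMalloc((void **)&data_dev, mem_size);
```

Casting only the first operand is enough in practice, since the other int is then promoted to the wider type; casting both just makes the intent explicit.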

Side note: had you used C++'s std::cout instead of printf, or the correct format specifier for size_t, %zu, you would have spotted the error immediately.
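As a small illustration of the side note, here is a sketch that formats a size_t with %zu. The helper name format_size is hypothetical; std::snprintf and the %zu conversion are standard since C++11.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats a std::size_t using the matching %zu specifier, so the full value
// is printed rather than only its low 32 bits.
std::string format_size(std::size_t n) {
    char buf[32];  // large enough for any 64-bit decimal value plus terminator
    std::snprintf(buf, sizeof buf, "%zu", n);
    return std::string(buf);
}
```

On a typical 64-bit system, printing the question's dynamically computed mem_size this way would show 18446744072130404352 rather than 2715820032, making the overflow obvious.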

cudaMalloc succeeds with a static size but fails with a dynamically computed size
