Question

I want to use the code below to test how to allocate memory on the GPU through a pointer.

#include <stdio.h>
#include <cuda_runtime.h>
#include <iostream>
#include <vector>
using namespace std;

int main(void)
{
    cudaError_t err = cudaSuccess;
    size_t numBytes;
    vector<int*> a;

    numBytes = 10 * sizeof(int);
    err = cudaMalloc((void**)&a[0], numBytes);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector A (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
    printf("Done\n");
    return 0;
}

I can compile it successfully with the command nvcc b.cu -o b.o, but when I run it with ./b.o, I get the following error message:

Failed to allocate device vector A (error code invalid argument)!

I guess something goes wrong in the way I use the pointer, but I'm not sure why.

Answer 1

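Your vector a has no elements in it. It is an empty vector, so a[0] does not exist, and &a[0] does not refer to valid storage that cudaMalloc could write the device pointer into.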

You may want to study std::vector and its constructors; these have nothing to do with CUDA. Note the first remark for the type of constructor you are using:

1) Default constructor. Constructs an empty container.

If you define the vector with a non-zero size, for example:

 vector<int*> a(5);

then I think you will be able to get past this error. (For example, this allocates space for storing 5 int pointers.)
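As a minimal sketch of that fix, the program below sizes the vector before taking the address of any element and then calls cudaMalloc once per slot. The helper name allocate_device_buffers and the slot and element counts are assumptions made for illustration; they are not part of the original question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: allocates one device buffer of `count` ints per slot
// and returns the device pointers in a host-side vector.
std::vector<int*> allocate_device_buffers(size_t slots, size_t count)
{
    // The vector is sized up front, so buffers[i] exists for every i < slots.
    std::vector<int*> buffers(slots, nullptr);

    for (size_t i = 0; i < slots; ++i) {
        cudaError_t err = cudaMalloc((void**)&buffers[i], count * sizeof(int));
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaMalloc failed for slot %zu (error code %s)!\n",
                    i, cudaGetErrorString(err));
            exit(EXIT_FAILURE);
        }
    }
    return buffers;
}

int main(void)
{
    std::vector<int*> a = allocate_device_buffers(5, 10);
    printf("Allocated %zu device buffers\n", a.size());

    // Release every device allocation before exiting.
    for (int* p : a) {
        cudaFree(p);
    }
    return 0;
}
```

The vector itself still lives in host memory; only the buffers that its elements point to are on the device.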

Answer 2

To allocate the memory of a std::vector on a GPU device, remember that its template signature is:

template<
    class T,
    class Allocator = std::allocator<T>
> class vector;

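That is, it takes an allocator class as a template parameter, and this class could perform the allocation on the GPU. Now, the C++ standard library allocator mechanism is not considered to be very well designed or very friendly to use, but it is usable. Try this (old-ish) tutorial on writing a custom one.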
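A rough sketch of what such an allocator could look like follows. It deliberately uses cudaMallocManaged (unified memory) rather than cudaMalloc, because std::vector initializes its elements from host code and memory returned by plain cudaMalloc is not accessible from the host. The names ManagedAllocator and add_one are hypothetical, and the sketch assumes a GPU and driver that support managed memory.

```cuda
#include <cstddef>
#include <cstdio>
#include <new>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical minimal allocator backed by CUDA managed (unified) memory.
template <class T>
struct ManagedAllocator {
    using value_type = T;

    ManagedAllocator() = default;
    template <class U>
    ManagedAllocator(const ManagedAllocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        void* p = nullptr;
        if (cudaMallocManaged(&p, n * sizeof(T)) != cudaSuccess) {
            throw std::bad_alloc();
        }
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { cudaFree(p); }
};

// The allocator is stateless, so any two instances compare equal.
template <class T, class U>
bool operator==(const ManagedAllocator<T>&, const ManagedAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const ManagedAllocator<T>&, const ManagedAllocator<U>&) { return false; }

// Kernel that increments every element through a raw pointer.
__global__ void add_one(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main(void)
{
    // The vector object is on the host; its element storage is managed memory.
    std::vector<int, ManagedAllocator<int>> v(10, 41);

    add_one<<<1, 32>>>(v.data(), static_cast<int>(v.size()));
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel failed (error code %s)!\n", cudaGetErrorString(err));
        return 1;
    }

    printf("v[0] = %d\n", v[0]);
    return 0;
}
```

Only the raw pointer from v.data() is passed to the kernel; the std::vector object and its member functions are still used from host code only.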

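H-o-w-e-v-e-r... this is probably not what you want to do. You will not be able to use std::vector in device-side code, since most of its methods are host-only, and most of the C++ standard library will not work, or will not compile, when included in device code.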

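A more relevant alternative might be to use the thrust library, which offers standard-library-like containers and generic algorithms. Thrust has a device_vector class, which may be what you are actually after.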
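For illustration, here is a small sketch of thrust::device_vector in use. The function name sum_first_n and the element count are assumptions chosen for the example; the Thrust calls themselves (thrust::sequence and thrust::reduce) are part of the library that ships with the CUDA toolkit.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>

// Hypothetical helper: fills a device vector with 0, 1, ..., n-1 and sums it.
int sum_first_n(int n)
{
    thrust::device_vector<int> d(n);               // element storage is in device memory
    thrust::sequence(d.begin(), d.end());          // fill with 0, 1, ..., n-1 on the device
    return thrust::reduce(d.begin(), d.end(), 0);  // reduce on the device, result on the host
}

int main(void)
{
    printf("sum = %d\n", sum_first_n(10));
    return 0;
}
```

If a raw device pointer is needed for a hand-written kernel, thrust::raw_pointer_cast(d.data()) provides one.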

Allocating with cudaMalloc() when using std::vector<int*>
