Question

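Continuing my adventures as a CUDA beginner, I've been introduced to Thrust, which seems to be a convenient library that spares me the hassle of explicit memory (de)allocation.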

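I've already tried combining it with a few cuBLAS routines, e.g. gemv. I generate a raw pointer to the underlying storage with thrust::raw_pointer_cast(array.data()), feed that to the routine, and it works fine.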

The current task is to obtain the inverse of a matrix, and for that I'm using getrfBatched and getriBatched. From the documentation:

cublasStatus_t cublasDgetrfBatched(cublasHandle_t handle,
                                   int n, 
                                   double *Aarray[],
                                   int lda, 
                                   int *PivotArray,
                                   int *infoArray,
                                   int batchSize);

where

Aarray - device - array of pointers to <type> array
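The pointer array holds one address per matrix in the batch. As a host-only illustration (make_batch_pointers is a hypothetical helper, and it assumes the matrices are packed back to back in a single column-major buffer), those addresses are just fixed-stride offsets into the buffer. In real cuBLAS code both the matrix buffer and the pointer array must reside in device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: `base` points at `batch` column-major n x n matrices
// stored back to back. Returns the "array of pointers" layout that the
// batched getrf/getri routines take: element b is the address of matrix b.
std::vector<double*> make_batch_pointers(double* base, int n, int batch) {
    assert(n > 0 && batch >= 0);
    std::vector<double*> ptrs(static_cast<std::size_t>(batch));
    for (int b = 0; b < batch; ++b) {
        // Each matrix occupies n * n consecutive doubles.
        ptrs[b] = base + static_cast<std::size_t>(b) * n * n;
    }
    return ptrs;
}
```

With a batch size of 1, as in the code below, the pointer array has a single element: the address of the one matrix.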

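Naturally, I figured I could use another layer of Thrust vector to represent this array of pointers and again feed its raw pointer to cuBLAS, so this is what I do: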

void test()
{
    thrust::device_vector<double> in(4);
    in[0] = 1;
    in[1] = 3;
    in[2] = 2;
    in[3] = 4;
    cublasStatus_t stat;
    cublasHandle_t handle;
    stat = cublasCreate(&handle);
    thrust::device_vector<double> out(4, 0);
    thrust::device_vector<int> pivot(2, 0);
    int info = 0;
    thrust::device_vector<double*> in_array(1);
    in_array[0] = thrust::raw_pointer_cast(in.data());
    thrust::device_vector<double*> out_array(1);
    out_array[0] = thrust::raw_pointer_cast(out.data());
    stat = cublasDgetrfBatched(handle, 2,
        (double**)thrust::raw_pointer_cast(in_array.data()), 2,
        thrust::raw_pointer_cast(pivot.data()), &info, 1);
    stat = cublasDgetriBatched(handle, 2,
        (const double**)thrust::raw_pointer_cast(in_array.data()), 2,
        thrust::raw_pointer_cast(pivot.data()),
        (double**)thrust::raw_pointer_cast(out_array.data()), 2, &info, 1);
}

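After execution, stat says CUBLAS_STATUS_SUCCESS (0) and info says 0 (execution successful), yet if I try to access the elements of in, out or pivot using standard bracket notation, I hit an error. It seems to me that the corresponding memory has somehow been corrupted.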

Is there something obvious I'm missing here?

Answer 1

The documentation for cublas<t>getrfBatched indicates that the infoArray parameter is expected to be a pointer to device memory.

Instead, you have passed a pointer to host memory:

int info = 0;
...
stat = cublasDgetrfBatched(handle, 2,
    (double**)thrust::raw_pointer_cast(in_array.data()), 2,
    thrust::raw_pointer_cast(pivot.data()), &info, 1);
                                            ^^^^^

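If you run your code with cuda-memcheck (in my opinion, a good practice any time you are having trouble with a CUDA code, before asking others for help), you will receive an "invalid global write of size 4" error. This happens because a kernel launched by cublasDgetrfBatched() is trying to write the info data to device memory through the ordinary host pointer you provided, which is always illegal in CUDA. On newer CUDA toolkits, compute-sanitizer replaces cuda-memcheck for this kind of check.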

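For performance reasons, CUBLAS itself does not trap such errors. However, the Thrust API in some cases uses more rigorous synchronization and error checking. Therefore, Thrust code that runs after this error will report it, even though the error has nothing to do with Thrust (it is an asynchronously reported error from a previous kernel launch).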

The solution is simple: provide device storage for info:

$ cat t329.cu
#include <thrust/device_vector.h>
#include <cublas_v2.h>
#include <iostream>

void test()
{
    thrust::device_vector<double> in(4);
    in[0] = 1;
    in[1] = 3;
    in[2] = 2;
    in[3] = 4;
    cublasStatus_t stat;
    cublasHandle_t handle;
    stat = cublasCreate(&handle);
    thrust::device_vector<double> out(4, 0);
    thrust::device_vector<int> pivot(2, 0);
    thrust::device_vector<int> info(1, 0);
    thrust::device_vector<double*> in_array(1);
    in_array[0] = thrust::raw_pointer_cast(in.data());
    thrust::device_vector<double*> out_array(1);
    out_array[0] = thrust::raw_pointer_cast(out.data());
    stat = cublasDgetrfBatched(handle, 2,
        (double**)thrust::raw_pointer_cast(in_array.data()), 2,
        thrust::raw_pointer_cast(pivot.data()), thrust::raw_pointer_cast(info.data()), 1);
    stat = cublasDgetriBatched(handle, 2,
        (const double**)thrust::raw_pointer_cast(in_array.data()), 2,
        thrust::raw_pointer_cast(pivot.data()),
        (double**)thrust::raw_pointer_cast(out_array.data()), 2, thrust::raw_pointer_cast(info.data()), 1);
    for (int i = 0; i < 4; i++) {
      double test = in[i];
      std::cout << test << std::endl;
      }
}


int main(){

  test();
}
$ nvcc -o t329 t329.cu -lcublas
t329.cu(12): warning: variable "stat" was set but never used

$ cuda-memcheck ./t329
========= CUDA-MEMCHECK
3
0.333333
4
0.666667
========= ERROR SUMMARY: 0 errors
$
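Note that the four numbers printed above are not the inverse. cublasDgetrfBatched overwrites each input matrix in place with its LU factors, while cublasDgetriBatched writes the inverse into the separate output array (out here) and leaves in unchanged; the loop prints in, so it shows the factors. A host-only sketch of LAPACK-style LU factorization with partial pivoting reproduces exactly these values for the column-major input {1, 3, 2, 4}, i.e. the matrix [[1, 2], [3, 4]]. This is an unblocked reference (lu_factor is a hypothetical name), not how cuBLAS implements it.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical reference: unblocked LU with partial pivoting of a column-major
// n x n matrix. On return `a` holds L below the diagonal (unit diagonal not
// stored) and U on and above it; `ipiv` holds 1-based pivot rows like LAPACK.
// Returns 0 on success, or k > 0 if U(k,k) is exactly zero (LAPACK info rule).
int lu_factor(std::vector<double>& a, int n, std::vector<int>& ipiv) {
    assert(static_cast<int>(a.size()) == n * n);
    ipiv.assign(n, 0);
    int info = 0;
    for (int k = 0; k < n; ++k) {
        // Pivot: first row i >= k with the largest |a(i,k)|.
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (std::fabs(a[i + k * n]) > std::fabs(a[p + k * n])) p = i;
        ipiv[k] = p + 1;
        if (a[p + k * n] == 0.0) {  // whole subcolumn is zero: record, skip
            if (info == 0) info = k + 1;
            continue;
        }
        // Swap rows k and p across all columns.
        if (p != k)
            for (int j = 0; j < n; ++j) std::swap(a[k + j * n], a[p + j * n]);
        // Column k of L (multipliers), then update the trailing block.
        for (int i = k + 1; i < n; ++i) {
            a[i + k * n] /= a[k + k * n];
            for (int j = k + 1; j < n; ++j)
                a[i + j * n] -= a[i + k * n] * a[k + j * n];
        }
    }
    return info;
}
```

Row 2 is chosen as the first pivot because |3| > |1|, so the pivots come out as {2, 2}. The result stores the multiplier 1/3 of L below the diagonal and U = [[3, 4], [0, 2/3]] on and above it, which is the 3, 0.333333, 4, 0.666667 printed by the program.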

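You'll note that the change in the above code applies to both cuBLAS calls, since the infoArray parameter has the same expectation (a pointer to device memory) for both.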
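To see what ends up in out, a host-side sketch can rebuild the inverse from the factors printed above ({3, 1/3, 4, 2/3} in column-major order), assuming LAPACK-style 1-based pivots, which for this matrix are {2, 2}. It solves A x = e_j for each column of the identity; inverse_from_lu is a hypothetical name, and LAPACK's getri instead inverts U and then solves with L, so this is a cross-check of the result rather than the library's algorithm.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical reference: given column-major LU factors `lu` of an n x n
// matrix (unit-diagonal L below the diagonal, U on and above it) and 1-based
// pivots, return A^{-1} in column-major order, one column at a time.
std::vector<double> inverse_from_lu(const std::vector<double>& lu, int n,
                                    const std::vector<int>& ipiv) {
    assert(static_cast<int>(lu.size()) == n * n);
    assert(static_cast<int>(ipiv.size()) == n);
    std::vector<double> inv(n * n, 0.0);
    for (int j = 0; j < n; ++j) {
        std::vector<double> x(n, 0.0);
        x[j] = 1.0;  // column j of the identity
        // Apply the row interchanges in the order they were recorded.
        for (int k = 0; k < n; ++k) std::swap(x[k], x[ipiv[k] - 1]);
        // Forward substitution with unit lower-triangular L.
        for (int i = 0; i < n; ++i)
            for (int k = 0; k < i; ++k) x[i] -= lu[i + k * n] * x[k];
        // Back substitution with upper-triangular U.
        for (int i = n - 1; i >= 0; --i) {
            for (int k = i + 1; k < n; ++k) x[i] -= lu[i + k * n] * x[k];
            x[i] /= lu[i + i * n];
        }
        for (int i = 0; i < n; ++i) inv[i + j * n] = x[i];
    }
    return inv;
}
```

For [[1, 2], [3, 4]] this gives [[-2, 1], [1.5, -0.5]], stored column-major as {-2, 1.5, 1, -0.5}, which is what out should contain after cublasDgetriBatched.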

Feeding Thrust vectors into getrf/getri
