Question

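I'm a beginner learning CUDA, and I'm trying to build some pre-written code. Compilation fails with an error for every atomic operation the code contains. For example: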

__global__ void MarkEdgesUV(unsigned int *d_edge_flag, unsigned long long int *d_appended_uvw, unsigned int *d_size, int no_of_edges)
{
    unsigned int tid = blockIdx.x*MAX_THREADS_PER_BLOCK + threadIdx.x;
    if(tid<no_of_edges)
    {
        if(tid>0)
        {
            // Build the sentinel key: INF in both packed vertex-id fields
            unsigned long long int test = INF;
            test = test << NO_OF_BITS_MOVED_FOR_VERTEX_IDS;
            test |=INF;
            // Extract the packed (u,v) key of this edge and of the previous one
            unsigned long long int test1 = d_appended_uvw[tid]>>(64-(NO_OF_BITS_MOVED_FOR_VERTEX_IDS+NO_OF_BITS_MOVED_FOR_VERTEX_IDS));
            unsigned long long int test2 = d_appended_uvw[tid-1]>>(64-(NO_OF_BITS_MOVED_FOR_VERTEX_IDS+NO_OF_BITS_MOVED_FOR_VERTEX_IDS));
            if(test1>test2)
                d_edge_flag[tid]=1;
            if(test1 == test)
                atomicMin(d_size,tid); //also to know the last element in the array, i.e. the size of new edge list
        }
        else
            d_edge_flag[tid]=1;
    }
}

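This gives the error: error: identifier "atomicMin" is undefined. The code is supposed to be quite reliable, and I have checked that the atomics appear to be used correctly. Can someone explain why this error occurs?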

Answer 1

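My guess is that you are compiling with plain nvcc (which defaults to sm_10) without specifying the minimum compute capability the code requires. atomicMin() on 32-bit words in global memory was only introduced with devices of compute capability 1.1 (CC 1.1), as shown in this table.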

Try compiling with

nvcc -arch sm_11 ...

This will make atomicMin() recognized. It is better, though, to compile for the actual architecture of your GPU.

Edit: since the GPU is a Tesla C2075,

nvcc -arch sm_20 ...

should work as well.
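To check that the architecture flag is what matters, independent of the original graph code, here is a minimal sketch of a complete program that uses 32-bit atomicMin() on global memory. The kernel name firstIndexAbove, the test data, and the threshold are hypothetical and chosen only for illustration. Built for a target below compute capability 1.1, this file would be expected to fail with the same "identifier undefined" error; built for the GPU's actual architecture, it should compile.

```cuda
// Build with the architecture of your own GPU, e.g.:
//   nvcc -arch sm_20 atomic_min_demo.cu -o atomic_min_demo
// (sm_20 matches the Tesla C2075 from the thread; recent toolkits no longer
//  accept such old targets, so substitute the value for your card.)
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the original code): records the smallest
// index i for which data[i] exceeds threshold.
__global__ void firstIndexAbove(const int *data, unsigned int *d_min,
                                unsigned int n, int threshold)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n && data[tid] > threshold)
        atomicMin(d_min, tid);   // 32-bit atomic on global memory
}

int main()
{
    const unsigned int n = 1024;
    int h_data[n];
    for (unsigned int i = 0; i < n; ++i)
        h_data[i] = (int)i;

    int *d_data = NULL;
    unsigned int *d_min = NULL;
    unsigned int h_min = 0xFFFFFFFFu;   // start from the largest possible value

    cudaMalloc((void **)&d_data, n * sizeof(int));
    cudaMalloc((void **)&d_min, sizeof(unsigned int));
    cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_min, &h_min, sizeof(unsigned int), cudaMemcpyHostToDevice);

    firstIndexAbove<<<(n + 255) / 256, 256>>>(d_data, d_min, n, 499);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_min, d_min, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    printf("smallest index above threshold: %u\n", h_min);

    cudaFree(d_data);
    cudaFree(d_min);
    return 0;
}
```

The result variable is initialized to the largest unsigned value so that any thread's index can replace it, which mirrors how d_size would need to be initialized before MarkEdgesUV runs.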

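Edit: at the OP's request, adding references:

You can find a detailed explanation of the meaning of the -arch and -code options here.

In particular, it states explicitly that both the -arch and -code options can be omitted.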

nvcc x.cu

is shorthand for

nvcc x.cu -arch=compute_10 -code=sm_10,compute_10
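One way to see which target a build actually used is to compare the device's compute capability with the __CUDA_ARCH__ macro, which nvcc defines while compiling device code (for example 110 for compute_11, 200 for compute_20). The following is a small sketch under that assumption; the kernel name reportArch is hypothetical, and the default target depends on the toolkit version, so the sm_10 default quoted above applies to the toolkit of that era.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper kernel: writes the architecture value the device code
// was compiled for into *out.
__global__ void reportArch(int *out)
{
#ifdef __CUDA_ARCH__
    *out = __CUDA_ARCH__;
#else
    *out = 0;   // not reached in device compilation; kept for completeness
#endif
}

int main()
{
    // What the hardware supports.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no CUDA device found\n");
        return 1;
    }
    printf("device 0: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);

    // What the kernel was compiled for.
    int *d_arch = NULL;
    int h_arch = -1;
    cudaMalloc((void **)&d_arch, sizeof(int));
    reportArch<<<1, 1>>>(d_arch);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(&h_arch, d_arch, sizeof(int), cudaMemcpyDeviceToHost);
    printf("kernel compiled for __CUDA_ARCH__ = %d\n", h_arch);

    cudaFree(d_arch);
    return 0;
}
```

If the reported __CUDA_ARCH__ is lower than the compute capability the code's atomics need, the fix is the -arch option, not a change to the kernel.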

Error when using atomic operations in CUDA C
