Question

I recently stumbled upon this post on the NVIDIA devblogs: https://devblogs.nvidia.com/parallelforall/accelerating-graph-betweenness-centrality-cuda/

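I implemented the edge-parallel code and it seems to work as intended, but it looks to me like the code relies on a race condition that is "controlled" with __syncthreads. This is the code (as shown in the blog):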

__shared__ int current_depth;
__shared__ bool done;

if(idx == 0){
    done = false;
    current_depth = 0;
}
__syncthreads();

// Calculate the number of shortest paths and the 
// distance from s (the root) to each vertex
while(!done){
    __syncthreads();
    done = true;
    __syncthreads();

    for(int k=idx; k<m; k+=blockDim.x) //For each edge...
    {
        int v = F[k];
        // If the head is in the vertex frontier, look at the tail
        if(d[v] == current_depth) 
        {
            int w = C[k];
            if(d[w] == INT_MAX){
                d[w] = d[v] + 1;
                done = false;
            }
            if(d[w] == (d[v] + 1)){
                atomicAdd(&sigma[w],sigma[v]);
            }
        }
        __syncthreads();
        current_depth++;
    }
}

I think there is a race condition at the end:

__syncthreads();
current_depth++;

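I think the program relies on a race condition so that the variable is incremented only once rather than once per thread. I feel this is not a good idea, but in my tests it seems to be reliable. Is this really safe? Is there a better way to do it? Thanks.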

Answer 1

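As the author of this blog post, I'd like to thank you for pointing out this error!

When I wrote this snippet I did not use my edge-traversal code verbatim, because it uses explicit queueing to traverse the graph, which makes the example more complicated without adding any pedagogical value. Instead, I must have cargo-culted some old code and posted it incorrectly. It has been a while since I touched this code or algorithm, but I believe the following snippet should work: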

__shared__ int current_depth;
__shared__ bool done;

if(idx == 0){
    done = false;
    current_depth = 0;
}
__syncthreads();

// Calculate the number of shortest paths and the 
// distance from s (the root) to each vertex
while(!done)
{
    __syncthreads();
    done = true;
    __syncthreads();

    for(int k=idx; k<m; k+=blockDim.x) //For each edge...
    {
        int v = F[k];
        // If the head is in the vertex frontier, look at the tail
        if(d[v] == current_depth) 
        {
            int w = C[k];
            if(d[w] == INT_MAX){
                d[w] = d[v] + 1;
                done = false;
            }
            if(d[w] == (d[v] + 1)){
                atomicAdd(&sigma[w],sigma[v]);
            }
        }
    }
    __syncthreads(); //All threads reach here, no longer UB
    if(idx == 0){ //Only one thread should increment this shared variable
        current_depth++;
    }
}

Notes:

In the blog post ...
You could also use a register instead of a shared variable for current_depth, in which case every thread would have to increment it

So to answer your question: no, that approach is not safe. If I'm not mistaken, the blog snippet has an additional problem: current_depth should only be incremented once all vertices at the previous depth have been processed, which is at the end of the for loop.
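The register variant mentioned in the notes could look roughly like the sketch below. It is only an illustration under several assumptions: a single thread block, idx taken to be threadIdx.x, and an added initialization of d and sigma. The kernel name count_shortest_paths, the host helper run_diamond_example, and the 4-vertex test graph are made up for this example and do not come from the blog post or the hybrid_BC repository. Like the blog snippet, it relies on several threads writing the same value to d[w] and to done.

```cuda
#include <climits>
#include <cuda_runtime.h>

// Edge-parallel shortest-path counting from `root`, one thread block only.
// Edge k runs from F[k] to C[k] (array names follow the blog snippet).
__global__ void count_shortest_paths(const int *F, const int *C, int m, int n,
                                     int root, int *d, int *sigma)
{
    const int idx = threadIdx.x;   // assumption: idx is the thread index
    __shared__ bool done;
    int current_depth = 0;         // per-thread register copy, not __shared__

    // Assumed initialization: distance 0 and one path for the root only.
    for (int i = idx; i < n; i += blockDim.x) {
        d[i] = (i == root) ? 0 : INT_MAX;
        sigma[i] = (i == root) ? 1 : 0;
    }
    if (idx == 0) {
        done = false;
    }
    __syncthreads();

    while (!done) {
        __syncthreads();           // all threads have read `done` before the reset
        done = true;
        __syncthreads();

        for (int k = idx; k < m; k += blockDim.x) {
            int v = F[k];
            if (d[v] == current_depth) {
                int w = C[k];
                if (d[w] == INT_MAX) {
                    d[w] = d[v] + 1;   // racing writers all store the same value
                    done = false;
                }
                if (d[w] == d[v] + 1) {
                    atomicAdd(&sigma[w], sigma[v]);
                }
            }
        }
        __syncthreads();           // outside the for loop: every thread reaches it
        current_depth++;           // each thread advances its own copy in lockstep
    }
}

// Hypothetical host helper: runs the kernel from vertex 0 on the diamond
// graph 0->1, 0->2, 1->3, 2->3 and copies distances and path counts back.
bool run_diamond_example(int d_out[4], int sigma_out[4])
{
    const int n = 4, m = 4;
    const int hF[m] = {0, 0, 1, 2};
    const int hC[m] = {1, 2, 3, 3};
    int *F = nullptr, *C = nullptr, *d = nullptr, *sigma = nullptr;

    cudaMalloc(&F, m * sizeof(int));
    cudaMalloc(&C, m * sizeof(int));
    cudaMalloc(&d, n * sizeof(int));
    cudaMalloc(&sigma, n * sizeof(int));
    cudaMemcpy(F, hF, m * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(C, hC, m * sizeof(int), cudaMemcpyHostToDevice);

    count_shortest_paths<<<1, 32>>>(F, C, m, n, 0, d, sigma);

    // The blocking copies also surface errors from the earlier calls.
    cudaError_t err = cudaMemcpy(d_out, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) {
        err = cudaMemcpy(sigma_out, sigma, n * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(F);
    cudaFree(C);
    cudaFree(d);
    cudaFree(sigma);
    return err == cudaSuccess;
}
```

Calling run_diamond_example from a main function and compiling with nvcc is one way to try this out. Because every thread executes the same number of while iterations, the private copies of current_depth stay equal without any extra synchronization, which is why each thread must increment its own copy.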

Finally, if you'd like the final version of my code, which has been tested and used by people in the community, you can access it here: https://github.com/Adam27X/hybrid_BC

Is there a race condition in the code of this Parallel Forall blog post?