Question

I have been trying to get the following code to work:

__global__ void kernel(){
    if (threadIdx.x == 1){
        while(var == 0){
        }
    }
    if (threadIdx.x == 0){
        var = 1;
    }
}

where var is a global device variable. I am launching just two threads in the same block, using kernel<<<1,2>>>();

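If I swap the order of the two ifs, the code terminates. If I leave them in the order shown, it never terminates. It looks as if one thread enters the infinite loop and no other thread is given any run time until that thread finishes.

My impression was that on a GPU every thread gets some run time allotted to it (although the order may not be known to us).

I also tried putting __threadfence() inside the while loop and inside the if statements, and I tried a printf inside the while loop. It still does not work.

What is going on? Any feedback would be appreciated.

Thanks!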

Answer 1

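If var is some kind of global variable, then what you are seeing makes perfect sense once you consider how instructions are scheduled for threads. You need to walk through the code as if you were a warp of threads (32 threads). Divergence occurs when some of those 32 threads execute a piece of code while the others do not. When divergence occurs, only the threads running the same instruction actually run; the others are held until the paths rejoin.

In other words...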

__global__ void kernel(){
    //Both threads encounter this at the same time. Thread 0 is set on "hold" while thread 1 continues in the if block.
    if (threadIdx.x == 1){
        while(var == 0){
        }//infinite loop, Thread 0 will always be on hold. Thread 1 will always be in this loop
    }

    if (threadIdx.x == 0){
        var = 1;
    }
}

As opposed to...

__global__ void kernel(){
    //Both threads encounter this at the same time. Thread 1 is set on "hold" while thread 0 continues in the if block.
    if (threadIdx.x == 0){
        //Thread 0 sets global variable var to 1
        var = 1;
    }
    //Threads 1 and 0 join again.
    //Both encounter this. Thread 0 is set on hold while thread 1 continues.
    if (threadIdx.x == 1){
        //var was already set to 1, so the loop body is skipped.
        while(var == 0){
        }
    }
    //Both threads join

}

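Re-read the programming guide and look at the sections on warps. If you want to test this further, try putting the two threads in two separate blocks, which prevents them from being in the same warp.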
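The two-block experiment suggested above could look roughly like the sketch below. It is only an illustration under some assumptions that are not in the original post: var is declared as a volatile int so that every loop iteration re-reads it from memory, the kernel takes a hypothetical seen pointer so the host can check what the waiting block observed, and run_two_block_test is a made-up helper name. It also relies on both blocks being resident on the device at the same time, which CUDA does not formally promise, so treat it as a test rather than a pattern to copy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: volatile forces the spin loop to re-read var on every iteration.
__device__ volatile int var = 0;

// seen is a hypothetical output slot, added only so the host can inspect the result.
__global__ void kernel(int *seen) {
    // Block 1 waits. It sits in a different warp from block 0,
    // so it cannot keep block 0's thread on hold.
    if (blockIdx.x == 1) {
        while (var == 0) {
        }
        *seen = var;
    }
    // Block 0 publishes the flag.
    if (blockIdx.x == 0) {
        var = 1;
        __threadfence();  // make the write visible device-wide
    }
}

// Hypothetical helper: launches two blocks of one thread each and
// returns the value of var that the waiting block observed.
int run_two_block_test() {
    int *d_seen = nullptr;
    int h_seen = 0;
    cudaMalloc(&d_seen, sizeof(int));
    cudaMemset(d_seen, 0, sizeof(int));
    kernel<<<2, 1>>>(d_seen);
    cudaDeviceSynchronize();
    cudaMemcpy(&h_seen, d_seen, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_seen);
    return h_seen;
}

int main() {
    printf("seen = %d\n", run_two_block_test());
    return 0;
}
```

If the kernel returns at all, the waiting block can only have left its loop after reading var as 1, so the order of the two if statements no longer decides whether the program terminates.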

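Be forewarned, though, that CUDA in general does not guarantee the order in which threads execute across warps and blocks (unless you synchronize with something like __syncthreads() or by exiting the kernel).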
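For threads in the same block, one way to get a defined order without spinning is to put the write before a __syncthreads() barrier and the read after it. The following is a minimal sketch, not the original poster's code: the seen output pointer and the run_barrier_test helper are hypothetical names introduced for illustration, and var is assumed to be an int.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: var is a plain int in global device memory.
__device__ int var = 0;

// seen is a hypothetical output slot so the host can check the result.
__global__ void kernel(int *seen) {
    if (threadIdx.x == 0) {
        var = 1;
    }
    // Every thread of the block reaches the barrier (it is outside the ifs),
    // and thread 0's write is visible to the block once the barrier completes.
    __syncthreads();
    if (threadIdx.x == 1) {
        *seen = var;  // no spin loop needed: the write has already happened
    }
}

// Hypothetical helper: launches one block of two threads and
// returns the value of var that thread 1 read.
int run_barrier_test() {
    int *d_seen = nullptr;
    int h_seen = 0;
    cudaMalloc(&d_seen, sizeof(int));
    cudaMemset(d_seen, 0, sizeof(int));
    kernel<<<1, 2>>>(d_seen);
    cudaDeviceSynchronize();
    cudaMemcpy(&h_seen, d_seen, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_seen);
    return h_seen;
}

int main() {
    printf("seen = %d\n", run_barrier_test());
    return 0;
}
```

Note that the barrier sits outside both if blocks. Calling __syncthreads() inside a branch that only some threads of the block take is not safe.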
