Question

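I am learning CUDA and have run into a problem with thread synchronization. In my code, I need threads to execute different parts of the code, for example: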

one thread -> 
all threads ->
one thread ->

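This is what I want: in the initial part of the code only one thread executes, then some part is executed by all threads, and then a single thread executes again. The threads also run inside a loop. Can anyone tell me how to do this?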

Answer 1

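You can only synchronize threads within a single block. Synchronization across multiple blocks is possible, but only in very specific cases. If you need global synchronization across all threads, the way to do it is to launch a new kernel.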
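To illustrate the "launch a new kernel" approach, here is a minimal sketch that splits the one-thread / all-threads / one-thread pattern into three kernels and launches them from a host loop. Kernels issued to the same stream run in order, so each launch boundary acts as a grid-wide barrier. The kernel names, the `run_phases` helper, and the placeholder work (scaling by 2 and summing) are assumptions made for illustration; they are not part of the question.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Phase 1: launched as <<<1, 1>>>, so exactly one thread runs it.
__global__ void setup_phase(float *scale)
{
    *scale = 2.0f;                       // placeholder single-thread work
}

// Phase 2: every thread handles one element.
__global__ void work_phase(float *A, int N, const float *scale)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < N)                         // prevent buffer overruns
        A[idx] *= *scale;
}

// Phase 3: launched as <<<1, 1>>>, one thread sums the array serially.
__global__ void reduce_phase(const float *A, int N, float *sum)
{
    float s = 0.0f;
    for (int i = 0; i < N; ++i)
        s += A[i];
    *sum = s;
}

// Hypothetical host helper: fills A with 1.0f, runs the three phases
// `iterations` times, and returns the sum written by the last phase.
float run_phases(int N, int iterations)
{
    std::vector<float> hA(N, 1.0f);
    float *dA, *dScale, *dSum;
    cudaMalloc(&dA, N * sizeof(float));
    cudaMalloc(&dScale, sizeof(float));
    cudaMalloc(&dSum, sizeof(float));
    cudaMemcpy(dA, hA.data(), N * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    for (int it = 0; it < iterations; ++it) {
        // Same (default) stream: each kernel starts only after the
        // previous one has finished.
        setup_phase<<<1, 1>>>(dScale);
        work_phase<<<blocks, threads>>>(dA, N, dScale);
        reduce_phase<<<1, 1>>>(dA, N, dSum);
    }

    float sum = 0.0f;
    // This blocking copy also waits for the kernels above to complete.
    cudaMemcpy(&sum, dSum, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dScale);
    cudaFree(dSum);
    return sum;
}
```

Calling `run_phases(1000, 3)` from `main` doubles each of the 1000 ones three times, so the returned sum should be 8000. Error checking is omitted to keep the sketch short.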

Within a block, you can synchronize threads with __syncthreads(). For example:

__global__ void F(float *A, int N)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;

    if (threadIdx.x == 0) // thread 0 of each block does this:
    {
         // Whatever
    }
    __syncthreads();

    if (idx < N) // prevent buffer overruns
    {
        A[idx] = A[idx] * A[idx];  // "real work"
    }

    __syncthreads();

    if (threadIdx.x == 0) // thread 0 of each block does this:
    {
         // Whatever
    }
}
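Since the question mentions that the threads run in a loop, the same pattern can be sketched with the loop inside the kernel. This is an illustration under stated assumptions: the names `block_phases` and `run_block_phases`, the shared variable `scale`, and the placeholder work are hypothetical. The point to note is that every `__syncthreads()` sits outside the `if` branches, because all threads of a block must reach the barrier.

```cuda
#include <vector>
#include <cuda_runtime.h>

__global__ void block_phases(float *A, int N, int iterations, float *block_sums)
{
    __shared__ float scale;              // written by thread 0, read by the whole block
    int idx = threadIdx.x + blockIdx.x * blockDim.x;

    for (int it = 0; it < iterations; ++it) {
        if (threadIdx.x == 0)            // one thread per block
            scale = 2.0f;
        __syncthreads();                 // everyone waits until scale is written

        if (idx < N)                     // all threads
            A[idx] *= scale;
        __syncthreads();                 // everyone waits until the block's slice is updated

        if (threadIdx.x == 0) {          // one thread per block again
            float s = 0.0f;
            int begin = blockIdx.x * blockDim.x;
            for (int i = begin; i < begin + blockDim.x && i < N; ++i)
                s += A[i];
            block_sums[blockIdx.x] = s;
        }
        // No extra barrier is needed here: the other threads wait at the
        // first barrier of the next iteration until thread 0 arrives.
    }
}

// Hypothetical host helper: returns the total of the per-block sums.
float run_block_phases(int N, int iterations)
{
    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    std::vector<float> hA(N, 1.0f), hSums(blocks, 0.0f);
    float *dA, *dSums;
    cudaMalloc(&dA, N * sizeof(float));
    cudaMalloc(&dSums, blocks * sizeof(float));
    cudaMemcpy(dA, hA.data(), N * sizeof(float), cudaMemcpyHostToDevice);

    block_phases<<<blocks, threads>>>(dA, N, iterations, dSums);

    cudaMemcpy(hSums.data(), dSums, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dSums);

    float total = 0.0f;
    for (float s : hSums)
        total += s;
    return total;
}
```

Each block only ever touches its own slice of `A`, which is why block-level synchronization is enough here. If the single-thread phase had to see data written by other blocks, this approach would no longer be sufficient.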

Answer 2

You need to use the thread ID to control what gets executed, for example:

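if (thread_ID == 0)
{
  // do single thread stuff
}
__syncthreads(); // all threads in the block wait for the single-thread part

// do common stuff on all threads

__syncthreads(); // wait until every thread in the block has finished the common part

if (thread_ID == 0)
{
  // do single thread stuff
}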

Answer 3

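If your program contains multiple blocks, you need a custom synchronization mechanism across blocks. If your kernel launches only a single block, then __syncthreads() will work.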
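One way to get a barrier across blocks without writing your own is CUDA's cooperative groups, which is a swapped-in technique here rather than what the answer describes. The sketch below assumes a device that supports cooperative launch (compute capability 6.0 or later), a toolkit recent enough to provide `cooperative_groups::this_grid()`, and compilation for a matching architecture (older toolkits may additionally need `-rdc=true`). The names `grid_phases` and `run_cooperative` and the placeholder work are hypothetical.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

__global__ void grid_phases(float *A, int N, int iterations, float *scale, float *sum)
{
    cg::grid_group grid = cg::this_grid();
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    int stride = gridDim.x * blockDim.x;

    for (int it = 0; it < iterations; ++it) {
        if (idx == 0)                    // one thread in the whole grid
            *scale = 2.0f;
        grid.sync();                     // barrier across all blocks

        for (int i = idx; i < N; i += stride)   // all threads, grid-stride loop
            A[i] *= *scale;
        grid.sync();

        if (idx == 0) {                  // one thread in the whole grid
            float s = 0.0f;
            for (int i = 0; i < N; ++i)
                s += A[i];
            *sum = s;
        }
        grid.sync();                     // keep other blocks from modifying A early
    }
}

// Hypothetical host helper: returns the final sum, or -1 if cooperative
// launch is unsupported or the launch fails.
float run_cooperative(int N, int iterations)
{
    int device = 0, supported = 0, numSms = 0, blocksPerSm = 0;
    cudaGetDevice(&device);
    cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, device);
    if (!supported)
        return -1.0f;

    // All blocks must be resident at once, so size the grid from occupancy.
    int threads = 256;
    cudaDeviceGetAttribute(&numSms, cudaDevAttrMultiProcessorCount, device);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, grid_phases, threads, 0);
    int blocks = blocksPerSm * numSms;
    if (blocks < 1)
        return -1.0f;

    std::vector<float> hA(N, 1.0f);
    float *dA, *dScale, *dSum;
    cudaMalloc(&dA, N * sizeof(float));
    cudaMalloc(&dScale, sizeof(float));
    cudaMalloc(&dSum, sizeof(float));
    cudaMemcpy(dA, hA.data(), N * sizeof(float), cudaMemcpyHostToDevice);

    void *args[] = { &dA, &N, &iterations, &dScale, &dSum };
    cudaError_t err = cudaLaunchCooperativeKernel((void *)grid_phases, dim3(blocks),
                                                  dim3(threads), args, 0, 0);
    float result = -1.0f;
    if (err == cudaSuccess)
        err = cudaMemcpy(&result, dSum, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dScale);
    cudaFree(dSum);
    return err == cudaSuccess ? result : -1.0f;
}
```

The grid is sized from the occupancy query because a cooperative launch fails if the blocks cannot all be co-resident on the device. The grid-stride loop then covers any `N` regardless of how many blocks were launched.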

How do I make different threads execute different parts of the code in CUDA?
