Question

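I have two kernels (A and B) that can execute concurrently. I need kernel A to finish as soon as possible (so that I can do an MPI exchange of its results). So I could execute them in a single stream: A, then B.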

However, kernel A has only a few thread blocks, so if I run A and B sequentially, the GPU is not fully utilized while A is running.

Is it possible to execute A and B concurrently, with A having the higher priority?

I.e., I want thread blocks from kernel B to start executing only when there are no unlaunched blocks from kernel A left.

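As far as I understand, if I launch kernel A in one stream and then, on the next line of host code, launch kernel B in another stream, I have no guarantee that thread blocks from B won't actually execute first?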

Answer 1

NVIDIA now provides a way to prioritize CUDA kernels. This is a fairly new feature, so you will need to upgrade to CUDA 5.5 to get it.

For your case, you would launch kernel A in a high-priority CUDA stream and then launch kernel B in a low-priority CUDA stream. The function you probably want is cudaStreamCreateWithPriority(..., priority).

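To use this feature, you need a GPU with compute capability 3.5 or higher. To check whether priorities are supported on your GPU, look at cudaDeviceProp::streamPrioritiesSupported.
cudaDeviceGetStreamPriorityRange should tell you how many priority levels are available on your GPU. Its syntax is a bit odd: it returns the range through two output parameters, leastPriority and greatestPriority, and because lower numbers mean higher priority, greatestPriority is numerically the smaller of the two. It is worth taking a look at the CUDA manual to see how it works.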
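
As a minimal sketch of the check described above, the helper below reads cudaDeviceProp::streamPrioritiesSupported and then asks for the priority range. The struct PriorityInfo and the function queryStreamPriorities are hypothetical names introduced only for illustration; the CUDA runtime calls are the ones named in the answer, plus cudaGetDeviceProperties and cudaSetDevice.

```cuda
#include <cuda_runtime.h>

// Hypothetical container for the query result (not part of the CUDA API).
struct PriorityInfo {
    bool supported;  // true if cudaDeviceProp::streamPrioritiesSupported is nonzero
    int  least;      // numerically largest value  = lowest priority
    int  greatest;   // numerically smallest value = highest priority
};

// Fills `info` for the given device. Returns cudaSuccess on success,
// otherwise the first CUDA error encountered.
cudaError_t queryStreamPriorities(int device, PriorityInfo *info) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;
    info->supported = (prop.streamPrioritiesSupported != 0);

    // The range query applies to the current device, so select it first.
    err = cudaSetDevice(device);
    if (err != cudaSuccess) return err;

    // Both values come back through output pointers.
    return cudaDeviceGetStreamPriorityRange(&info->least, &info->greatest);
}
```

The number of distinct priority levels is then least - greatest + 1. On a device without priority support, both values are expected to come back as 0, so the result collapses to a single level.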

Detailed documentation on setting priorities, from the CUDA Runtime API manual:

cudaError_t cudaStreamCreateWithPriority(cudaStream_t *pStream, 
                                         unsigned int flags, int priority)
Create an asynchronous stream with the specified priority.

Parameters
pStream  = Pointer to new stream identifier 
flags    = Flags for stream creation. See cudaStreamCreateWithFlags for a list of 
           valid flags that can be passed 
priority = Priority of the stream. Lower numbers represent higher priorities. See  
           cudaDeviceGetStreamPriorityRange for more information about the 
           meaningful stream priorities that can be passed.
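
Putting the pieces together, a rough sketch of the suggested setup might look like the following. The kernels kernelA and kernelB are placeholders standing in for the question's A and B, and launchWithPriorities is a hypothetical helper; the block size of 256 and the use of cudaStreamNonBlocking are assumptions, not requirements. Note that, as far as the runtime documentation describes it, stream priorities are hints to the scheduler: they bias which pending blocks are scheduled first but do not preempt blocks that are already running.

```cuda
#include <cuda_runtime.h>

// Placeholder for the small, urgent kernel A (few thread blocks).
__global__ void kernelA(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1.0f;
}

// Placeholder for the larger, less urgent kernel B.
__global__ void kernelB(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f;
}

// Launches A in the highest-priority stream and B in the lowest-priority
// stream, waits for A first, then for B. dA and dB are device pointers.
cudaError_t launchWithPriorities(float *dA, int nA, float *dB, int nB) {
    int least = 0, greatest = 0;
    cudaError_t err = cudaDeviceGetStreamPriorityRange(&least, &greatest);
    if (err != cudaSuccess) return err;

    // Lower numbers represent higher priorities, so `greatest` goes to A.
    cudaStream_t high, low;
    err = cudaStreamCreateWithPriority(&high, cudaStreamNonBlocking, greatest);
    if (err != cudaSuccess) return err;
    err = cudaStreamCreateWithPriority(&low, cudaStreamNonBlocking, least);
    if (err != cudaSuccess) { cudaStreamDestroy(high); return err; }

    const int threads = 256;  // assumed block size
    kernelA<<<(nA + threads - 1) / threads, threads, 0, high>>>(dA, nA);
    kernelB<<<(nB + threads - 1) / threads, threads, 0, low>>>(dB, nB);
    err = cudaGetLastError();

    // Wait only for A; its results are ready once this returns, so the
    // MPI exchange from the question would go right after this line.
    if (err == cudaSuccess) err = cudaStreamSynchronize(high);

    // B may still be running at this point; wait for it before cleanup.
    if (err == cudaSuccess) err = cudaStreamSynchronize(low);

    cudaStreamDestroy(high);
    cudaStreamDestroy(low);
    return err;
}
```

A caller would allocate dA and dB with cudaMalloc and invoke launchWithPriorities(dA, nA, dB, nB). In a real application the two streams would normally be created once and reused rather than created per launch.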
