Executing streams and host functions concurrently

Time: 2018-11-15 10:06:24

Tags: cuda

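I wrote a program that uses two streams. Both streams operate on some data and then write the results back to host memory. This is the general structure of what I do: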

loop {
AsyncCpy(....,HostToDevice,Stream1);
AsyncCpy(....,HostToDevice,Stream2);

Kernel<<<...,Stream1>>>
Kernel<<<...,Stream2>>>

/* Write the results on the host memory */
AsyncCpy(....,DeviceToHost,Stream1);  
AsyncCpy(....,DeviceToHost,Stream2);  
}

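Once I know that StreamX has finished copying its results back to host memory, I want to do some work on the CPU. At the same time, I don't want to stall the loop that issues the async operations (memcpys and kernel launches).

Suppose I insert host functions, say host_ftn1(..) and host_ftn2(..):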

loop {
AsyncCpy(....,HostToDevice,Stream1);
AsyncCpy(....,HostToDevice,Stream2);

Kernel<<<...,Stream1>>>
Kernel<<<...,Stream2>>>

/* Write the results on the host memory to be processed by host_ftn1(..) */
AsyncCpy(....,DeviceToHost,Stream1);
/* Write the results on the host memory to be processed by host_ftn2(..) */
AsyncCpy(....,DeviceToHost,Stream2);

if(Stream1 results are copied to host)
       host_ftn1(..);
if(Stream2 results are copied to host)
       host_ftn2(..);
}

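This stalls the loop until the host functions (host_ftn1 and host_ftn2) have finished. I don't want the GPU work, i.e. AsyncCpy(..) and Kernel<<<..,StreamX>>>, to stop being issued while the CPU is busy running host_ftn1(..) and host_ftn2(..).

Is there any solution or approach for this problem?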

1 answer:

Answer 0: (score: 0)

As huseyin tugrul buyukisik suggested, stream callbacks work in this case. I have tested this with two streams.

The final design is as follows:

loop {
AsyncCpy(....,HostToDevice,Stream1);
AsyncCpy(....,HostToDevice,Stream2);

Kernel<<<...,Stream1>>>
Kernel<<<...,Stream2>>>

/* Write the results on the host memory to be processed by host_ftn1(..) */
AsyncCpy(....,DeviceToHost,Stream1);
/* Write the results on the host memory to be processed by host_ftn2(..) */
AsyncCpy(....,DeviceToHost,Stream2);

callback1(..);    // Enqueued as a stream callback on Stream1; runs on the host once Stream1's preceding work completes
callback2(..);    // Enqueued as a stream callback on Stream2; runs on the host once Stream2's preceding work completes
}

See Stream Callbacks.
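For readers who want something concrete, here is a minimal sketch of how that design might look with `cudaStreamAddCallback`. It is not the code from the answer: the `scale` kernel, the `HostJob` struct, the `run_two_streams` helper, and the single shared `host_ftn` (standing in for host_ftn1/host_ftn2) are all made up for illustration, and the buffer sizes are arbitrary. It assumes pinned host memory (`cudaMallocHost`) so that the async copies can actually overlap with other work.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call) do { if ((call) != cudaSuccess) return 1; } while (0)

// Trivial stand-in for the real kernel: multiplies every element by factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical per-stream context handed to the callback.
struct HostJob {
    const float *results;  // pinned host buffer filled by the DeviceToHost copy
    int n;
    double sum;            // written by the callback
};

// Stand-in for host_ftn1 / host_ftn2. The runtime invokes it on a host thread
// after all work enqueued earlier in the same stream has completed.
// A stream callback must not call any CUDA API function.
void CUDART_CB host_ftn(cudaStream_t, cudaError_t status, void *user_data) {
    HostJob *job = static_cast<HostJob *>(user_data);
    if (status != cudaSuccess) return;
    double sum = 0.0;
    for (int i = 0; i < job->n; ++i) sum += job->results[i];
    job->sum = sum;
}

// Returns 0 on success and stores one checksum per stream in sums[0..1].
int run_two_streams(int n, int iterations, double sums[2]) {
    const int kStreams = 2;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    cudaStream_t stream[kStreams];
    float *h_in[kStreams], *h_out[kStreams], *d_buf[kStreams];
    HostJob job[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaStreamCreate(&stream[s]));
        CHECK(cudaMallocHost((void **)&h_in[s], bytes));
        CHECK(cudaMallocHost((void **)&h_out[s], bytes));
        CHECK(cudaMalloc((void **)&d_buf[s], bytes));
        for (int i = 0; i < n; ++i) h_in[s][i] = 1.0f;
        job[s].results = h_out[s];
        job[s].n = n;
        job[s].sum = 0.0;
    }

    for (int it = 0; it < iterations; ++it) {
        for (int s = 0; s < kStreams; ++s) {
            CHECK(cudaMemcpyAsync(d_buf[s], h_in[s], bytes,
                                  cudaMemcpyHostToDevice, stream[s]));
            scale<<<(n + 255) / 256, 256, 0, stream[s]>>>(d_buf[s], n, 2.0f);
            CHECK(cudaMemcpyAsync(h_out[s], d_buf[s], bytes,
                                  cudaMemcpyDeviceToHost, stream[s]));
            // Enqueue the host work; this call returns immediately, so the
            // loop keeps issuing work for the other stream and iteration.
            CHECK(cudaStreamAddCallback(stream[s], host_ftn, &job[s], 0));
        }
    }

    CHECK(cudaDeviceSynchronize());  // waits for copies, kernels and callbacks

    for (int s = 0; s < kStreams; ++s) {
        sums[s] = job[s].sum;
        cudaFree(d_buf[s]);
        cudaFreeHost(h_in[s]);
        cudaFreeHost(h_out[s]);
        cudaStreamDestroy(stream[s]);
    }
    return 0;
}

int main() {
    double sums[2];
    if (run_two_streams(1 << 20, 4, sums) != 0) {
        std::printf("CUDA error\n");
        return 1;
    }
    std::printf("stream 0 sum = %.1f, stream 1 sum = %.1f\n", sums[0], sums[1]);
    return 0;
}
```

Two caveats apply to this approach. Work enqueued in a stream after a callback does not start until that callback returns, so a long-running host function delays later operations in its own stream, though not in the other one. On CUDA 10 and later, `cudaLaunchHostFunc` offers the same enqueue-a-host-function behaviour with a simpler callback signature.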