Data transfer rate between host and device

Time: 2015-07-18 20:16:02

Tags: cuda gpgpu

I am performing the following transfers:

Host to device: 62504 KB
Device to host: 187512 KB (3 times the amount transferred from host to device)

I am running on a Kepler K20c, and nvprof reports the following:

======== Profiling result:  
 Time(%)      Time   Calls       Avg       Min       Max  Name  
   68.86   73.59ms       3   24.53ms   24.50ms   24.57ms  [CUDA memcpy DtoH]  
   21.97   23.47ms       3    7.82ms     992ns   23.47ms  [CUDA memcpy HtoD]  
    9.17    9.80ms       3    3.27ms    2.93ms    3.84ms  void   cuda_multi_gemm_unif_kernel_small_r

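Are these times for transferring the data to and from the device reasonable? The transfer rates implied by the figures above are:

Host to device transfer: 2.5398 GB/s
Device to host transfer: 2.4300 GB/s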
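For reference, the two rates can be reproduced from the nvprof totals. The question does not state its units, so this worked calculation assumes 1 KB = 1024 bytes and 1 GB = 2^30 bytes, which is what the quoted figures appear to use. In decimal units (1 GB = 10^9 bytes) the same transfers come out at roughly 2.73 and 2.61 GB/s.

```latex
\begin{align*}
\text{HtoD:}\quad
  \frac{62504 \times 1024\ \text{B}}{23.47 \times 10^{-3}\ \text{s}}
  &\approx 2.727 \times 10^{9}\ \text{B/s}
  \approx 2.5398 \times 2^{30}\ \text{B/s} \\
\text{DtoH:}\quad
  \frac{187512 \times 1024\ \text{B}}{73.59 \times 10^{-3}\ \text{s}}
  &\approx 2.609 \times 10^{9}\ \text{B/s}
  \approx 2.4300 \times 2^{30}\ \text{B/s}
\end{align*}
```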

The bandwidth test at

https://github.com/parallel-forall/code-samples/blob/master/series/cuda-cpp/optimize-data-transfers/bandwidthtest.cu

produces the following results:

Device: Tesla K20c  
Transfer size (MB): 16

Pageable transfers
  Host to Device bandwidth (GB/s): 2.822440
  Device to Host bandwidth (GB/s): 1.961510

Pinned transfers
  Host to Device bandwidth (GB/s): 5.867341
  Device to Host bandwidth (GB/s): 6.629257

======== Profiling result:
 Time(%)      Time   Calls       Avg       Min       Max  Name
  55.09   10.34ms       2    5.17ms    2.50ms    7.84ms  [CUDA memcpy DtoH]
  44.91    8.43ms       2    4.21ms    2.80ms    5.63ms  [CUDA memcpy HtoD]
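The pageable and pinned rows above differ only in how the host buffer is allocated. The following is a minimal sketch of that comparison for a single host-to-device copy, not the code of the linked sample. The names `CHECK` and `timeHtoD` are hypothetical helpers introduced here, the buffer size assumes 1 KB = 1024 bytes, and the reported rate uses 1 GB = 10^9 bytes.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                              \
  do {                                                           \
    cudaError_t err_ = (call);                                   \
    if (err_ != cudaSuccess) {                                   \
      fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
              cudaGetErrorString(err_));                         \
      exit(EXIT_FAILURE);                                        \
    }                                                            \
  } while (0)

// Hypothetical helper: time one host-to-device copy with CUDA events
// and return the rate in GB/s (1 GB = 1e9 bytes).
static float timeHtoD(void *d_dst, const void *h_src, size_t bytes) {
  cudaEvent_t start, stop;
  CHECK(cudaEventCreate(&start));
  CHECK(cudaEventCreate(&stop));
  CHECK(cudaEventRecord(start, 0));
  CHECK(cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice));
  CHECK(cudaEventRecord(stop, 0));
  CHECK(cudaEventSynchronize(stop));
  float ms = 0.0f;
  CHECK(cudaEventElapsedTime(&ms, start, stop));
  CHECK(cudaEventDestroy(start));
  CHECK(cudaEventDestroy(stop));
  return bytes * 1e-6f / ms;  // bytes per millisecond, scaled to GB/s
}

int main() {
  // Host-to-device size from the question, assuming 1 KB = 1024 bytes.
  const size_t bytes = 62504u * 1024u;

  void *d_buf = nullptr;
  CHECK(cudaMalloc(&d_buf, bytes));

  // Pageable host memory: ordinary malloc.
  void *h_pageable = malloc(bytes);
  if (h_pageable == nullptr) {
    fprintf(stderr, "host malloc failed\n");
    return EXIT_FAILURE;
  }
  // Pinned (page-locked) host memory: allocated through the CUDA runtime.
  void *h_pinned = nullptr;
  CHECK(cudaMallocHost(&h_pinned, bytes));

  // Touch both buffers so the pages are actually backed by memory.
  memset(h_pageable, 1, bytes);
  memset(h_pinned, 1, bytes);

  // Untimed warm-up copy so one-time setup cost is not measured.
  CHECK(cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice));

  printf("Pageable HtoD (GB/s): %f\n", timeHtoD(d_buf, h_pageable, bytes));
  printf("Pinned   HtoD (GB/s): %f\n", timeHtoD(d_buf, h_pinned, bytes));

  CHECK(cudaFreeHost(h_pinned));
  free(h_pageable);
  CHECK(cudaFree(d_buf));
  return 0;
}
```

This should build with something like `nvcc -o htod_compare htod_compare.cu` (the file name is arbitrary). The measured rates depend on the machine's PCIe link and host memory, so no expected output is given here.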

0 answers:

No answers