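I have the following transfers:
Host to device: 62504 KB
Device to host: 187512 KB (3 times the amount transferred host to device)
I am running on a Kepler K20c, and nvprof reports the following: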
======== Profiling result:
 Time(%)      Time   Calls       Avg       Min       Max  Name
   68.86   73.59ms       3   24.53ms   24.50ms   24.57ms  [CUDA memcpy DtoH]
   21.97   23.47ms       3    7.82ms     992ns   23.47ms  [CUDA memcpy HtoD]
    9.17    9.80ms       3    3.27ms    2.93ms    3.84ms  void cuda_multi_gemm_unif_kernel_small_r
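Is the time taken to move the data to and from the device reasonable? The transfer rates implied by the figures above are:
Host-to-device transfer: 2.5398 GB/s
Device-to-host transfer: 2.4300 GB/s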
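
For anyone checking the arithmetic, here is a minimal sketch of how those two rates appear to be derived. It assumes "KB" means 1024 bytes, that the rate is expressed in binary gigabytes (2^30 bytes) per second, and that the total in the nvprof Time column is the elapsed time for each direction. The function name bandwidth_gib_per_s is made up for illustration.

```python
# Assumption: 1 KB = 1024 bytes and 1 "GB" = 2**30 bytes (binary units).
KIB = 1024
GIB = 1024 ** 3


def bandwidth_gib_per_s(kib_transferred, time_ms):
    """Return the transfer rate in GiB/s for a size in KiB and a time in ms."""
    bytes_transferred = kib_transferred * KIB
    seconds = time_ms / 1000.0
    return bytes_transferred / GIB / seconds


# Sizes from the post, times from the nvprof "Time" column.
h2d = bandwidth_gib_per_s(62504, 23.47)   # host to device
d2h = bandwidth_gib_per_s(187512, 73.59)  # device to host

print(f"Host to device: {h2d:.4f} GiB/s")
print(f"Device to host: {d2h:.4f} GiB/s")
```

Under those assumptions the results agree with the quoted rates to about three decimal places. In decimal units (10^9 bytes per second) the same transfers work out to roughly 2.73 and 2.61 GB/s, so the unit convention matters when comparing against other tools.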
A host/device bandwidth test produces the following results:
Device: Tesla K20c
Transfer size (MB): 16
Pageable transfers
  Host to Device bandwidth (GB/s): 2.822440
  Device to Host bandwidth (GB/s): 1.961510
Pinned transfers
  Host to Device bandwidth (GB/s): 5.867341
  Device to Host bandwidth (GB/s): 6.629257
======== Profiling result:
 Time(%)      Time   Calls       Avg       Min       Max  Name
   55.09   10.34ms       2    5.17ms    2.50ms    7.84ms  [CUDA memcpy DtoH]
   44.91    8.43ms       2    4.21ms    2.80ms    5.63ms  [CUDA memcpy HtoD]