Question

I have been evaluating the performance of the clFFT library on an AMD Radeon R7 260X. The CPU is an Intel Xeon and the OS is CentOS.

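I have been studying the performance of a 2D 16x16 clFFT transform with different batch sizes (the number of FFTs computed in parallel). In particular, I want to understand why event profiling and gettimeofday give such different results.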

The results for the 2D 16x16 clFFT with different batch sizes are as follows.

Using event profiling:

batch   kernel exec time (us)
    1          320.7
   16          461.1
  256          458.3
  512          537.7
 1024         1016.8

Here, batch is the number of parallel FFTs and kernel exec time is the kernel execution time in microseconds.

Using gettimeofday:

batch   HtoD (us)   kernel exec time (us)   DtoH (us)
    1     29653            10850              39227
   16     28313            10786              32474
  256     26995            11167              39672
  512     26145            10773              32273
 1024     26856            11948              31060

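Here, batch is the number of parallel FFTs, HtoD is the host-to-device transfer time, kernel exec time is the kernel execution time, and DtoH is the device-to-host transfer time, all in microseconds.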

Here are my questions:

1a) Why is the kernel time obtained from event profiling completely different from the one measured with gettimeofday?

1b) Which of the two results is correct?

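2) The amount of data transferred grows with the batch size. Yet according to gettimeofday, the HtoD and DtoH transfer times stay almost constant instead of growing as the batch size goes from 1 to 1024. Why is that?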

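clFinish( cl_queue);   // drain any earlier work before timing

// Copy data from host to device.
// CL_TRUE makes the write blocking: it returns only after the copy has
// finished, so the clFinish/clWaitForEvents calls below are redundant.
gettimeofday(&t_s_gpu1, NULL);
clEnqueueWriteBuffer( cl_queue, d_data, CL_TRUE, 0, width*height*batchSize*sizeof(cl_compl_flt), h_src, 0, NULL, &event1);
clFinish( cl_queue);
clWaitForEvents(1, &event1);
gettimeofday(&t_e_gpu1, NULL);

// The plan is baked first and the batch size is changed afterwards.
// In the clFFT sources, changing a plan parameter marks the plan as not
// baked, and the enqueue call bakes an unbaked plan, so kernel generation
// and compilation may fall inside the timed region below.
checkCL( clAmdFftBakePlan( fftPlan, 1, &cl_queue, NULL, NULL) );

clAmdFftSetPlanBatchSize( fftPlan, batchSize );
clFinish( cl_queue);

// Host wall-clock time around the transform: includes the enqueue call
// itself plus the wait for completion, not only device execution.
gettimeofday(&t_s_gpu, NULL);
checkCL( clAmdFftEnqueueTransform( fftPlan, CLFFT_FORWARD, 1, &cl_queue, 0, NULL, &event, &d_data, NULL, NULL) );
clFinish( cl_queue);
clWaitForEvents(1, &event);
gettimeofday(&t_e_gpu, NULL);

// Device timestamps in nanoseconds; only valid if cl_queue was created
// with CL_QUEUE_PROFILING_ENABLE.
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(time_start), &time_start, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(time_end), &time_end, NULL);

totaltime=totaltime+time_end - time_start;   // accumulated in nanoseconds
clFinish( cl_queue);

// Copy result from device to host (blocking read, like the write above)

gettimeofday(&t_s_gpu2, NULL);
checkCL( clEnqueueReadBuffer(cl_queue, d_data, CL_TRUE, 0, width*height*batchSize*sizeof(cl_compl_flt), h_res, 0, NULL, &event2));
clFinish( cl_queue);
clWaitForEvents(1, &event2);
gettimeofday(&t_e_gpu2, NULL);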

Thanks in advance for your comments and answers.

clFFT performance evaluation
