Question

I want to measure the performance of different devices, namely the CPU and the GPU. This is my kernel code:

    __kernel void dataParallel(__global int* A)
    {
        sleep(10);
        A[0] = 2;
        A[1] = 3;
        A[2] = 5;
        int pnp;    // pnp = probable next prime
        int pprime; // previous prime
        int i, j;
        for (i = 3; i < 10; i++)
        {
            j = 0;
            pprime = A[i-1];
            pnp = pprime + 2;
            while ((j < i) && A[j] <= sqrt((float)pnp))
            {
                if (pnp % A[j] == 0)
                {
                    pnp += 2;
                    j = 0;
                }
                j++;
            }
            A[i] = pnp;
        }
    }

However, the sleep() function does not work. I get the following errors in the build log:

    <kernel>:4:2: warning: implicit declaration of function 'sleep' is invalid in C99
        sleep(10);
    builtins: link error: Linking globals named '__gpu_suld_1d_i8_trap': symbol multiply defined!

Is there another way to achieve this? And is there a way to record the time it takes to execute this piece of code?

P.S. I have #include <unistd.h> in my host code.
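Since the goal is to compare the CPU and the GPU, it may help to have a plain CPU baseline that also shows what the kernel should leave in A. The following is a minimal sketch, assuming the kernel's loop is ported unchanged to host C++ with the sleep() call dropped. The function name primesLikeKernel is hypothetical and does not appear in the original code.

```cpp
#include <array>
#include <cmath>

// Hypothetical host-side port of the dataParallel kernel loop (without sleep()).
// It produces the values the kernel writes into A[0..9], so it can be used to
// check the device output or as a CPU baseline to time against.
std::array<int, 10> primesLikeKernel() {
    std::array<int, 10> A{};
    A[0] = 2;
    A[1] = 3;
    A[2] = 5;
    for (int i = 3; i < 10; i++) {
        int j = 0;
        int pnp = A[i - 1] + 2;  // probable next prime: next odd number
        // Trial-divide by the primes found so far, up to sqrt(pnp)
        while (j < i && A[j] <= std::sqrt(static_cast<float>(pnp))) {
            if (pnp % A[j] == 0) {
                pnp += 2;  // divisible: move on to the next odd candidate
                j = 0;     // restart the scan; the j++ below makes it 1, which
                           // skips 2, harmless because every candidate is odd
            }
            j++;
        }
        A[i] = pnp;
    }
    return A;
}
```

The result should be the first ten primes, 2 through 29, which is what the kernel writes as well.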

Answer 1

You don't need sleep() in the kernel to measure execution time.

There are two ways to measure the time:

1. Use OpenCL's built-in profiling; see here: cl api
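For option 1, the profiling counters are read with clGetEventProfilingInfo() on the event returned by clEnqueueNDRangeKernel(). The command queue must be created with CL_QUEUE_PROFILING_ENABLE, an event pointer must be passed as the last argument of clEnqueueNDRangeKernel() instead of NULL, and the event must have completed (for example after clFinish() or clWaitForEvents()) before you query it. CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END are 64-bit device timestamps in nanoseconds. The sketch below only shows the conversion to milliseconds. The helper name profilingElapsedMs is hypothetical, and the OpenCL calls appear only in comments so that the block compiles without an OpenCL SDK.

```cpp
#include <cstdint>

// Hypothetical helper. With a profiling-enabled queue, the two timestamps would be
// read roughly like this (cl_ulong values, in nanoseconds):
//   clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
//   clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END,   sizeof(cl_ulong), &end,   NULL);
// Precondition: endNs >= startNs (otherwise the unsigned subtraction wraps).
double profilingElapsedMs(std::uint64_t startNs, std::uint64_t endNs) {
    return static_cast<double>(endNs - startNs) / 1.0e6;  // ns -> ms
}
```

Unlike host-side timestamps, this measures only the kernel's execution on the device. It excludes enqueue and scheduling overhead on the host.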

2. Take timestamps in the host code before and after the kernel runs and compare them. For example:

    double start = getTimeInMS();
    // The kernel starts here
    clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &tasksize, &local_size_in, 0, NULL, NULL);
    // Wait for the kernel to finish executing
    clFinish(command_queue);
    cout << "kernel execution time " << (getTimeInMS() - start) << " ms" << endl;

where getTimeInMS() is a function that returns the current time in milliseconds as a double (Windows-specific; if you aren't on Windows, substitute another implementation):

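    static inline double getTimeInMS() {
        SYSTEMTIME st;
        GetLocalTime(&st);
        // Include hours and minutes so the value does not wrap around every minute
        // (it still wraps at midnight)
        return ((st.wHour * 60.0 + st.wMinute) * 60.0 + st.wSecond) * 1000.0 + st.wMilliseconds;
    }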

For SYSTEMTIME and GetLocalTime() you'll also need:

    #include <windows.h>

On a Mac, this works (gettimeofday() is POSIX, so it should also work on Linux):

    #include <sys/time.h>

    // Same name as the Windows version, so the timing code above works unchanged
    static inline double getTimeInMS() {
        struct timeval starttime;
        gettimeofday(&starttime, NULL);
        return (double)starttime.tv_sec * 1000.0 + (double)starttime.tv_usec / 1000.0;
    }
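If C++11 is available, one portable option is to avoid the platform-specific versions and use std::chrono::steady_clock. The sketch below keeps the getTimeInMS() name so it could replace either helper in the timing snippet. steady_clock is monotonic, so differences between two calls are unaffected if the wall clock is adjusted. Its epoch is unspecified, so only the difference between two readings is meaningful.

```cpp
#include <chrono>

// Portable alternative to the Windows/POSIX getTimeInMS() helpers above:
// milliseconds from a monotonic clock, returned as a double.
static inline double getTimeInMS() {
    using namespace std::chrono;
    return duration<double, std::milli>(steady_clock::now().time_since_epoch()).count();
}
```

As with the other helpers, call clFinish() before taking the second timestamp. Otherwise you only measure the time needed to enqueue the kernel.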

Implementing sleep() in OpenCL C
