Question

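I ran my CUDA code on a Linux server (RHEL 5.3 / Tesla C1060 / CUDA 2.3), but it is much slower than I expected.

However, the timings reported by the CUDA profiler are fast enough.

So it seems that a lot of time is spent loading the program, and that this time is not profiled.

Am I right?

I used the following code to test whether I am right: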

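#include <cuda.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <time.h>

#define B 1
#define T 1

// Empty kernel: any time measured is initialisation and launch overhead.
__global__ void test()
{
}

int main()
{
    clock_t start = clock();
    cudaSetDevice(0);
    test<<<B, T>>>();
    clock_t end = clock();
    // clock() returns CPU time in ticks, so convert to milliseconds before printing.
    printf("time:%ldms\n", (long)((end - start) * 1000 / CLOCKS_PER_SEC));
}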

I timed it with the "time" command as well as with the clock() function in the code:

nvcc -o test test.cu
time ./test

The result is:

time:4s


real 0m3.311s
user 0m0.005s
sys  0m2.837s

On my own PC (Win 8 / CUDA 5.5 / GT 720M), the same code runs faster.

Answer 1

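The Linux CUDA driver of that era (probably the 185 series, IIRC) had a "feature" whereby the driver would unload several internal driver components whenever no client was connected to it. For display GPUs, where X11 is active at all times, this was rarely a problem. For compute GPUs, however, it led to a large latency on the first application run while the driver reinitialised itself, and to the loss of device settings such as compute exclusive mode, fan speed, etc.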
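To check that the delay really sits in driver and context initialisation rather than in the kernel, the two phases can be timed separately with a wall clock. The following is a minimal sketch, not the asker's code: it assumes device 0, uses `cudaFree(0)` as a common idiom to force context creation, and calls `cudaDeviceSynchronize()`, which needs CUDA 4.0 or later (on CUDA 2.3 the equivalent was `cudaThreadSynchronize()`). On older glibc versions `clock_gettime` may also require linking with `-lrt`.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <time.h>

// Wall-clock difference between two timespec values, in milliseconds.
static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

// Empty kernel, as in the question.
__global__ void test() {}

int main()
{
    struct timespec t0, t1, t2;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    // Assumption: cudaFree(0) is used only to trigger lazy context creation.
    cudaError_t err = cudaFree(0);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (err != cudaSuccess) {
        fprintf(stderr, "init failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    test<<<1, 1>>>();
    // Wait for the kernel so the second interval covers launch and execution.
    err = cudaDeviceSynchronize();
    clock_gettime(CLOCK_MONOTONIC, &t2);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("driver/context init:  %.1f ms\n", elapsed_ms(t0, t1));
    printf("kernel launch + sync: %.1f ms\n", elapsed_ms(t1, t2));
    return 0;
}
```

If the first number dominates, the time is going into initialisation, which the profiler does not report. Wall-clock time is used here because `clock()` counts CPU time of the process, which can differ from the elapsed time shown by `time`.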

The normal solution was to run the nvidia-smi utility in daemon mode: it acts as a client and stops the driver from deinitialising. Something like this:

nvidia-smi --loop-continuously --interval=60 --filename=/var/log/nvidia-smi.log &

Running this as root should solve the problem.
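Since the answer mentions that device settings such as compute exclusive mode are lost when the driver deinitialises, one way to check whether a setting has survived between runs is to query it from a small program. This is only a sketch under stated assumptions: it assumes device 0 and uses `cudaDeviceGetAttribute` with `cudaDevAttrComputeMode`, which exist in newer toolkits but not in CUDA 2.3. The helper `compute_mode_name` is a hypothetical name introduced for illustration.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical helper: map the runtime's compute-mode value to a readable name.
static const char *compute_mode_name(int mode)
{
    switch (mode) {
    case cudaComputeModeDefault:          return "default";
    case cudaComputeModeExclusive:        return "exclusive (thread)";
    case cudaComputeModeProhibited:       return "prohibited";
    case cudaComputeModeExclusiveProcess: return "exclusive (process)";
    default:                              return "unknown";
    }
}

int main()
{
    int mode = 0;
    // Assumption: device 0 is the compute GPU of interest.
    cudaError_t err = cudaDeviceGetAttribute(&mode, cudaDevAttrComputeMode, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("compute mode: %s\n", compute_mode_name(mode));
    return 0;
}
```

Running it once after setting the mode, and again after the GPU has been idle for a while, shows whether the setting was kept.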

Does loading a CUDA program take a long time on a Tesla C1060?
