Question

This is one of the standard examples you can find everywhere...

import time
import numpy

import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath
import pycuda.autoinit

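size = int(1e7)  # numpy.linspace needs an integer sample count

# CPU: build the input and compute sin() with NumPy
t0 = time.time()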
x = numpy.linspace(1, size, size).astype(numpy.float32)
y = numpy.sin(x)
t1 = time.time()

cpuTime = t1-t0
print(cpuTime)

# GPU: copy to the device, compute sin() with PyCUDA, copy the result back to the host
t0 = time.time()
x_gpu = gpuarray.to_gpu(x)
y_gpu = cumath.sin(x_gpu)
y = y_gpu.get()
t1 = time.time()

gpuTime = t1-t0
print(gpuTime)

The result: 200 ms on the CPU and 2.45 s on the GPU... more than 10 times slower.

I'm on Win 10 ... with PTVS 2015 ...

Competition

Best regards...

Stephen

Answer 1

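It looks like pycuda adds some extra overhead the first time you call cumath.sin() (about 400 ms on my system). I suspect this is because the CUDA code for the called function has to be compiled. More importantly, this overhead does not depend on the size of the array passed to the function. Subsequent calls to cumath.sin() are much faster because the CUDA code has already been compiled and is ready to use. On my system, the GPU code from the question runs in about 20 ms (on repeated runs), while the numpy code takes about 130 ms.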

I don't know anything about the inner workings of pycuda, so I'd be interested to hear what others think about this.
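A minimal sketch of the measurement the answer describes: time the first call separately, discard a warm-up call, then report the steady-state time over repeated runs. It uses only the Python standard library so it runs without a GPU. `benchmark` and `make_lazily_compiled_sin` are hypothetical helpers, and the `time.sleep` only stands in for the one-time compilation cost the answer suspects; it is not PyCUDA's actual mechanism.

```python
import math
import statistics
import time


def benchmark(fn, *args, warmup=1, repeat=5):
    """Time fn(*args); return (first_call_seconds, median_steady_state_seconds).

    The first call is timed on its own because it may include one-time
    setup costs (for PyCUDA, the answer suspects kernel compilation).
    Warm-up calls are then discarded and the remaining calls are timed.
    """
    t0 = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - t0

    for _ in range(warmup):
        fn(*args)

    samples = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    return first, statistics.median(samples)


def make_lazily_compiled_sin(compile_seconds=0.2):
    """Return a hypothetical sin() whose first call pays a one-time setup cost."""
    state = {"compiled": False}

    def lazily_compiled_sin(values):
        if not state["compiled"]:
            time.sleep(compile_seconds)  # stand-in for one-time kernel compilation
            state["compiled"] = True
        return [math.sin(v) for v in values]

    return lazily_compiled_sin
```

Assuming `x`, `gpuarray` and `cumath` from the question, the same harness could time the full GPU round trip with `benchmark(lambda a: cumath.sin(gpuarray.to_gpu(a)).get(), x)`. The `.get()` copy back to the host makes the timer wait for the kernel to finish, so the steady-state figure is not just the kernel launch.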

Why is my GPU code running much slower than the CPU?
