I'm trying to learn the basics of CUDA in Numba from this tutorial: https://github.com/harrism/numba_examples/blob/master/mandelbrot_numba.ipynb
The expected behavior is that the version without JIT or CUDA takes the longest, the JIT version is faster, and the CUDA version is faster still.
Going from the plain version to the JIT version speeds things up as expected, but the CUDA version takes on average twice as long as the JIT version. My laptop has an NVIDIA GeForce GTX 950M. I'd like to know whether the problem is that my GPU isn't powerful enough, or that my program is written incorrectly.
Here are my versions:
None: https://pastebin.com/hczvLC8F
import numpy as np
from pylab import imshow, show
from timeit import default_timer as timer

def mandel(x, y, max_iters):
    # Count iterations until the orbit of z escapes the radius-2 disk
    c = complex(x, y)
    z = 0.0j
    for i in range(max_iters):
        z = z*z + c
        if (z.real*z.real + z.imag*z.imag) >= 4:
            return i
    return max_iters

def create_fractal(min_x, max_x, min_y, max_y, image, iters):
    height = image.shape[0]
    width = image.shape[1]
    pixel_size_x = (max_x - min_x) / width
    pixel_size_y = (max_y - min_y) / height
    for x in range(width):
        real = min_x + x * pixel_size_x
        for y in range(height):
            imag = min_y + y * pixel_size_y
            color = mandel(real, imag, iters)
            image[y, x] = color

image = np.zeros((1024, 1536), dtype=np.uint8)
start = timer()
create_fractal(-2.0, 1.0, -1.0, 1.0, image, 20)
dt = timer() - start
print("Mandelbrot created in {} s".format(dt))
imshow(image)
show()
JIT: https://pastebin.com/NStX7MVi
from numba import jit

@jit
def mandel(x, y, max_iters):
    ...  # same body as the plain version above

@jit
def create_fractal(min_x, max_x, min_y, max_y, image, iters):
    ...  # same body as the plain version above
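One thing I'm not sure about: @jit compiles on the first call, so perhaps the timing should exclude that one-time cost. A minimal sketch of a warmed-up measurement (my own variation, not the pastebin code):

# Warm-up call: triggers Numba's compilation of both functions,
# so only compiled execution is measured below.
create_fractal(-2.0, 1.0, -1.0, 1.0, image, 20)

start = timer()
create_fractal(-2.0, 1.0, -1.0, 1.0, image, 20)
dt = timer() - start
print("JIT Mandelbrot created in {} s".format(dt))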
CUDA: https://pastebin.com/4V3BgdAv
from numba import cuda

mandel_gpu = cuda.jit(device=True)(mandel)

@cuda.jit
def mandel_kernel(min_x, max_x, min_y, max_y, image, iters):
    height = image.shape[0]
    width = image.shape[1]
    pixel_size_x = (max_x - min_x) / width
    pixel_size_y = (max_y - min_y) / height
    # Each thread starts at its global grid index and strides by the
    # total number of threads launched in each dimension
    startX, startY = cuda.grid(2)
    gridX = cuda.gridDim.x * cuda.blockDim.x
    gridY = cuda.gridDim.y * cuda.blockDim.y
    for x in range(startX, width, gridX):
        real = min_x + x * pixel_size_x
        for y in range(startY, height, gridY):
            imag = min_y + y * pixel_size_y
            image[y, x] = mandel_gpu(real, imag, iters)

gimage = np.zeros((1024, 1536), dtype=np.uint8)
blockdim = (32, 8)
griddim = (32, 16)

start = timer()
d_image = cuda.to_device(gimage)
mandel_kernel[griddim, blockdim](-2.0, 1.0, -1.0, 1.0, d_image, 20)
d_image.to_host()
dt = timer() - start
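Since cuda.jit also compiles the kernel on its first launch, I suspect the measured time may include one-time compilation. A minimal sketch of timing a second, warmed-up launch (assuming the kernel and arrays defined above; copy_to_host is the newer spelling of to_host):

d_image = cuda.to_device(gimage)

# Warm-up launch: the first call compiles the kernel for the GPU.
mandel_kernel[griddim, blockdim](-2.0, 1.0, -1.0, 1.0, d_image, 20)
cuda.synchronize()  # wait for the warm-up kernel to finish

# Timed launch: compilation is cached now, so this measures only
# kernel execution plus the device-to-host copy.
start = timer()
mandel_kernel[griddim, blockdim](-2.0, 1.0, -1.0, 1.0, d_image, 20)
gimage = d_image.copy_to_host()  # blocking copy, implicitly synchronizes
dt = timer() - start
print("Warmed-up CUDA run in {} s".format(dt))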
I expected the CUDA version to be faster than the JIT version, but it actually takes about twice as long.
I ran each version several times; these are the results on my laptop (a correctness check follows the numbers):
None: 6.24s on average
JIT: 0.42s on average
CUDA: 0.86s on average
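To rule out the kernel being wrong rather than just slow, one quick check is comparing the two outputs pixel by pixel. A minimal sketch, assuming image (the CPU result) and gimage (the GPU result, after copying back) are both in host memory:

# If the kernel is correct, both versions should compute identical
# uint8 iteration counts for every pixel.
if np.array_equal(image, gimage):
    print("CPU and CUDA outputs match")
else:
    diff = np.count_nonzero(image != gimage)
    print("Outputs differ at {} of {} pixels".format(diff, image.size))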