Multiple Tesla K80 GPUs and parfor loops

Date: 2016-09-07 01:42:08

Tags: matlab multi-gpu

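I have been given a computer with 4 Tesla K80 GPUs, and I am trying to use parfor loops from the MATLAB Parallel Computing Toolbox (PCT) to speed up FFT calculations. So far the parfor version is slower than a plain loop on a single GPU.

Here is what I tried: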

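    % Pupil is based on a 512x512 array

    % On each worker, record the index of the currently selected GPU
    % and copy pupil to that GPU
    parfor zz = 1:4
        gd = gpuDevice;
        d{zz} = gd.Index;
        probe{zz} = gpuArray(pupil);
        Essai{zz} = gpuArray(pupil);
    end

    % Time 100 FFTs per parfor iteration (400 in total)
    tic;
    parfor ii = 1:4
        gd2 = gpuDevice;
        d2{ii} = gd2.Index;
        for i = 1:100
            [Essai{ii}] = fftn(probe{ii});
        end
    end
    toc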
    %%

Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
Elapsed time is 1.805763 seconds.
Elapsed time is 1.412928 seconds.
Elapsed time is 1.409559 seconds.

Starting parallel pool (parpool) using the 'local' profile ... connected to 8 workers.
Elapsed time is 0.606602 seconds.
Elapsed time is 0.297850 seconds.
Elapsed time is 0.294365 seconds.
%%
tic; for i = 1:400; Essai{1} = fftn( probe{1} ); end; toc
Elapsed time is 0.193579 seconds !!!
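
A note on the timings above: GPU calls in MATLAB can return before the device has finished, so a bare tic/toc may capture only part of the work. Below is a minimal sketch of a synchronized single-GPU baseline, assuming the Parallel Computing Toolbox and a supported GPU. The random `pupil` is only a placeholder for the real 512x512 array, and `nIter`, `tLoop`, and `tSingle` are illustrative names, not from the original post.

```matlab
% Sketch: time repeated FFTs on one GPU with explicit synchronization.
pupil = rand(512, 512, 'single');   % placeholder for the real pupil array (assumption)
nIter = 400;                        % same total count as the single-GPU loop above

if gpuDeviceCount > 0
    gd = gpuDevice;                 % currently selected device
    probe = gpuArray(pupil);        % host-to-device copy, kept outside the timer

    tic;
    for k = 1:nIter
        out = fftn(probe);
    end
    wait(gd);                       % block until all queued GPU work has finished
    tLoop = toc;

    % gputimeit takes care of synchronization and repeated runs itself.
    tSingle = gputimeit(@() fftn(probe));
    fprintf('loop: %.4f s total, gputimeit: %.6f s per FFT\n', tLoop, tSingle);
else
    disp('No supported GPU found; skipping the timing sketch.');
end
```

Comparing `tLoop` with `nIter * tSingle` gives a rough check on whether the loop timing is dominated by the FFTs themselves or by overhead.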

Why is opening 8 workers faster, when in principle I am only storing the variables on 4 GPUs (out of 8)?

Also, how can I use a Tesla K80 as a single GPU?
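
On the device count: a Tesla K80 board carries two GPU chips, and each chip normally shows up as its own CUDA device, which would explain eight devices on a machine with four boards. The sketch below simply lists what MATLAB sees, assuming the Parallel Computing Toolbox is installed. Note that `gpuDevice(idx)` selects and resets the device, so it should be run before any gpuArray data is created. `nGpu` and `g` are illustrative names.

```matlab
% Sketch: enumerate the GPU devices visible to MATLAB.
nGpu = gpuDeviceCount;
fprintf('MATLAB sees %d GPU device(s)\n', nGpu);

for idx = 1:nGpu
    g = gpuDevice(idx);             % selects (and resets) device idx
    fprintf('%d: %s, %.1f GB, compute capability %s\n', ...
            idx, g.Name, g.TotalMemory / 2^30, g.ComputeCapability);
end
```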

Merci,Nicolas

2 Answers:

Answer 0: (score: 1)

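I doubt that parfor is well suited to multi-GPU systems. If speed is critical and you want to make full use of the GPUs, I suggest writing your own small CUDA program using the cuFFT library: http://docs.nvidia.com/cuda/cufft/#multiple-GPU-cufft-transforms

Here is how to write a MEX file containing CUDA code: http://www.mathworks.com/help/distcomp/run-mex-functions-containing-cuda-code.html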

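Answer 1: (score: 0)

Thank you very much for the quick reply and the links! Indeed, I was trying to avoid using CUDA, but it seems to be the best option for scaling FFTs. I did think parfor and spmd were good tools for multi-GPU work, though.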
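
For reference, the spmd approach mentioned here could look roughly like the sketch below: one worker per visible GPU device, each bound explicitly to its own device and timed with synchronization. This is only a sketch under assumptions: the random `pupil` stands in for the real array, the machine has at least as many CPU cores as GPU devices, and names such as `nGpu`, `usedIndex`, and `t` are illustrative. It uses `labindex`, which matches releases from the time of the question; newer releases also offer `spmdIndex`.

```matlab
% Sketch: one spmd worker per GPU device, each running its own batch of FFTs.
pupil = rand(512, 512, 'single');   % placeholder for the real pupil array (assumption)
nGpu  = gpuDeviceCount;
nIter = 100;

if nGpu > 0
    pool = gcp('nocreate');
    if isempty(pool) || pool.NumWorkers ~= nGpu
        delete(pool);               % no-op when no pool is open
        pool = parpool(nGpu);       % one worker per device
    end

    spmd
        gd = gpuDevice(labindex);   % bind this worker to its own device
        probe = gpuArray(pupil);
        wait(gd);                   % finish the copy before starting the timer
        tic;
        for k = 1:nIter
            out = fftn(probe);
        end
        wait(gd);                   % finish all FFTs before stopping the timer
        t = toc;
        usedIndex = gd.Index;
    end

    % t and usedIndex are Composite objects; index them per worker on the client.
    for w = 1:nGpu
        fprintf('worker %d used GPU %d: %.4f s\n', w, usedIndex{w}, t{w});
    end
else
    disp('No supported GPU found; skipping the spmd sketch.');
end
```

Keeping the data on each worker inside a single spmd block avoids moving gpuArray values between the client and the workers, which the two separate parfor loops in the question may be doing.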