我想在GPUarray中采用两个矩阵的加权和来快速。例如我在cpu上的代码如下:
mat1 = rand(19,19);
mat2= rand(19,19);
Receptive_fieldsize = [4,3];
overlap = 1;
Output = GetweightedSum(mat1,mat2, Receptive_fieldsize,overlap); %this will output in an 6x6 matrix
我的函数体是:
function Output = GetweightedSum(mat1,mat2, RF,overlap)
gap = RF(1) - overlap;
size_mat = size(mat1);
output_size=[6,6];
for u=1: output_size(1)
for v=1: output_size(2)
min_u = (u - 1) * gap + 1;
max_u = (u - 1) * gap + RF(1);
min_v = (v - 1) * gap + 1;
max_v = (v - 1) * gap + RF(2);
input1 = mat1(min_u:max_u,min_v:max_v);
input2 = mat2(min_u:max_u,min_v:max_v);
Output(u,v) = sum(sum(input1 .*input2));
end
end
如何将其转换为GPUfunciton。我可以直接这样做,或者我可以在GPU代码中使用循环。我对GPU完全不熟悉,所以对此一无所知。 如果有人指导我,或者将上面的代码更改为对GPU功能的参考以便我可以从中学习,那将会感激不尽。 此致
答案 0 :(得分:1)
查看代码及其旁边的注释是否对您有意义 -
function Output = GetweightedSumGPU(mat1,mat2, RF,overlap)
%// Create parameters
gap = RF(1) - overlap;
output_size=[6,6];
sz1 = output_size(1);
sz2 = output_size(2);
nrows = size(mat1,1); %// get number of rows in mat1
%// Copy data to GPU
gmat1 = gpuArray(mat1);
gmat2 = gpuArray(mat2);
start_row_ind = gpuArray([1:RF(1)]'); %//' starting row indices for each block
col_offset = gpuArray([0:RF(2)-1]*nrows); %// column offset for each block
%// Linear indices for each block
ind = bsxfun(@plus,start_row_ind,col_offset);
%// Linear indices along rows and columns respectively
ind_rows = bsxfun(@plus,ind(:),[0:sz1-1]*gap);
ind_rows_cols = bsxfun(@plus,ind_rows,permute([0:sz2-1]*gap*nrows,[1 3 2]));
%// Elementwise multiplication, summing and gathering back result to CPU
Output = gather(reshape(sum(gmat1(ind_rows_cols).*gmat2(ind_rows_cols),1),sz1,sz2));
return;