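On each iteration of a loop, I compute a MATLAB matrix. These matrices must be concatenated to build one final matrix. I know the dimensions of this final matrix before entering the loop, so I assumed that preallocating it with the 'zeros' function would be faster than initializing an empty array and simply appending the sub-arrays on each iteration. Strangely, my program runs much slower when I preallocate. Here is the code (only the first line and the last line inside the loop differ):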
This is slow:
w_cuda = zeros(w_rows, w_cols, f_cols);
for j=0:num_groups-1
    % gets # of rows & cols in W. The last group is a special
    % case because it may have fewer than max_row_size rows
    if (j == num_groups-1 && mod(w_rows, max_row_size) ~= 0)
        num_rows_sub = w_rows - (max_row_size * j);
    else
        num_rows_sub = max_row_size;
    end
    % calculate correct W and f matrices
    start_index = (max_row_size * j) + 1;
    end_index = start_index + num_rows_sub - 1;
    w_sub = W(start_index:end_index,:);
    f_sub = filterBank(start_index:end_index,:);
    % Obtain sub-matrix
    w_cuda_sub = nopack_cu(w_sub,f_sub);
    % Incorporate sub-matrix into final matrix
    w_cuda(start_index:end_index,:,:) = w_cuda_sub;
end
This is fast:
w_cuda = [];
for j=0:num_groups-1
    % gets # of rows & cols in W. The last group is a special
    % case because it may have fewer than max_row_size rows
    if (j == num_groups-1 && mod(w_rows, max_row_size) ~= 0)
        num_rows_sub = w_rows - (max_row_size * j);
    else
        num_rows_sub = max_row_size;
    end
    % calculate correct W and f matrices
    start_index = (max_row_size * j) + 1;
    end_index = start_index + num_rows_sub - 1;
    w_sub = W(start_index:end_index,:);
    f_sub = filterBank(start_index:end_index,:);
    % Obtain sub-matrix
    w_cuda_sub = nopack_cu(w_sub,f_sub);
    % Incorporate sub-matrix into final matrix
    w_cuda = [w_cuda; w_cuda_sub];
end
Other information that may be useful: my matrices are 3D, and the numbers in them are complex. As always, any insight is appreciated.
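Because the data is complex, note that zeros(...) returns a real array; the first time a complex sub-matrix is written into it, MATLAB has to convert the whole array to complex storage. The sketch below, which assumes MATLAB R2013b or later for the 'like' syntax, preallocates complex storage directly and also computes each block's row range with min() instead of the special-case if/else. The sizes and the random w_cuda_sub are hypothetical stand-ins; nopack_cu (the asker's function) is not reproduced here.

```matlab
% Hypothetical sizes standing in for the question's variables
w_rows = 10; w_cols = 4; f_cols = 3; max_row_size = 4;
num_groups = ceil(w_rows / max_row_size);

% Real preallocation: zeros() returns real storage
w_real = zeros(w_rows, w_cols, f_cols);
% Complex preallocation: 'like', 1i makes the zeros complex from the start
w_cplx = zeros(w_rows, w_cols, f_cols, 'like', 1i);

for j = 0:num_groups-1
    % Row range of block j; min() clips the last, possibly shorter, block
    start_index = max_row_size * j + 1;
    end_index = min(start_index + max_row_size - 1, w_rows);
    n = end_index - start_index + 1;
    % Hypothetical stand-in for nopack_cu(w_sub, f_sub): a complex n-by-w_cols-by-f_cols block
    w_cuda_sub = complex(rand(n, w_cols, f_cols), rand(n, w_cols, f_cols));
    w_real(start_index:end_index,:,:) = w_cuda_sub;  % forces real -> complex conversion
    w_cplx(start_index:end_index,:,:) = w_cuda_sub;  % already complex, no conversion
end
```

Both arrays end up holding the same values; the difference is only whether complex storage exists before the first assignment.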
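Answer 0 (score: 7)
I had always assumed that preallocation is faster for any array size, but I had never actually tested it. So I ran a simple timing test, with 1000 iterations each, comparing the append and preallocate approaches for array sizes from 1x1x3 to 20x20x3. Here is the code: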
arraySize = 1:20;
numIteration = 1000;
timeAppend = zeros(length(arraySize), 1);
timePreAllocate = zeros(length(arraySize), 1);
for ii = 1:length(arraySize)
    w = [];
    tic;
    for jj = 1:numIteration
        w = [w; rand(arraySize(ii), arraySize(ii), 3)];
    end
    timeAppend(ii) = toc;
end
for ii = 1:length(arraySize)
    w = zeros(arraySize(ii) * numIteration, arraySize(ii), 3);
    tic;
    for jj = 1:numIteration
        indexStart = (jj - 1) * arraySize(ii) + 1;
        indexStop = indexStart + arraySize(ii) - 1;
        w(indexStart:indexStop,:,:) = rand(arraySize(ii), arraySize(ii), 3);
    end
    timePreAllocate(ii) = toc;
end
figure;
axes;
plot(timeAppend);
hold on;
plot(timePreAllocate, 'r');
legend('Append', 'Preallocate');
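The test above fills the arrays with real values, whereas the question's data is complex. A minimal variant of the same benchmark that uses complex blocks is sketched below; n and numIteration are illustrative values, the 'like' syntax assumes R2013b or later, and no particular timings are claimed, since they depend on the MATLAB version and machine.

```matlab
% Sketch: the benchmark above, rerun with complex blocks like the question's data.
% n and numIteration are illustrative values, not taken from the question.
n = 10;
numIteration = 1000;

% 1) Real preallocation (as in the question), filled with complex blocks
w = zeros(n * numIteration, n, 3);
tic;
for jj = 1:numIteration
    rows = (jj - 1) * n + (1:n);
    w(rows,:,:) = complex(rand(n, n, 3), rand(n, n, 3));
end
timeRealPrealloc = toc;

% 2) Complex preallocation via 'like' (requires R2013b or later)
w = zeros(n * numIteration, n, 3, 'like', 1i);
tic;
for jj = 1:numIteration
    rows = (jj - 1) * n + (1:n);
    w(rows,:,:) = complex(rand(n, n, 3), rand(n, n, 3));
end
timeComplexPrealloc = toc;

% 3) Appending with vertical concatenation, as in the question's fast version
w = [];
tic;
for jj = 1:numIteration
    w = [w; complex(rand(n, n, 3), rand(n, n, 3))]; %#ok<AGROW>
end
timeAppendComplex = toc;

fprintf('real prealloc: %.4f s, complex prealloc: %.4f s, append: %.4f s\n', ...
    timeRealPrealloc, timeComplexPrealloc, timeAppendComplex);
```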
Here are the results (as expected):