Question

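I'm trying to optimize the performance (i.e. the speed) of my code. I'm new to vectorization and tried to vectorize it myself, without success (I also tried bsxfun, parfor, and a few other vectorization approaches). Can anyone help me optimize this code and briefly explain how to do it?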

% for simplicity, create dummy data
Z = rand(250,1);
z1 = rand(100,100);
z2 = rand(100,100);

% missing parameters added in the latest edit, thanks @Bas Swinckels and @Daniel R
j = 2;
n = length(Z);
h = 0.4;


tic
[K1, K2] = size(z1);
result = zeros(K1,K2);

for l = 1 : K1
    for m = 1: K2
        result(l,m) = sum(K_h(h, z1(l,m), Z(j+1:n)).*K_h(h, z2(l,m), Z(1:n-j)));    
    end
end

result = result ./ (n-j);
toc

The function K_h.m is a boundary kernel, defined as follows (x is a scalar, y can be a vector):

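function res = K_h(h, x, y)
 % x is a scalar in [0,1]; y may be a vector
 res = 0;

 if (x >= 0 && x < h)
    % left boundary: renormalize by the kernel mass on [-x/h, 1]
    denominator = integral(@kernelFunc, -x./h, 1);
    res = 1./h .* kernelFunc((x-y)/h) / denominator;
 elseif (x >= h && x <= 1-h)
    % interior: plain kernel, no renormalization
    res = 1./h .* kernelFunc((x-y)/h);
 elseif (x > 1-h && x <= 1)
    % right boundary: renormalize by the kernel mass on [-1, (1-x)/h]
    denominator = integral(@kernelFunc, -1, (1-x)./h);
    res = 1./h .* kernelFunc((x-y)/h) / denominator;
 else
    fprintf('x is out of [0,1]\n');
    return;
 end
end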

It takes a long time to get the result: Elapsed time is 13.616413 seconds.

Thanks. Any comments are welcome. P.S.: Sorry for my poor English.

Answer 1

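Some observations: Z(j+1:n) and Z(1:n-j) are constant inside the loop, so do those indexing operations once, before the loop. Next, the loop itself is very simple: each result(l, m) depends only on z1(l, m) and z2(l, m). That is an ideal case for arrayfun. A solution might look like this (untested):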

tic

% do constant stuff outside of the loop
Zhigh = Z(j+1:n);
Zlow = Z(1:n-j);

result = arrayfun(@(zz1, zz2) sum(K_h(h, zz1, Zhigh).*K_h(h, zz2, Zlow)), z1, z2);

result = result ./ (n-j);
toc

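I'm not sure whether this will be any faster, because I suspect the run time is dominated not by the for loops but by all the work done inside the K_h function.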
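If K_h really is the bottleneck, the likely culprit is the call to integral for almost every element of z1 and z2. The sketch below is one possible way around that, not a drop-in replacement. It assumes a hypothetical Epanechnikov kernel for kernelFunc (the question does not show it) together with its closed-form antiderivative kernelCdf, so the normalizing denominators can be computed for all x at once. It also assumes h <= 0.5, every x inside [0,1], and R2016b or later for implicit expansion (older releases would need bsxfun). The names kernelCdf, denom and K_vec are made up for illustration.

```matlab
% Hypothetical kernel: the question does not show kernelFunc, so an
% Epanechnikov kernel and its antiderivative are ASSUMED here.
kernelFunc = @(u) 0.75 .* (1 - u.^2) .* (abs(u) <= 1);
kernelCdf  = @(u) 0.5 + 0.75 .* (u - u.^3 ./ 3);   % valid for -1 <= u <= 1

% Same dummy data as in the question
Z  = rand(250, 1);
z1 = rand(100, 100);
z2 = rand(100, 100);
j = 2;  n = length(Z);  h = 0.4;

% Row vectors, so that (column of x) - (row of Z) expands to a matrix
Zhigh = Z(j+1:n).';
Zlow  = Z(1:n-j).';

% Integration limits per x, clipped to the kernel support [-1, 1]:
%   x < h      -> [-x/h, 1]
%   x > 1 - h  -> [-1, (1-x)/h]
%   otherwise  -> [-1, 1], where this kernel integrates to 1
% (matches the branches of K_h only when h <= 0.5 and 0 <= x <= 1)
denom = @(x) kernelCdf(min((1 - x) ./ h, 1)) - kernelCdf(max(-x ./ h, -1));

% Boundary kernel for a column of x values against a row of y values
K_vec = @(x, y) kernelFunc((x - y) ./ h) ./ (h .* denom(x));

A = K_vec(z1(:), Zhigh);     % numel(z1) x (n-j)
B = K_vec(z2(:), Zlow);      % numel(z2) x (n-j)
result = reshape(sum(A .* B, 2), size(z1)) ./ (n - j);
```

For a kernel without a closed-form antiderivative, the same layout should still work if the denominators are computed once per element up front (for example with arrayfun around integral) and the kernel evaluation is left vectorized. The two intermediate matrices hold numel(z1)*(n-j) doubles each, which is about 20 MB for the sizes in the question.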
