Compute a Gram matrix in MATLAB without loops

Date: 2012-10-28 14:47:19

Tags: matlab svm

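I have a matrix X(10000, 800). I want to compute the Gram matrix K(10000, 10000), where K(i,j) = exp(-||X(i,:) - X(j,:)||^2), i.e. the exponential of minus the squared Euclidean distance between rows i and j of X.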

At first I used a double loop, but it just hangs forever. Then I tried this:

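[N, d] = size(X);
aa = repmat(X', [1 N]);          % d x N^2: column (j-1)*N+i holds X(i,:)'
bb = kron(X', ones(1, N));       % d x N^2: column (j-1)*N+i holds X(j,:)'
D = sum((aa - bb).^2, 1);        % 1 x N^2 vector of squared distances
K = exp(-reshape(D, [N N]));     % K(i,j) = exp(-||X(i,:) - X(j,:)||^2)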

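But it uses a lot of extra memory, and I quickly run out. Is there an efficient vectorized way to do this? I'm sure there must be something, since this is a standard intermediate step in many kernel SVMs as well as in image processing.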
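A rough count shows why the replication approach cannot fit in memory. Assuming double precision (8 bytes per element) and the sizes above (N = 10000 rows, d = 800 columns), each replicated array such as aa holds d·N² elements, while the final K only holds N²:

```latex
\[
d \cdot N^2 \cdot 8\ \text{bytes} = 800 \cdot 10000^2 \cdot 8 = 6.4 \times 10^{11}\ \text{bytes} \approx 640\ \text{GB}
\]
\[
N^2 \cdot 8\ \text{bytes} = 10000^2 \cdot 8 = 8 \times 10^{8}\ \text{bytes} \approx 800\ \text{MB}
\]
```

So the intermediate arrays are about 800 times larger than the result, which is why any method that materializes all N² difference vectors runs out of memory long before K itself becomes a problem.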

2 Answers

Answer 0 (score: 3)

Use pdist2 or pdist. Note that MATLAB's pdist2 is simply slow (see the question "Matlab cosine distance is way slow").

Code:

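X = rand(100, 3);                      % 100 sample points in R^3
K = squareform(pdist(X, 'euclidean')); % 100 x 100 matrix of pairwise Euclidean distances
K = exp(-K.^2);                        % square the distances, then apply exp(-.)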
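Since pdist2 is mentioned as well, here is a minimal sketch of the same kernel with pdist2, which takes two point sets directly and returns the full rectangular distance matrix (no squareform needed). It assumes the Statistics and Machine Learning Toolbox is installed; the sizes are arbitrary.

```matlab
% Sketch: Gaussian kernel between two point sets using pdist2.
% Assumes the Statistics and Machine Learning Toolbox (pdist2) is available.
X = rand(100, 3);                 % 100 points in R^3
Y = rand(50, 3);                  % 50 points in R^3
D = pdist2(X, Y, 'euclidean');    % 100 x 50 matrix of Euclidean distances
K = exp(-D.^2);                   % K(i,j) = exp(-||X(i,:) - Y(j,:)||^2)
```

Calling pdist2(X, X, 'euclidean') gives the same square matrix as squareform(pdist(X, 'euclidean')), at the cost of computing each symmetric pair twice.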

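I'll write this for the more general case, where you have two matrices and want the distance between every row of one and every row of the other. Expanding the square gives ||x - y||^2 = x'x - 2x'y + y'y. For a Gram matrix you need this for every combination of rows, which is what the three terms A, B and C compute: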
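Written in matrix form (taking x_i and y_j as the rows of X, of size n_X × d, and Y, of size n_Y × d, viewed as column vectors), the expansion becomes one matrix product plus two broadcast additions. The vector names a and c are introduced here for illustration; they correspond to A and C in the code.

```latex
\[
\|x_i - y_j\|^2 = x_i^\top x_i - 2\,x_i^\top y_j + y_j^\top y_j
\]
\[
D = a\,\mathbf{1}_{n_Y}^\top - 2\,X Y^\top + \mathbf{1}_{n_X}\,c^\top,
\qquad a_i = x_i^\top x_i,\quad c_j = y_j^\top y_j
\]
\[
K_{ij} = \exp(-D_{ij})
\]
```

The only term that costs O(n_X n_Y d) work is X Yᵀ, and MATLAB evaluates it as a single matrix product, so no n_X × n_Y × d intermediate array is ever created.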

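X = rand(100, 3);           % 100 points in R^3
Y = rand(50, 3);            % 50 points in R^3
A = sum(X .* X, 2);         % 100 x 1: x'x for every row of X
B = -2 * X * Y';            % 100 x 50: -2x'y for every pair of rows
C = sum(Y .* Y, 2);         % 50 x 1: y'y for every row of Y
K = bsxfun(@plus, A, B);    % add x'x down each column
K = bsxfun(@plus, K, C');   % add y'y along each row (C must be transposed to 1 x 50)
K = exp(-K);                % K(i,j) = exp(-||X(i,:) - Y(j,:)||^2)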
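On MATLAB R2016b or later, implicit expansion lets the same computation be written without bsxfun. The sketch below is a variant rather than the answer's code: the max(D, 0) clamp is an extra step, added because floating-point cancellation in a - 2XY' + c' can leave tiny negative values, which would otherwise produce kernel entries slightly above 1.

```matlab
% Sketch: the x'x - 2x'y + y'y expansion using implicit expansion (R2016b+).
X = rand(100, 3);                 % 100 points in R^3
Y = rand(50, 3);                  % 50 points in R^3
a = sum(X.^2, 2);                 % 100 x 1 squared row norms of X
c = sum(Y.^2, 2);                 % 50 x 1 squared row norms of Y
D = a - 2 * (X * Y') + c';        % 100 x 50 squared distances via implicit expansion
D = max(D, 0);                    % clamp tiny negative values caused by round-off
K = exp(-D);                      % K(i,j) = exp(-||X(i,:) - Y(j,:)||^2)

% Spot check one entry against the direct definition.
ii = 3; jj = 7;
direct = exp(-sum((X(ii,:) - Y(jj,:)).^2));
fprintf('abs difference at (%d,%d): %g\n', ii, jj, abs(K(ii,jj) - direct));
```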

Edit: speed comparison

Code

% http://stackoverflow.com/questions/13109826/compute-a-gramm-matrix-in-matlab-without-loops/24407122#24407122
function time_gramm()
% I have a matrix X(10000, 800). I want to compute gramm matrix K(10000,10000), where K(i,j)= exp(-(X(i,:)-X(j,:))^2).
X = rand(100, 800);

%% The straight-forward pdist solution.
tic;
K = squareform(pdist(X, 'euclidean'));
K1 = exp(-K .^2);
t1 = toc;
fprintf('pdist took \t%d seconds\n', t1);

%% The vectorized solution
tic;
A = sum(X .* X, 2);
B = -2 * X * X';
K = bsxfun(@plus, A, B);
K = bsxfun(@plus, K, A');
K2 = exp(-K);
t2 = toc;
fprintf('Vectorized solution took \t%d seconds.\n', t2);

%% The not-so-smart triple-loop solution
tic;
N = size(X, 1);
K3 = zeros(N, N);
for i=1:N
    %     fprintf('Running outer loop for i= %d\n', i);
    for j=1:N
        xij = X(i,:) - X(j,:);
        xij = norm(xij, 2);
        xij = xij ^ 2;
        K3(i,j) = -xij;
        %         d = X(i,:) - X(j,:); % Alternative way, twice as fast but still
        %         orders of magnitude slower than the other solutions.
        %         K3(i,j) = exp(-d * d');
    end
end
K3 = exp(K3);
t3 = toc;
fprintf('Triple-nested loop took \t%d seconds\n', t3);
%% Assert results are the same...
assert(all(abs(K1(:) - K2(:)) < 1e-6 ));
assert(all(abs(K1(:) - K3(:)) < 1e-6 ));
end

Results

I ran the code above with N = 100:
pdist took  8.600000e-03 seconds
Vectorized solution took    3.916000e-03 seconds.
Triple-nested loop took     2.699330e-01 seconds

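Note that at 1/100 of the requested problem size (N = 100 instead of 10000), the loop-based code suggested in the other answer (O(m^2 n)) is already roughly two orders of magnitude slower. When I plugged in 100k as the size of the X matrix, it took far longer than I was willing to wait.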

Performance on the full-size problem (X = rand(10000, 800)) is as follows:

pdist took  5.470632e+01 seconds
Vectorized solution took    1.141894e+01 seconds.

Notes

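The vectorized solution takes 11 seconds, MATLAB's pdist takes 55 seconds, and the hand-written loop solution suggested in the other answer never finished.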

Answer 1 (score: 0)

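Why not use the simple formula? Each element is K(i, j) = exp(-sum_{k=1}^{n} (X(i, k) - X(j, k))^2), so you need two outer loops over i and j plus an inner loop over k. The time complexity is O(m^2 n), where X has m rows and n columns. The extra space complexity is O(1), since you use no memory beyond the X and K matrices to compute the answer.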
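To put the O(m^2 n) estimate in perspective for the sizes in the question (m = 10000 rows, n = 800 columns), here is a quick count of innermost loop iterations, each doing one subtraction, one square and one addition. The second line assumes the symmetry K(i, j) = K(j, i) and K(i, i) = 1 are exploited, which the formula above does not require.

```latex
\[
m^2 n = 10000^2 \cdot 800 = 8 \times 10^{10}
\]
\[
\frac{m(m-1)}{2}\, n = 49\,995\,000 \cdot 800 \approx 4 \times 10^{10}
\]
```

The vectorized answer performs the same order of arithmetic inside X * X', but as one matrix product handled by optimized library code rather than as tens of billions of individual loop steps. Note also that the O(1) figure counts only extra space: K itself still needs m² = 10^8 doubles, about 800 MB.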

Have you tried this? Is it really that slow?