Question

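I have a problem. I am trying to compute pairwise distances between vectors. Let me first explain the problem: I have two sets of vectors, X and Y. X has three vectors x1, x2 and x3. Y has three vectors y1, y2 and y3. Note that the vectors in X and Y have lengths m and n, respectively (the code below uses n1 and n2 for these lengths and m for the number of vectors in each set). Let the dataset be represented as this image: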

enter image description here

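I am trying to compute a similarity matrix such as this:

enter image description here

The color-coded parts mean the following. The cells marked 0 need not be calculated; I have purposely set them to 100 (it can be any value). The grey cells have to be computed. The similarity score is computed as the L2 norm of (xi-xj) plus the L2 norm of (yi-yj).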

This means that the entry is

M((x_i,y_j), (x_k,y_l)) := norm(x_i-x_k,2) + norm(y_j-y_l,2)

I have written some basic code to do this:

clc;clear all;close all;
%% randomly generate data
m=3; n1=4; n2=6;
train_a_mean = rand(m,n1);
train_b_mean = rand(m,n2);
p = size(train_a_mean,1)*size(train_b_mean,1);
score_mean_ab = zeros(p,p);

%% This is to store the index variables 
%% This is required for futu
idx1 = score_mean_ab;
idx2 = idx1; idx3 = idx1; idx4 = idx1;

a=1; b=1;
for i=1:size(score_mean_ab,1)
    c = 1; d = 1;
    for j=1:size(score_mean_ab,2)
        if (a==c)
            score_mean_ab(i,j) = 100;
        else            
            %% computing distances between the different modalities and
            %% summing them up
            score_mean_ab(i,j) = norm(train_a_mean(a,:)-train_a_mean(c,:),2) ...
            + norm(train_b_mean(b,:)-train_b_mean(d,:),2);
        end
        %% saving the indices
        idx1(i,j)=a; idx2(i,j)=b; idx3(i,j)=c; idx4(i,j)=d;        
        %% updating the values of c and d
        if mod(d,size(train_a_mean,1))==0
            c = c + 1;
            d = 1;
        else
            d = d+1;
        end
    end
    %% updating the values of a and b
    if mod(b,size(train_a_mean,1))==0
        a = a + 1;
        b = 1;
    else        
        b = b+1;
    end
end

For a dry run on a small sample, I got this matrix:

score_mean_ab =

  100.0000  100.0000  100.0000    0.6700    1.6548    1.5725    0.8154    1.8002    1.7179
  100.0000  100.0000  100.0000    1.6548    0.6700    1.5000    1.8002    0.8154    1.6454
  100.0000  100.0000  100.0000    1.5725    1.5000    0.6700    1.7179    1.6454    0.8154
    0.6700    1.6548    1.5725  100.0000  100.0000  100.0000    1.3174    2.3022    2.2200
    1.6548    0.6700    1.5000  100.0000  100.0000  100.0000    2.3022    1.3174    2.1475
    1.5725    1.5000    0.6700  100.0000  100.0000  100.0000    2.2200    2.1475    1.3174
    0.8154    1.8002    1.7179    1.3174    2.3022    2.2200  100.0000  100.0000  100.0000
    1.8002    0.8154    1.6454    2.3022    1.3174    2.1475  100.0000  100.0000  100.0000
    1.7179    1.6454    0.8154    2.2200    2.1475    1.3174  100.0000  100.0000  100.0000

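But my code is very slow. I did a few sample runs and got these timings:

m=3; n1=3; n2=3; Elapsed time is 0.000363 seconds.
m=10; n1=3; n2=3; Elapsed time is 0.042015 seconds.
m=10; n1=1800; n2=1800; Elapsed time is 0.230046 seconds.
m=20; n1=1800; n2=1800; Elapsed time is 4.309134 seconds.
m=30; n1=1800; n2=1800; Elapsed time is 23.058106 seconds.

My question:

Typically, I have m~100, n1~2000 and n2~2000. My own code breaks down at this point. Is there an optimized way to do this?
Can the built-in MATLAB function pdist2 be used for this purpose?

Note: the vectors are actually stored as row vectors, and the values of n1 and n2 may not be equal.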

Answer 1

Here is one approach. It computes all entries, including the diagonal blocks.

m = 3;             %// number of (row) vectors in X and in Y
n1 = 3;            %// length of vectors in X
n2 = 3;            %// length of vectors in Y
X = rand(m, n1);   %// random data: X
Y = rand(m, n2);   %// random data: Y

[ii, jj] = ndgrid(1:m); 
U = reshape(sqrt(sum((X(ii,:)-X(jj,:)).^2, 2)), m, m);
V = reshape(sqrt(sum((Y(ii,:)-Y(jj,:)).^2, 2)), m, m);
result = U(ceil(1/m:1/m:m), ceil(1/m:1/m:m)) + repmat(V, m, m);

Or you could use bsxfun instead of ndgrid:

U = sqrt(sum(bsxfun(@minus, permute(X, [1 3 2]), permute(X, [3 1 2])).^2, 3));
V = sqrt(sum(bsxfun(@minus, permute(Y, [1 3 2]), permute(Y, [3 1 2])).^2, 3));
result = U(ceil(1/m:1/m:m), ceil(1/m:1/m:m)) + repmat(V, m, m);
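
As a rough sanity check, the sketch below rebuilds the same result and compares one entry against the definition in the question. The sizes, the integer index idx (built with kron instead of ceil(1/m:1/m:m), so it does not depend on how fractional steps round) and the spot-check variables a, b, c, d are assumptions made for illustration, not part of the answer.

```matlab
% Hypothetical sizes; X and Y hold one vector per row, as in the answer.
m = 3; n1 = 4; n2 = 6;
X = rand(m, n1);
Y = rand(m, n2);

% Pairwise Euclidean distances within X and within Y (m-by-m each).
U = sqrt(sum(bsxfun(@minus, permute(X, [1 3 2]), permute(X, [3 1 2])).^2, 3));
V = sqrt(sum(bsxfun(@minus, permute(Y, [1 3 2]), permute(Y, [3 1 2])).^2, 3));

% Integer index 1 1 1 2 2 2 ... m m m, built without fractional steps.
idx = kron(1:m, ones(1, m));
result = U(idx, idx) + repmat(V, m, m);

% Spot check: row (a,b) and column (c,d), with b and d cycling fastest.
a = 2; b = 3; c = 1; d = 2;
i = (a - 1)*m + b;
j = (c - 1)*m + d;
expected = norm(X(a, :) - X(c, :), 2) + norm(Y(b, :) - Y(d, :), 2);
err = abs(result(i, j) - expected);
```

If err is not close to zero, the row and column ordering does not match the loop in the question.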

Answer 2

You can achieve this with the following:

m = 3; % Number of vectors in X/Y (must have same number of vectors)
XD = squareform(pdist(X)); %// == pdist2(X,X) but faster
YD = squareform(pdist(Y)); %// == pdist2(Y,Y) but faster
M = kron(XD,ones(m,m)) + repmat(YD,m,m);

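Note that for pdist to work, X and Y must store the vectors as rows. Also, the diagonal blocks are not treated specially here; simply ignore them.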
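
If the diagonal blocks do need a sentinel value, as in the question, one possible way is a logical mask built with kron. This is only a sketch under assumptions: the sizes are made up, the distances are computed with base functions so that no toolbox is needed, and the name mask is introduced here for illustration.

```matlab
% Hypothetical small example; X, Y and m follow the naming in the answer.
m = 3;
X = rand(m, 4);
Y = rand(m, 6);

% Pairwise distances with base functions (stand-in for squareform(pdist(.))).
XD = sqrt(sum(bsxfun(@minus, permute(X, [1 3 2]), permute(X, [3 1 2])).^2, 3));
YD = sqrt(sum(bsxfun(@minus, permute(Y, [1 3 2]), permute(Y, [3 1 2])).^2, 3));

M = kron(XD, ones(m, m)) + repmat(YD, m, m);

% Logical mask that is true exactly on the m-by-m diagonal blocks.
mask = kron(eye(m), ones(m)) > 0;
M(mask) = 100;   % sentinel used in the question; Inf would work as well
```

Using Inf instead of 100 is convenient when the next step is a row-wise minimum, because masked cells can then never be selected.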

Answer 3

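Assuming A is train_a_mean and B is train_b_mean, for ease of reference in the code, here are two approaches to reach the final goal, which is the row-wise minimum indices of the output score_mean_ab.

Approach #1

This approach uses bsxfun to compute the norms and their sums, and linear indexing to set the "diagonal block" elements to Inf, as the question requires. Here is the implementation -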

%// Parameter
M = m^2;

%// Get pairwise norms
nm1 = sqrt(sum(bsxfun(@minus,A,permute(A,[3 2 1])).^2,2));
nm2 = sqrt(sum(bsxfun(@minus,B,permute(B,[3 2 1])).^2,2));

%// Get sum of norms and the final values
norm_sum = bsxfun(@plus,nm1,permute(nm2,[2 1 4 3]));

%// Get "diagonal block" elements and set them to all Infs
ind1 = bsxfun(@plus,[1:m:M]',[0:m-1]*(M+1)); %//'
ind2 = bsxfun(@plus,ind1(:),[0:m-1]*m^3);
norm_sum(ind2) = Inf;

[~,min_idx] = min(reshape(norm_sum,m,M,[]),[],2);
min_idx = reshape(reshape(min_idx,m,[])',[],1);

Approach #2

This approach uses a matrix-multiplication-based distance matrix calculation, which may give a faster solution. The code is listed next -

%// Parameters
nA = size(A,2);
nB = size(B,2);
M = m^2;

%// Get the pairwise norms for both A and B
A_t = A'; %//'
norm_a = sqrt([-2*A A.^2 ones(m,nA)]*[A_t ; ones(nA,m) ; A_t.^2])
norm_a(1:m+1:end) = 0;

B_t = B'; %//'
norm_b = sqrt([-2*B B.^2 ones(m,nB)]*[B_t ; ones(nB,m) ; B_t.^2])
norm_b(1:m+1:end) = 0;

%// Norm sums
norm_sum = reshape(bsxfun(@plus,norm_a(:).',norm_b(:)),m,m,[]) %//'

%// Set the "diagonal blocks" as all Infs
norm_sum(:,:,1:m+1:M) = Inf

%// Re-arrange into the desired 2D output and get the minimum indices
out = reshape(permute(reshape(permute(norm_sum,[1 3 2]),M,m,[]),[1 3 2]),M,M);
[~,min_idx] = min(out,[],2);
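
The matrix product in this approach relies on the identity norm(a_i - a_j)^2 = -2*a_i*a_j' + sum(a_i.^2) + sum(a_j.^2). The sketch below checks that identity against a plain double loop on a small random input. The sizes and the names D2, D_fast and D_ref are assumptions for illustration, and the clamp with max(.,0) is an added safeguard against tiny negative values from rounding, not something the answer's code does.

```matlab
% Hypothetical small input; A and m follow the naming used in the answer.
m = 4;
A = rand(m, 5);
nA = size(A, 2);

% Squared distances from a single matrix product:
% D2(i,j) = -2*a_i*a_j' + sum(a_i.^2) + sum(a_j.^2)
D2 = [-2*A, A.^2, ones(m, nA)] * [A'; ones(nA, m); (A').^2];

% Rounding can leave tiny negative values, so clamp before the square root.
D_fast = sqrt(max(D2, 0));
D_fast(1:m+1:end) = 0;          % force exact zeros on the diagonal

% Reference: direct loop over all pairs of rows.
D_ref = zeros(m);
for i = 1:m
    for j = 1:m
        D_ref(i, j) = norm(A(i, :) - A(j, :), 2);
    end
end

max_abs_diff = max(abs(D_fast(:) - D_ref(:)));
```

The product form replaces the m^2 calls to norm with one large multiplication, which is where any speed-up would come from; the price is a larger temporary matrix and some loss of precision for nearly identical rows.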

Cross pairwise distance measurement across two modalities
