BigList = rand(20, 3)
LittleList = rand(5, 3)
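For every row in the big list, I want to find the 'closest' row in the little list, as defined by the Euclidean norm (i.e. the sum of squared distances between the corresponding values in the k = 3 dimensions).
I can see how to do this with two loops, but it seems there should be a better way using built-in matrix operations.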
Answer 0 (score: 6)
The proper way is of course to use nearest-neighbor searching algorithms. However, if your dimension is not too high and your data sets are not big, you can simply use bsxfun:
d = bsxfun( @minus, permute( bigList, [1 3 2] ), permute( littleList, [3 1 2] ) ); %//diff in third dimension
d = sum( d.^2, 3 ); %// sq euclidean distance
[minDist minIdx] = min( d, [], 2 );
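For the nearest-neighbor searching route mentioned above, one option is knnsearch. This is only a minimal sketch: it assumes the Statistics and Machine Learning Toolbox is installed, and the random data just mirrors the sizes in the question.

```matlab
% Assumption: Statistics and Machine Learning Toolbox is available.
bigList = rand(20, 3);
littleList = rand(5, 3);

% For each row of bigList (the query points), find the index of the
% nearest row of littleList. minDist holds Euclidean (not squared) distances.
[minIdx, minDist] = knnsearch(littleList, bigList);
```

Note that the first argument is the set being searched and the second is the set of queries, so minIdx has one entry per row of bigList.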
Besides the matrix multiplication approach proposed here, there is another loop-free way based on matrix multiplication:
nb = sum( bigList.^2, 2 ); %// squared norm of bigList's items
nl = sum( littleList.^2, 2 ); %// squared norm of littleList's items
d = bsxfun( @plus, nb, nl.' ) - 2 * bigList * littleList'; %// all the squared distances
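The observation behind this approach is that the squared Euclidean distance (L2 norm) satisfies
|| a - b ||^2 = ||a||^2 + ||b||^2 - 2<a,b>
where <a,b> is the dot product of the two vectors.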
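For completeness, the identity above follows from expanding the inner product. In the sketch below, B and L are shorthand I am introducing for bigList and littleList, and D is the matrix of squared distances:

```latex
\begin{aligned}
\|a-b\|^2 &= \langle a-b,\; a-b\rangle \\
          &= \langle a,a\rangle - 2\langle a,b\rangle + \langle b,b\rangle \\
          &= \|a\|^2 + \|b\|^2 - 2\langle a,b\rangle ,
\end{aligned}
\qquad\text{so}\qquad
D_{ij} = \|B_{i,:}\|^2 + \|L_{j,:}\|^2 - 2\,\bigl(B L^{\top}\bigr)_{ij}.
```

Because the three terms are computed separately, floating-point rounding can leave entries of D that are tiny negative numbers when two rows nearly coincide. That does not affect the min, but if you take a square root afterwards it is probably worth clamping first with max(d, 0).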
Answer 1 (score: 4)
You can use bsxfun:
d = squeeze(sum((bsxfun(@minus, BigList, permute(LittleList, [3 2 1]))).^2, 2));
[~, ind] = min(d,[],2);
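On newer releases the same computation can be written without bsxfun. This is a sketch assuming MATLAB R2016b or later, where arithmetic operators expand singleton dimensions implicitly:

```matlab
% Assumption: MATLAB R2016b or newer (implicit expansion).
BigList = rand(20, 3);
LittleList = rand(5, 3);

% 20x3x1 minus 1x3x5 expands to 20x3x5; summing over dim 2 gives 20x1x5,
% which squeeze turns into a 20x5 matrix of squared distances.
d = squeeze(sum((BigList - permute(LittleList, [3 2 1])).^2, 2));
[~, ind] = min(d, [], 2);
```

One caveat that applies to the squeeze-based form in general: if BigList has a single row, squeeze returns a 5x1 column rather than a 1x5 row, so min(d,[],2) would no longer do what is intended.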
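Answer 2 (score: 4)
The MATLAB function pdist2 (part of the Statistics and Machine Learning Toolbox) finds the "Pairwise distance between two sets of observations". With it, you can compute the Euclidean distance matrix and then find the indices of the minimum values along the appropriate dimension of that matrix, which give the "closest" row of littleList for each row of bigList.
Here is the one-liner -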
[~,minIdx] = min(pdist2(bigList,littleList),[],2); %// minIdx is what you are after
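As a variation, pdist2 can return the smallest distances directly through its 'Smallest' option, which avoids building the full distance matrix in your own code. A minimal sketch, again assuming the Statistics and Machine Learning Toolbox:

```matlab
% Assumption: Statistics and Machine Learning Toolbox is available.
bigList = rand(20, 3);
littleList = rand(5, 3);

% For each row of bigList (second argument), return the single smallest
% distance to the rows of littleList (first argument) and its index.
% Both outputs are 1-by-size(bigList,1).
[minDist, minIdx] = pdist2(littleList, bigList, 'euclidean', 'Smallest', 1);
minIdx = minIdx(:);   % column vector, one entry per row of bigList
```

Note the argument order is swapped relative to the one-liner above, because 'Smallest' works per observation of the second input.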
If you care about performance, here is a method that exploits fast matrix multiplication in MATLAB.
Most of the code presented here is taken from this smart solution.
dim = 3;
numA = size(bigList,1);
numB = size(littleList,1);
helpA = zeros(numA,3*dim);
helpB = zeros(numB,3*dim);
for idx = 1:dim
helpA(:,3*idx-2:3*idx) = [ones(numA,1), -2*bigList(:,idx), bigList(:,idx).^2 ];
helpB(:,3*idx-2:3*idx) = [littleList(:,idx).^2 , littleList(:,idx), ones(numB,1)];
end
[~,minIdx] = min(helpA * helpB',[],2); %// minIdx is what you are after
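The reason this works is that each coordinate contributes three columns whose products sum to a^2 - 2ab + b^2 = (a-b)^2, so helpA * helpB' is the matrix of squared distances. A small sanity check against the plain two-loop version, using only base MATLAB (the variable names dRef, dHelp and maxErr are mine):

```matlab
bigList = rand(20, 3);
littleList = rand(5, 3);
dim = size(bigList, 2);
numA = size(bigList, 1);
numB = size(littleList, 1);

% Reference: explicit double loop over all pairs (squared Euclidean distance).
dRef = zeros(numA, numB);
for i = 1:numA
    for j = 1:numB
        dRef(i, j) = sum((bigList(i, :) - littleList(j, :)).^2);
    end
end

% Helper matrices: per coordinate, [1, -2a, a^2] times [b^2; b; 1].
helpA = zeros(numA, 3*dim);
helpB = zeros(numB, 3*dim);
for idx = 1:dim
    helpA(:, 3*idx-2:3*idx) = [ones(numA,1), -2*bigList(:,idx), bigList(:,idx).^2];
    helpB(:, 3*idx-2:3*idx) = [littleList(:,idx).^2, littleList(:,idx), ones(numB,1)];
end
dHelp = helpA * helpB.';

% The two should agree up to floating-point rounding.
maxErr = max(abs(dRef(:) - dHelp(:)));
fprintf('largest absolute difference: %g\n', maxErr);
```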
Benchmarking code -
N1 = 1750; N2 = 4*N1; %// data size
littleList = rand(N1, 3);
bigList = rand(N2, 3);
for k = 1:50000
tic(); elapsed = toc(); %// Warm up tic/toc
end
disp('------------- With squeeze + bsxfun + permute based approach [LuisMendo]')
tic
d = squeeze(sum((bsxfun(@minus, bigList, permute(littleList, [3 2 1]))).^2, 2));
[~, ind] = min(d,[],2);
toc, clear d ind
disp('------------- With double permutes + bsxfun based approach [Shai]')
tic
d = bsxfun( @minus, permute( bigList, [1 3 2] ), permute( littleList, [3 1 2] ) ); %//diff in third dimension
d = sum( d.^2, 3 ); %// sq euclidean distance
[~,minIdx] = min( d, [], 2 );
toc
clear d minIdx
disp('------------- With bsxfun + matrix-multiplication based approach [Shai]')
tic
nb = sum( bigList.^2, 2 ); %// norm of bigList's items
nl = sum( littleList.^2, 2 ); %// norm of littleList's items
d = bsxfun(@plus, nb, nl.' ) - 2 * bigList * littleList'; %// all the distances
[~,minIdx] = min(d,[],2);
toc, clear nb nl d minIdx
disp('------------- With matrix multiplication based approach [Divakar]')
tic
dim = 3;
numA = size(bigList,1);
numB = size(littleList,1);
helpA = zeros(numA,3*dim);
helpB = zeros(numB,3*dim);
for idx = 1:dim
helpA(:,3*idx-2:3*idx) = [ones(numA,1), -2*bigList(:,idx), bigList(:,idx).^2 ];
helpB(:,3*idx-2:3*idx) = [littleList(:,idx).^2 , littleList(:,idx), ones(numB,1)];
end
[~,minIdx] = min(helpA * helpB',[],2);
toc, clear dim numA numB helpA helpB idx minIdx
disp('------------- With pdist2 based approach [Divakar]')
tic
[~,minIdx] = min(pdist2(bigList,littleList),[],2);
toc, clear minIdx
Benchmark results -
------------- With squeeze + bsxfun + permute based approach [LuisMendo]
Elapsed time is 0.718529 seconds.
------------- With double permutes + bsxfun based approach [Shai]
Elapsed time is 0.971690 seconds.
------------- With bsxfun + matrix-multiplication based approach [Shai]
Elapsed time is 0.328442 seconds.
------------- With matrix multiplication based approach [Divakar]
Elapsed time is 0.159092 seconds.
------------- With pdist2 based approach [Divakar]
Elapsed time is 0.310850 seconds.
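Quick conclusion: the runtime of Shai's second approach, the combination of bsxfun and matrix multiplication, was very close to that of the pdist2 based approach, with no clear winner between those two.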
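Single tic/toc runs like the ones above can be noisy. If you want to repeat the comparison, timeit (base MATLAB, R2013b or later) calls the function several times and reports a typical runtime. A sketch for one of the approaches only; the handle name fMatMul is mine, and the timings will of course differ by machine:

```matlab
N1 = 1750; N2 = 4*N1;            % same data sizes as the benchmark above
littleList = rand(N1, 3);
bigList = rand(N2, 3);

% Wrap the bsxfun + matrix-multiplication approach in a function handle.
% timeit requests one output, so this times the distance matrix plus the min.
fMatMul = @() min(bsxfun(@plus, sum(bigList.^2, 2), sum(littleList.^2, 2).') ...
                  - 2 * bigList * littleList.', [], 2);

tMatMul = timeit(fMatMul);
fprintf('bsxfun + matrix multiplication: %.6f seconds\n', tMatMul);
```

The other approaches can be wrapped the same way when they fit in a single expression; multi-statement ones would need a small function file.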