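I have a set of n-dimensional representative vectors in MATLAB. I need to assign each vector from a set of training vectors to the group of the representative vector it is closest to. How should I go about this?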
Answer 0 (score: 4)
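You can use dsearchn to find which representative is closest to each point. I'd suggest first trying the version that doesn't involve the triangulation matrix. If memory or CPU performance isn't good enough, look into the triangulation stuff.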
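As a minimal sketch of that suggestion, assuming the representatives and the training vectors are stored one per row (the variable names and the data below are made up for illustration):

```matlab
% Hypothetical data: 3 representatives and 4 training vectors in 2-D,
% one vector per row. The same call works for n-D rows.
reps  = [0 0; 10 10; 0 10];
train = [1 1; 9 8; 2 9; -1 0.5];

% idx(i) is the row of reps closest (Euclidean) to train(i,:),
% d(i) is the corresponding distance. No triangulation is passed in.
[idx, d] = dsearchn(reps, train);

% Collect the training vectors that fall into each representative's group.
groups = cell(size(reps,1), 1);
for g = 1:size(reps,1)
    groups{g} = train(idx == g, :);
end
```

Here `idx` serves as the group label of each training vector, and `groups{g}` holds the members assigned to representative `g`.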
Answer 1 (score: 1)
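If by n-dimensional vector you mean an ordered list of n-dimensional points (that's my understanding of what you want), then I've done this in the past using the mean closest distance. Basically, for each point on vector one, find the minimum distance to the points on vector two. The distance between the two vectors is then the mean of all those minimum distances. This is not symmetric, though, so you should run the same procedure for each point on vector two, finding the minimum distance to vector one, and then aggregate the two means using min, max, mean, etc.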
Here's some code I wrote using loops (for 3D vectors):
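function mcd = MCD(fiber1, fiber2, option)
%MCD(fiber1, fiber2)          mean closest distance from fiber1 to fiber2
%MCD(fiber1, fiber2, option)  symmetric version, option = 'mean' or 'min'
%fiber1 and fiber2 hold one point per row.
%remove NaNs (everything from the first NaN row onwards)
fiber1(find(isnan(fiber1),1):size(fiber1,1),:) = [];
fiber2(find(isnan(fiber2),1):size(fiber2,1),:) = [];
%for each point on fiber1, find the closest point on fiber2
dist = 0;
for k = 1:size(fiber1,1)
    D = zeros(1,size(fiber2,1)); %preallocate instead of growing D
    for j = 1:size(fiber2,1)
        D(j) = distance(fiber1(k,:),fiber2(j,:));
    end
    dist = dist + min(D);
end
mcd = dist / size(fiber1,1);
%symmetric case: repeat from fiber2 to fiber1, then combine the two means
if nargin > 2
    dist = 0;
    for k = 1:size(fiber2,1)
        D = zeros(1,size(fiber1,1));
        for j = 1:size(fiber1,1)
            D(j) = distance(fiber2(k,:),fiber1(j,:));
        end
        dist = dist + min(D);
    end
    mcd2 = dist / size(fiber2,1);
    if strcmp(option,'mean')
        mcd = mean([mcd mcd2]);
    elseif strcmp(option,'min')
        mcd = min([mcd mcd2]);
    end
end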
But that was too slow for me. So here's a much faster vectorized (but hard to follow) version:
function mcd = MCD(fiber1, fiber2, option, sampling)
%MCD(fiber1, fiber2)
%MCD(fiber1, fiber2, option)
%MCD(fiber1, fiber2, option, sampling)
%remove NaNs
fiber1(find(isnan(fiber1),1):size(fiber1,1),:) = [];
fiber2(find(isnan(fiber2),1):size(fiber2,1),:) = [];
%sample the fibers for speed. Each fiber is represented by roughly
%"sampling" points.
if nargin == 4
    freq = max(1, round(size(fiber1,1)/sampling)); %keep the step at least 1
    fiber1 = fiber1(1:freq:size(fiber1,1),:);
    freq = max(1, round(size(fiber2,1)/sampling));
    fiber2 = fiber2(1:freq:size(fiber2,1),:);
end
%reshape to optimize the use of distance() for speed
FIBER2 = reshape(fiber2',[1,3,size(fiber2,1)]); %1x3xF2
FIBER1 = reshape(fiber1',[1,3,size(fiber1,1)]); %1x3xF1, only needed as the second argument in the symmetric case, i.e. when the 'min' or 'mean' option is given
%reshape and tile fiber1 so as to eliminate the need for two nested for
%loops, thus greatly increasing the computational efficiency. The goal is to
%have a 4D array with 1 row and 3 columns. Dimension 3 is a smearing of
%these columns to be as long as fiber2 so that each vector (1x3) in fiber1
%can be placed "on top", as in a row above the whole of fiber2. Thus dim 3
%is as long as fiber2 and dim 4 is as long as fiber1.
fiber1 = reshape(fiber1',[1,3,size(fiber1,1)]); %1x3xF1
fiber1 = repmat(fiber1,[size(FIBER2,3),1,1]); %F2x3xF1
fiber1 = permute(fiber1,[2,1,3]); %3xF2xF1
fiber1 = reshape(fiber1,[1,3,size(FIBER2,3),size(FIBER1,3)]); %1x3xF2xF1
%min over fiber2 (dim 3), then mean over fiber1 (dim 4)
mcd = mean(min(distance(fiber1, repmat(FIBER2,[1,1,1,size(FIBER1,3)])),[],3),4);
if nargin > 2
    %same construction with the roles of the two fibers swapped
    fiber2 = reshape(fiber2',[1,3,size(fiber2,1)]); %1x3xF2
    fiber2 = repmat(fiber2,[size(FIBER1,3),1,1]); %F1x3xF2
    fiber2 = permute(fiber2,[2,1,3]); %3xF1xF2
    fiber2 = reshape(fiber2,[1,3,size(FIBER1,3),size(FIBER2,3)]); %1x3xF1xF2
    mcd2 = mean(min(distance(fiber2, repmat(FIBER1,[1,1,1,size(FIBER2,3)])),[],3),4);
    if strcmp(option,'mean')
        mcd = mean([mcd mcd2]);
    elseif strcmp(option,'min')
        mcd = min([mcd mcd2]);
    end
end
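Here's the distance() function I use above. In my case I use the Euclidean distance, but you can adapt it to whatever suits you best, as long as it accepts two vectors: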
function Edist = distance(vector1,vector2)
%distance(vector1,vector2)
%
%provides the Euclidean distance between two input vectors. Vector1 and
%vector2 must be row vectors of the same length. The number of elements in
%each vector is the dimensionality thereof.
%The vectorized MCD also passes in 1x3xMxN arrays of equal size; diff then
%runs along dim 1 and sum along dim 2, giving a 1x1xMxN array of distances.
Edist = sqrt(sum((diff([vector1;vector2])).^2));
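For comparison, the same mean-closest-distance idea can also be sketched by building the full pairwise distance matrix once and reading both one-sided means off it. This is not the code from the answer above; it is an alternative that assumes MATLAB R2016b or later (for implicit expansion), hard-codes the Euclidean distance, and uses made-up example data:

```matlab
% Two small example "fibers", one 3-D point per row (made-up data).
fiber1 = [0 0 0; 1 0 0; 2 0 0];
fiber2 = [0 1 0; 2 1 0];

% Pairwise Euclidean distances: D(i,j) = norm(fiber1(i,:) - fiber2(j,:)).
% permute makes fiber1 F1x1x3 and fiber2 1xF2x3, so implicit expansion
% produces the F1xF2x3 array of coordinate differences.
diffs = permute(fiber1,[1,3,2]) - permute(fiber2,[3,1,2]);
D = sqrt(sum(diffs.^2, 3));          % F1xF2

% One-sided mean closest distances in each direction.
mcd12 = mean(min(D, [], 2));         % fiber1 -> fiber2
mcd21 = mean(min(D, [], 1));         % fiber2 -> fiber1

% Symmetric aggregates, matching the 'mean' and 'min' options.
mcdMean = mean([mcd12 mcd21]);
mcdMin  = min([mcd12 mcd21]);
```

On this data the two directions give different values, which illustrates why the measure needs to be symmetrized. The intermediate array takes F1 x F2 x 3 elements, so for very long fibers the sampling step from the vectorized version is still worth keeping.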