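I am clustering my data with k-means, but instead of the standard algorithm I use an approximate nearest neighbors (ANN) algorithm to speed up the sample-to-center comparisons. This is easily done as follows: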
[clusterCenters, trainAssignments] = vl_kmeans(trainDescriptors, clusterCount, 'Algorithm', 'ANN', 'MaxNumComparisons', ceil(clusterCount / 50));
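Now, when I run this code, the variable 'trainDescriptors' is clustered, and each descriptor is assigned to one of the 'clusterCenters' using ANN.
I also have another variable, 'testDescriptors', which I want to assign to the same cluster centers. This assignment must use the same method that was used for 'trainDescriptors', but AFAIK the vl_kmeans function does not return the tree it builds for fast assignment.
So my question is: is it possible to assign 'testDescriptors' to 'clusterCenters' in the same way that 'trainDescriptors' were assigned to 'clusterCenters' inside vl_kmeans, and if so, how can I do it?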
Answer 0 (score: 4)
OK, I figured it out. It can be done as follows:
clusterCount = 1024;
datasetTrain = single(rand(128, 100000));
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 1 - cluster train data and get train assignments
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[clusterCenters, trainAssignments_actual] = vl_kmeans(datasetTrain, clusterCount, ...
'Algorithm', 'ANN', ...
'Distance', 'l2', ...
'NumRepetitions', 1, ...
'NumTrees', 3, ...
'MaxNumComparisons', ceil(clusterCount / 50) ...
);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 2 - build a kd-tree forest on the cluster centers and re-assign the train data
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
forest = vl_kdtreebuild(clusterCenters, ...
'Distance', 'l2', ...
'NumTrees', 3 ...
);
trainAssignments_expected = vl_kdtreequery(forest, clusterCenters, datasetTrain);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 3 - validate second assignment
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
validation = isequal(trainAssignments_actual, trainAssignments_expected);
In step 2, I build a new tree from the cluster centers and then assign the data to the centers again. It gives valid results.
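The same forest approach should carry over to the test data, which is what the question asked about. Below is a minimal sketch, not a verified reproduction of what vl_kmeans does internally. It assumes VLFeat is on the MATLAB path, uses random stand-in data and smaller sizes than above so it runs quickly, and introduces the names datasetTest and testAssignments purely for illustration. As far as I understand the VLFeat documentation, vl_kdtreequery performs an exact search unless a comparison cap is passed, so the sketch passes 'MaxComparisons' to mirror the approximate setting given to vl_kmeans.

```matlab
% Sketch under stated assumptions: VLFeat is installed and vl_setup has been run.
% Sizes are reduced and the data is random; both are stand-ins for real descriptors.
clusterCount = 64;
maxComparisons = ceil(clusterCount / 50);
datasetTrain = single(rand(128, 5000));
datasetTest  = single(rand(128, 1000));   % hypothetical test descriptors

% Cluster the train data with the ANN variant, as in step 1.
clusterCenters = vl_kmeans(datasetTrain, clusterCount, ...
    'Algorithm', 'ANN', ...
    'Distance', 'l2', ...
    'NumRepetitions', 1, ...
    'NumTrees', 3, ...
    'MaxNumComparisons', maxComparisons);

% Index the centers with a kd-tree forest, as in step 2.
forest = vl_kdtreebuild(clusterCenters, ...
    'Distance', 'l2', ...
    'NumTrees', 3);

% Assign each test descriptor to a center. The query data must have the same
% class (single here) as the indexed centers. Capping the comparisons makes
% the search approximate instead of exact.
testAssignments = vl_kdtreequery(forest, clusterCenters, datasetTest, ...
    'MaxComparisons', maxComparisons);
```

Each entry of testAssignments is the column index in clusterCenters of the center chosen for the corresponding test descriptor. Because the kd-tree forest is randomized and the search is capped, two runs need not produce identical assignments; dropping the 'MaxComparisons' option trades speed for an exact nearest-center search.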