在群集kmeans数据上显示行

时间:2012-07-08 15:46:03

标签: matlab statistics plot cluster-analysis k-means

您好我想知道何时在图形屏幕上对数据进行聚类是否有办法在滚动时显示数据点所属的行?

enter image description here

从上图中我希望有一种方法,如果我选择或滚动点,我可以告诉它属于哪一行。

以下是代码:

%% dimensionality reduction 
columns = 6
[U,S,V]=svds(fulldata,columns);
%% randomly select dataset
rows = 1000;
columns = 6;

%# pick random rows
indX = randperm( size(fulldata,1) );
indX = indX(1:rows);

%# pick random columns
indY = randperm( size(fulldata,2) );
indY = indY(1:columns);

%# filter data
data = U(indX,indY);
%% apply normalization method to every cell
data = data./repmat(sqrt(sum(data.^2)),size(data,1),1);

%% generate sample data
K = 6;
numObservarations = 1000;
dimensions = 6;

%% cluster
opts = statset('MaxIter', 100, 'Display', 'iter');
[clustIDX, clusters, interClustSum, Dist] = kmeans(data, K, 'options',opts, ...
'distance','sqEuclidean', 'EmptyAction','singleton', 'replicates',3);

%% plot data+clusters
figure, hold on
scatter3(data(:,1),data(:,2),data(:,3), 5, clustIDX, 'filled')
scatter3(clusters(:,1),clusters(:,2),clusters(:,3), 100, (1:K)', 'filled')
hold off, xlabel('x'), ylabel('y'), zlabel('z')

%% plot clusters quality
figure
[silh,h] = silhouette(data, clustIDX);
avrgScore = mean(silh);

%% Assign data to clusters
% calculate distance (squared) of all instances to each cluster centroid
D = zeros(numObservarations, K);     % init distances
for k=1:K
%d = sum((x-y).^2).^0.5
D(:,k) = sum( ((data - repmat(clusters(k,:),numObservarations,1)).^2), 2);
end

% find  for all instances the cluster closet to it
[minDists, clusterIndices] = min(D, [], 2);

% compare it with what you expect it to be
sum(clusterIndices == clustIDX)

或者可能是群集数据的输出方法,归一化并重新组织为原始格式,末尾列上的appedicies与原始“fulldata”属于哪一行。

1 个答案:

答案 0 :(得分:5)

您可以使用data cursors功能,当您从绘图中选择一个点时,该功能会显示工具提示。您可以使用修改后的更新功能显示有关所选点的各种信息。

这是一个有效的例子:

function customCusrorModeDemo()
    %# data
    D = load('fisheriris');
    data = D.meas;
    [clustIdx,labels] = grp2idx(D.species);
    K = numel(labels);
    clr = hsv(K);

    %# instance indices grouped according to class
    ind = accumarray(clustIdx, 1:size(data,1), [K 1], @(x){x});

    %# plot
    %#gscatter(data(:,1), data(:,2), clustIdx, clr)
    hLine = zeros(K,1);
    for k=1:K
        hLine(k) = line(data(ind{k},1), data(ind{k},2), data(ind{k},3), ...
            'LineStyle','none', 'Color',clr(k,:), ...
            'Marker','.', 'MarkerSize',15);
    end
    xlabel('SL'), ylabel('SW'), zlabel('PL')
    legend(hLine, labels)
    view(3), box on, grid on

    %# data cursor
    hDCM = datacursormode(gcf);
    set(hDCM, 'UpdateFcn',@updateFcn, 'DisplayStyle','window')
    set(hDCM, 'Enable','on')

    %# callback function
    function txt = updateFcn(~,evt)
        hObj = get(evt,'Target');   %# line object handle
        idx = get(evt,'DataIndex'); %# index of nearest point

        %# class index of data point
        cIdx = find(hLine==hObj, 1, 'first');

        %# instance index (index into the entire data matrix)
        idx = ind{cIdx}(idx);

        %# output text
        txt = {
            sprintf('SL: %g', data(idx,1)) ;
            sprintf('SW: %g', data(idx,2)) ;
            sprintf('PL: %g', data(idx,3)) ;
            sprintf('PW: %g', data(idx,4)) ;
            sprintf('Index: %d', idx) ;
            sprintf('Class: %s', labels{clustIdx(idx)}) ;
        };
    end

end

以下是2D和3D视图(具有不同显示样式)的样子:

screenshot_2D screenshot_3D