Question

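I am implementing the k-means algorithm for a given 4-dimensional data set, where k is the number of clusters, and I run it about 5 times with different initial points.

How can I calculate the sum of squared errors (SSE) after each run?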

I would be very grateful if someone could help me. Thanks.

Answer 1

The kmeans() function already returns everything you need. For 3 clusters, the call looks like this:

[idx,CentreCoordinates,SEE] = kmeans(yourData,3);

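where

idx is the cluster label of each observation (values 1 to 3 in this case)
CentreCoordinates holds the coordinates of the cluster centres (one centre per row)
SEE is a k-by-1 vector holding, for each cluster, the within-cluster sum of point-to-centroid distances. With the default squared Euclidean distance, sum(SEE) is the total sum of squared errors (SSE).

Since you do not actually need the labels, you can ignore the first output with ~ (tilde):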

[~,CentreCoordinates,SEE] = kmeans(yourData,3);
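
To get one SSE value per run, a loop along these lines might work. This is only a sketch: yourData is filled with random placeholder values, the names nRuns, sse, bestSSE and bestRun are chosen for illustration, and it assumes the default 'Distance' setting ('sqeuclidean'), under which the third output contains squared distances.

```matlab
% Placeholder data: replace with your own n-by-4 matrix.
rng(1);                          % fix the seed so the runs are repeatable
yourData = rand(100, 4);

k = 3;                           % number of clusters
nRuns = 5;                       % number of runs with different initial points
sse = zeros(nRuns, 1);           % one total SSE per run

for r = 1:nRuns
    % Each call picks new random initial centroids.
    % sumd is k-by-1: within-cluster sum of squared distances per cluster.
    [~, ~, sumd] = kmeans(yourData, k);
    sse(r) = sum(sumd);          % total SSE for this run
    fprintf('Run %d: SSE = %f\n', r, sse(r));
end

[bestSSE, bestRun] = min(sse);   % run with the lowest SSE
```

If only the best of the runs matters, kmeans(yourData, 3, 'Replicates', 5) repeats the clustering five times and returns the solution with the lowest total within-cluster sum.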

Answer 2

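This code uses the built-in MATLAB function kmeans. You will need to adapt it to use your own k-means implementation. It shows how to compute the cluster centroids and the sum of squared errors (also known as the distortion).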

clc; close all; clear all; 
data = readtable('data.txt'); % Importing the data-set
d1 = table2array(data(:, 2)); % Data in first dimension 
d2 = table2array(data(:, 3)); % Data in second dimension
d3 = table2array(data(:, 4)); % Data in third dimension 
d4 = table2array(data(:, 5)); % Data in fourth dimension 
X = [d1, d2, d3, d4]; % Combining the data into a matrix
k = 3; % Number of clusters
idx = kmeans(X, k); % Applying k-means using the built-in function
%% Separating the data in different dimension
d1_1 = d1(idx == 1); % d1 for the data in cluster 1 
d2_1 = d2(idx == 1); % d2 for the data in cluster 1
d3_1 = d3(idx == 1); % d3 for the data in cluster 1
d4_1 = d4(idx == 1); % d4 for the data in cluster 1
%==============================
d1_2 = d1(idx == 2); % d1 for the data in cluster 2 
d2_2 = d2(idx == 2); % d2 for the data in cluster 2
d3_2 = d3(idx == 2); % d3 for the data in cluster 2
d4_2 = d4(idx == 2); % d4 for the data in cluster 2
%==============================
d1_3 = d1(idx == 3); % d1 for the data in cluster 3
d2_3 = d2(idx == 3); % d2 for the data in cluster 3
d3_3 = d3(idx == 3); % d3 for the data in cluster 3
d4_3 = d4(idx == 3); % d4 for the data in cluster 3
%% Finding the co-ordinates of the cluster centroids
c1_d1 = mean(d1_1); % d1 value of the centroid for cluster 1
c1_d2 = mean(d2_1); % d2 value of the centroid for cluster 1
c1_d3 = mean(d3_1); % d3 value of the centroid for cluster 1
c1_d4 = mean(d4_1); % d4 value of the centroid for cluster 1
%====================================
c2_d1 = mean(d1_2); % d1 value of the centroid for cluster 2
c2_d2 = mean(d2_2); % d2 value of the centroid for cluster 2
c2_d3 = mean(d3_2); % d3 value of the centroid for cluster 2
c2_d4 = mean(d4_2); % d4 value of the centroid for cluster 2
%====================================
c3_d1 = mean(d1_3); % d1 value of the centroid for cluster 3
c3_d2 = mean(d2_3); % d2 value of the centroid for cluster 3
c3_d3 = mean(d3_3); % d3 value of the centroid for cluster 3
c3_d4 = mean(d4_3); % d4 value of the centroid for cluster 3
%% Calculating the distortion
distortion = 0; % Initialization
for n1 = 1 : length(d1_1)    
    distortion = distortion + ( ( ( c1_d1 - d1_1(n1) ).^2 ) + ( ( c1_d2 - d2_1(n1) ).^2 ) + ...
                                                    ( ( c1_d3 - d3_1(n1) ).^2 ) + ( ( c1_d4 - d4_1(n1) ).^2 ) );                                                 
end
for n2 = 1 : length(d1_2)    
    distortion = distortion + ( ( ( c2_d1 - d1_2(n2) ).^2 ) + ( ( c2_d2 - d2_2(n2) ).^2 ) + ...
                                                    ( ( c2_d3 - d3_2(n2) ).^2 ) + ( ( c2_d4 - d4_2(n2) ).^2 ) );                                                 
end
for n3 = 1 : length(d1_3)    
    distortion = distortion + ( ( ( c3_d1 - d1_3(n3) ).^2 ) + ( ( c3_d2 - d2_3(n3) ).^2 ) + ...
                                                    ( ( c3_d3 - d3_3(n3) ).^2 ) + ( ( c3_d4 - d4_3(n3) ).^2 ) );                                                 
end
fprintf('The unnormalized sum of squared errors is %f\n', distortion);
fprintf('The centroid of cluster 1 is \t d1 = %f, d2 = %f, d3 = %f, d4 = %f\n', c1_d1, c1_d2, c1_d3, c1_d4);
fprintf('The centroid of cluster 2 is \t d1 = %f, d2 = %f, d3 = %f, d4 = %f\n', c2_d1, c2_d2, c2_d3, c2_d4);
fprintf('The centroid of cluster 3 is \t d1 = %f, d2 = %f, d3 = %f, d4 = %f\n', c3_d1, c3_d2, c3_d3, c3_d4);
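
The same quantity can probably be computed more compactly for any number of clusters and dimensions, which may be easier to plug into your own k-means implementation. The sketch below assumes your algorithm produces an n-by-1 label vector idx with values 1..k. The tiny X and idx here are made-up placeholder values, and the names diffs, sse and perCluster are illustrative only.

```matlab
% X   : n-by-d data matrix (placeholder values here)
% idx : n-by-1 cluster labels in 1..k, e.g. from your own k-means
X   = [1 1 0 0;
       1 3 0 0;
       5 5 1 1;
       7 5 1 1];
idx = [1; 1; 2; 2];

k = max(idx);                         % number of clusters
C = zeros(k, size(X, 2));             % centroids, one row per cluster
for j = 1:k
    C(j, :) = mean(X(idx == j, :), 1);   % mean over the rows of cluster j
end

% Difference between every point and the centroid of its own cluster.
diffs = X - C(idx, :);

% Total sum of squared errors (distortion) over all points and dimensions.
sse = sum(diffs(:).^2);

% Optional: SSE split up per cluster (k-by-1 vector).
perCluster = accumarray(idx, sum(diffs.^2, 2));

fprintf('Total SSE = %f\n', sse);     % prints 4.000000 for the data above
```

Calling this after each of your 5 runs, with that run's idx, gives one SSE per run that you can compare directly.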

How do I calculate the sum of squared errors for k-means clustering in MATLAB?

2 answers: