我正在为给定的4维数据实现k-means算法,其中k =#of cluster,我使用不同的初始点运行大约5次。
python manage.py scrape

如果有人能帮助我,我会非常高兴。 感谢
答案 0 :(得分:1)
kmeans()
函数已经直接提供了您想要的所有内容。它具有3个集群的以下语法:
[idx,CentreCoordinates,SEE] = kmeans(yourData,3);
,其中
idx
是每个观察的标签(在这种情况下为值1到3)CentreCoordinates
是群集中心的坐标(每行是一个中心)SEE
是每个观测到其最近的聚类中心的总体内欧几里德距离 - SEE。由于您实际上不需要索引,因此可以忽略函数的第一个输出~
(代字号):
[~,CentreCoordinates,SEE] = kmeans(yourData,3);
答案 1 :(得分:1)
此代码使用内置MATLAB函数' k-means'。您需要使用自己的k-means算法对其进行修改。它显示了群集中心的计算和平方误差的总和(也称为不同运动)。
clc; close all; clear all;
data = readtable('data.txt'); % Importing the data-set
d1 = table2array(data(:, 2)); % Data in first dimension
d2 = table2array(data(:, 3)); % Data in second dimension
d3 = table2array(data(:, 4)); % Data in third dimension
d4 = table2array(data(:, 5)); % Data in fourth dimension
X = [d1, d2, d3, d4]; % Combining the data into a matrix
k = 3; % Number of clusters
idx = kmeans(X, 3); % Alpplying the k-means using inbuilt funciton
%% Separating the data in different dimension
d1_1 = d1(idx == 1); % d1 for the data in cluster 1
d2_1 = d2(idx == 1); % d2 for the data in cluster 1
d3_1 = d3(idx == 1); % d3 for the data in cluster 1
d4_1 = d4(idx == 1); % d4 for the data in cluster 1
%==============================
d1_2 = d1(idx == 2); % d1 for the data in cluster 2
d2_2 = d2(idx == 2); % d2 for the data in cluster 2
d3_2 = d3(idx == 2); % d3 for the data in cluster 2
d4_2 = d4(idx == 2); % d4 for the data in cluster 2
%==============================
d1_3 = d1(idx == 3); % d1 for the data in cluster 3
d2_3 = d2(idx == 3); % d2 for the data in cluster 3
d3_3 = d3(idx == 3); % d3 for the data in cluster 3
d4_3 = d4(idx == 3); % d4 for the data in cluster 3
%% Finding the co-ordinates of the cluster centroids
c1_d1 = mean(d1_1); % d1 value of the centroid for cluster 1
c1_d2 = mean(d2_1); % d2 value of the centroid for cluster 1
c1_d3 = mean(d3_1); % d2 value of the centroid for cluster 1
c1_d4 = mean(d4_1); % d2 value of the centroid for cluster 1
%====================================
c2_d1 = mean(d1_2); % d1 value of the centroid for cluster 2
c2_d2 = mean(d2_2); % d2 value of the centroid for cluster 2
c2_d3 = mean(d3_2); % d2 value of the centroid for cluster 2
c2_d4 = mean(d4_2); % d2 value of the centroid for cluster 2
%====================================
c3_d1 = mean(d1_3); % d1 value of the centroid for cluster 3
c3_d2 = mean(d2_3); % d2 value of the centroid for cluster 3
c3_d3 = mean(d3_3); % d2 value of the centroid for cluster 3
c3_d4 = mean(d4_3); % d2 value of the centroid for cluster 3
%% Calculating the distortion
distortion = 0; % Initialization
for n1 = 1 : length(d1_1)
distortion = distortion + ( ( ( c1_d1 - d1_1(n1) ).^2 ) + ( ( c1_d2 - d2_1(n1) ).^2 ) + ...
( ( c1_d3 - d3_1(n1) ).^2 ) + ( ( c1_d4 - d4_1(n1) ).^2 ) );
end
for n2 = 1 : length(d1_2)
distortion = distortion + ( ( ( c2_d1 - d1_2(n2) ).^2 ) + ( ( c2_d2 - d2_2(n2) ).^2 ) + ...
( ( c2_d3 - d3_2(n2) ).^2 ) + ( ( c2_d4 - d4_2(n2) ).^2 ) );
end
for n3 = 1 : length(d1_3)
distortion = distortion + ( ( ( c3_d1 - d1_3(n3) ).^2 ) + ( ( c3_d2 - d2_3(n3) ).^2 ) + ...
( ( c3_d3 - d3_3(n3) ).^2 ) + ( ( c3_d4 - d4_3(n3) ).^2 ) );
end
fprintf('The unnormalized sum of square error is %f\n', distortion);
fprintf('The co-ordinate of the cluster 1 is \t d1 = %f, d2 = %f, d3 = %f, d4 = %f\n', c1_d1, c1_d2, c1_d3, c1_d4);
fprintf('The co-ordinate of the cluster 2 is \t d1 = %f, d2 = %f, d3 = %f, d4 = %f\n', c2_d1, c2_d2, c2_d3, c2_d4);
fprintf('The co-ordinate of the cluster 3 is \t d1 = %f, d2 = %f, d3 = %f, d4 = %f\n', c3_d1, c3_d2, c3_d3, c3_d4);