FCM clustering of numeric data and CSV / Excel files

Asked: 2011-10-10 12:28:40

Tags: excel matlab cluster-analysis data-mining

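Hi, I asked a previous question, "Fuzzy c-means tcp dump clustering in matlab", and got a reasonable answer, so I thought I was back on track. The problem now is the preprocessing stage for the TCP/UDP data below, which I want to run through MATLAB's fcm clustering algorithm. My question:

1) What is the best way to convert the text data in the cells to numeric values, and what should those numeric values be?

Edit: my data in Excel now looks like this: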


0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal.

1 Answer:

Answer 0 (score: 2)

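Here is an example of how to read the data into MATLAB. You need two things: the data itself in comma-separated format, and the list of features along with their types (numeric, nominal).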

%# read the list of features
fid = fopen('kddcup.names','rt');
C = textscan(fid, '%s %s', 'Delimiter',':', 'HeaderLines',1);
fclose(fid);

%# determine type of features
C{2} = regexprep(C{2}, '\.$','');             %# remove the trailing "."
attribNom = [ismember(C{2},'symbolic');true]; %# nominal features

%# build format string used to read/parse the actual data
frmt = cell(1,numel(C{1}));
frmt( ismember(C{2},'continuous') ) = {'%f'}; %# numeric features: read as number
frmt( ismember(C{2},'symbolic') ) = {'%s'};   %# nominal features: read as string
frmt = [frmt{:}];
frmt = [frmt '%s'];                           %# add the class attribute

%# read dataset
fid = fopen('kddcup.data','rt');
C = textscan(fid, frmt, 'Delimiter',',');
fclose(fid);

%# convert nominal attributes to numeric
ind = find(attribNom);
G = cell(numel(ind),1);
for i=1:numel(ind)
    [C{ind(i)},G{i}] = grp2idx( C{ind(i)} );
end

%# all numeric dataset
M = cell2mat(C);
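
To make the "what should the numeric values be" part concrete, here is a tiny sketch of what the grp2idx step does to one nominal column. The `proto` cell array is made up for illustration (it is not read from kddcup.data), and grp2idx needs the Statistics Toolbox.

```matlab
% Hypothetical protocol column, standing in for one nominal attribute
proto = {'tcp'; 'udp'; 'tcp'; 'icmp'; 'udp'};

% grp2idx assigns each distinct string an integer code;
% for cell arrays of strings the codes follow order of first appearance
[idx, names] = grp2idx(proto);

disp(idx')      % 1 2 1 3 2
disp(names')    % 'tcp' 'udp' 'icmp'

% The second output lets you map the codes back to the original labels
recovered = names(idx);
```

Keep in mind these codes are just labels: the fact that 'icmp' got 3 and 'tcp' got 1 carries no meaning, yet a distance-based method like fcm will treat them as ordinary numbers.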

You may also want to look at the DATASET class from the Statistics Toolbox.
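
Once you have an all-numeric matrix, passing it to fcm could look roughly like the sketch below. This is not from the answer above, just one way to do it: the toy matrix stands in for the real M, the choice of k = 2 clusters and the min-max scaling are my assumptions, and fcm requires the Fuzzy Logic Toolbox (newer releases also accept an options object instead of the cluster count).

```matlab
% Hypothetical stand-in for the numeric matrix M built above:
% two well-separated groups of rows, with one column on a much larger scale
rng(0);                                  % fixed seed for repeatability
M = [randn(50,3); randn(50,3) + 10];
M(:,3) = M(:,3) * 1000;

% Min-max scale each column to [0,1] so no single feature dominates distances
lo   = min(M, [], 1);
span = max(M, [], 1) - lo;
span(span == 0) = 1;                     % guard against constant columns
X = bsxfun(@rdivide, bsxfun(@minus, M, lo), span);

% Fuzzy c-means; k = 2 is an assumption that suits this toy data only
k = 2;
[centers, U] = fcm(X, k);                % centers: k-by-3, U: k-by-100

% Hard assignment: each row goes to the cluster with the highest membership
[~, label] = max(U, [], 1);
```

Each column of U holds one row's membership degrees across the k clusters and sums to 1, so you can either keep the fuzzy memberships or collapse them to a single label as in the last line.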