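Hi, I asked a previous question (Fuzzy c-means tcp dump clustering in matlab) and got a reasonable answer, so I thought I was back on track. The problem now is the preprocessing stage for the tcp/udp data below, which I want to run through MATLAB's fcm clustering algorithm. My question:
1) What is the best way to convert the text data in the cells to numeric values, and what should those values be?
EDIT: my data in Excel now looks like this: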
0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal.
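Answer 0: (score: 2)
Here is an example of how to read the data into MATLAB. You need two things: the data itself in comma-separated format, and the list of features along with their types (numeric, nominal).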
%# read the list of features
fid = fopen('kddcup.names','rt');
C = textscan(fid, '%s %s', 'Delimiter',':', 'HeaderLines',1);
fclose(fid);
%# determine type of features
C{2} = regexprep(C{2}, '\.$',''); %# remove the literal "." at the end
attribNom = [ismember(C{2},'symbolic');true]; %# nominal features
%# build format string used to read/parse the actual data
frmt = cell(1,numel(C{1}));
frmt( ismember(C{2},'continuous') ) = {'%f'}; %# numeric features: read as number
frmt( ismember(C{2},'symbolic') ) = {'%s'}; %# nominal features: read as string
frmt = [frmt{:}];
frmt = [frmt '%s']; %# add the class attribute
%# read dataset
fid = fopen('kddcup.data','rt');
C = textscan(fid, frmt, 'Delimiter',',');
fclose(fid);
%# convert nominal attributes to numeric
ind = find(attribNom);
G = cell(numel(ind),1);
for i=1:numel(ind)
[C{ind(i)},G{i}] = grp2idx( C{ind(i)} );
end
%# all numeric dataset
M = cell2mat(C);
You can also take a look at the DATASET class from the Statistics Toolbox.
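As a toolbox-free illustration of the nominal-to-numeric step, the sketch below uses the third output of the base function unique in place of grp2idx. The protocol values are made-up sample data, not read from kddcup.data, and the variable names are hypothetical. The one-hot step at the end is an optional addition, not part of the code above; it may suit distance-based clustering better, since plain integer codes imply an ordering that nominal values do not have.

```matlab
% Made-up sample of a nominal column (e.g. the protocol field); assumption,
% not taken from the actual kddcup.data file
protocol = {'tcp'; 'udp'; 'tcp'; 'icmp'};

% unique returns the distinct labels in sorted order, and idx maps each
% row to the position of its label in that sorted list
[labels, ~, idx] = unique(protocol);
idx = idx(:);                 % force a column vector
% labels is {'icmp'; 'tcp'; 'udp'} and idx is [2; 3; 2; 1]

% Optional: one indicator column per label (one-hot encoding)
onehot = double(bsxfun(@eq, idx, 1:numel(labels)));
% onehot is 4-by-3 with exactly one 1 in each row
```

Note that unique numbers the labels in sorted order, so the integer codes can differ from the ones grp2idx assigns. Once every column is numeric, the matrix can be passed to fcm(data, nClusters) from the Fuzzy Logic Toolbox, where nClusters is whatever number of clusters you choose.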