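I have an n-by-1 cell array named reducedWords that contains the words from my document. I need to find the most frequent words. Is there a built-in function I can use for this, or should I write my own?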
reducedWords = allWords;
unneccesaryWords = {'in','on','at','from','with','a','as','if','of',...
    'that','and','the','or','else','to','an'};
kk = 1;
while kk <= length(reducedWords)
    for cc = 1:length(unneccesaryWords)
        if strcmp(reducedWords{kk},unneccesaryWords{cc})==1
            reducedWords = { reducedWords{1:kk-1} reducedWords{kk+1:end} };
            kk = 1;
        end
    end
    kk = kk + 1;
end
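As an aside, the loop above restarts from kk = 1 after every removal, which gets slow on long documents. A minimal sketch of the same filtering step using a logical mask from ismember is shown here. The sample contents of allWords are hypothetical and only stand in for the real document.

```matlab
% Hypothetical sample input standing in for allWords from the question.
allWords = {'the','cat','and','the','dog','in','a','hat'};
unneccesaryWords = {'in','on','at','from','with','a','as','if','of', ...
                    'that','and','the','or','else','to','an'};

% ismember returns a logical mask that is true for every stop word.
isStopWord = ismember(allWords, unneccesaryWords);

% Keep only the words that are not stop words, preserving their order.
reducedWords = allWords(~isStopWord)
```

This removes all stop words in a single pass and leaves the remaining words in their original order.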
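Answer 0 (score: 1)
You can use tabulate() (part of the Statistics and Machine Learning Toolbox), which builds a frequency table of the values in a vector.
Example: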
words = {'a','a','bb','bb','bb','bb','ccc'};
tab = tabulate(words)
Result (the table that tabulate prints when it is called without an output argument):
  Value    Count   Percent
      a        2     28.57%
     bb        4     57.14%
    ccc        1     14.29%
Alternatively, you can use CountMember.m.
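To get from the frequency table to the most frequent word, the output of tabulate can be sorted by its count column. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available; the variable names sortedTab, topWord and topCount are only illustrative.

```matlab
words = {'a','a','bb','bb','bb','bb','ccc'};

% With an output argument and cell-array input, tabulate returns a cell array:
% column 1 = value, column 2 = count, column 3 = percent.
tab = tabulate(words);

% Sort the rows by count, highest first.
[~, order] = sort(cell2mat(tab(:,2)), 'descend');
sortedTab = tab(order, :);

% The most frequent word and how often it occurs.
topWord  = sortedTab{1,1}
topCount = sortedTab{1,2}
```

Taking the first few rows of sortedTab instead of only the first gives a list of the top words.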
Answer 1 (score: 1)
Approach 1
Code
words_cell_array = {'cat' 'goat' 'man' 'woman' 'child' 'man'}
[array1, ~, ind1] = unique(words_cell_array,'stable');
% Count occurrences of each unique word; max_ind indexes array1, not the input
[~,max_ind] = max(histc(ind1, 1:numel(array1)));
max_occuring_word = array1(max_ind)
Output
words_cell_array =
'cat' 'goat' 'man' 'woman' 'child' 'man'
max_occuring_word =
'man'
Approach 2
Code
words_cell_array = {'cat' 'goat' 'man' 'woman' 'child' 'man'}
[~, ~, ind1] = unique(words_cell_array,'stable');
[~,max_ind] = max(sum(bsxfun(@eq,ind1,ind1'),1)); % for each position, count entries sharing its word
max_occuring_word = words_cell_array(max_ind)
Approach 3: if you want some statistics about the cell array of words
Code
words_cell_array = {'man' 'goat' 'man' 'woman' 'goat' 'man'};
[Words, ~, ind1] = unique(words_cell_array,'stable');
Count = histc(ind1, 1:numel(Words));
Percent = Count*100/numel(words_cell_array);
Output
words_cell_array =
'man' 'goat' 'man' 'woman' 'goat' 'man'
Words =
'man' 'goat' 'woman'
Count =
3 2 1
Percent =
50.0000 33.3333 16.6667
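Since the question asks for the high-frequency words rather than a single maximum, the counts from Approach 3 can be sorted to get the top N words. This is a minimal sketch and not part of the original answer: N is a hypothetical parameter, and accumarray is swapped in for histc, which newer MATLAB releases mark as not recommended.

```matlab
words_cell_array = {'man' 'goat' 'man' 'woman' 'goat' 'man'};
N = 2;  % hypothetical: how many of the most frequent words to report

% Map every word to the index of its unique entry (first-occurrence order).
[Words, ~, ind1] = unique(words_cell_array, 'stable');

% Count how many times each unique word occurs (column vector).
Count = accumarray(ind1(:), 1);

% Sort the counts, highest first, and remember which word each belongs to.
[sortedCount, order] = sort(Count, 'descend');

% Guard against asking for more words than there are unique words.
N = min(N, numel(Words));

topWords  = Words(order(1:N))
topCounts = sortedCount(1:N).'
```

No toolbox is needed for this version, so it can serve as a fallback when tabulate is not available.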