Finding high-frequency elements in an array in MATLAB

Time: 2014-04-13 10:15:56

Tags: arrays matlab find

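I have an array named reducedWords (n×1) that contains the words from my document. I need to find the high-frequency words. My question is: is there a built-in function I can use, or should I write my own?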

reducedWords = allWords;
unneccesaryWords = {'in','on','at','from','with','a','as','if','of',...
                    'that','and','the','or','else','to','an'};
% Keep only the words that do not appear in the stop-word list.
% (A loop that deletes elements while iterating can skip words and
% fails once the array becomes empty; logical indexing avoids both.)
reducedWords = reducedWords(~ismember(reducedWords, unneccesaryWords));

Best regards

2 answers:

Answer 0: (score: 1)

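You can use tabulate(), which builds a frequency table of the data in a vector. Note that tabulate() is part of the Statistics Toolbox.

Example: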

words = {'a','a','bb','bb','bb','bb','ccc'};
tabulate(words)   % called without an output argument, it prints the table below

Result:

  Value    Count   Percent
      a        2     28.57%
     bb        4     57.14%
    ccc        1     14.29%

Alternatively, you can use CountMember.m.
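Since the question asks for the high-frequency words, not just a table, the output of tabulate() can be sorted by count. The following is a minimal sketch, assuming the Statistics Toolbox is available and that the input is a cell array of strings (in which case tabulate() returns a cell array whose columns are value, count, and percent). The names sortedTab and topWord are illustrative and not part of the original answer.

```matlab
words = {'a','a','bb','bb','bb','bb','ccc'};

% With an output argument, tabulate returns an N-by-3 cell array:
% column 1 = value, column 2 = count, column 3 = percent
tab = tabulate(words);

% Sort the rows by count, most frequent first
[~, order] = sort(cell2mat(tab(:,2)), 'descend');
sortedTab = tab(order, :);

% The first row now holds the most frequent word
topWord = sortedTab{1,1};
```

The first few rows of sortedTab then give the most frequent words together with their counts and percentages.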

Answer 1: (score: 1)

Approach 1

Code

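words_cell_array = {'cat' 'goat' 'man' 'woman' 'child' 'man'}
% array1 holds the unique words; ind1 maps each word to its index in array1
[array1, ~, ind1] = unique(words_cell_array,'stable');
% Count the occurrences of each unique word and locate the largest count
[~,max_ind] = max(histc(ind1, 1:numel(array1)));
% max_ind indexes array1 (the unique words), not the original array
max_occuring_word = array1(max_ind)

Output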

words_cell_array = 

    'cat'    'goat'    'man'    'woman'    'child'    'man'


max_occuring_word = 

    'man'

Approach 2

Code

words_cell_array = {'cat' 'goat' 'man' 'woman' 'child' 'man'}
[~, ~, ind1] = unique(words_cell_array,'stable');
[~,max_ind] = max(sum(bsxfun(@eq,ind1,ind1'),1)); % for each position, count how many words equal the word at that position
max_occuring_word = words_cell_array(max_ind)

Approach 3: if you are looking for some statistics about the cell array of words

Code

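words_cell_array = {'man' 'goat' 'man' 'woman' 'goat' 'man'};
[Words, ~, ind1] = unique(words_cell_array,'stable');  % unique words, in order of first appearance
Count = histc(ind1, 1:numel(Words));                   % occurrences of each unique word
Percent = Count*100/numel(words_cell_array);           % share of each word, in percent

Output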

words_cell_array = 
    'man'    'goat'    'man'    'woman'    'goat'    'man'

Words = 
    'man'    'goat'    'woman'

Count =
     3     2     1

Percent =
   50.0000   33.3333   16.6667
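To go from these statistics to a list of the most frequent words, the counts can be sorted in descending order. The sketch below is one possible way to do that, not part of the original answer: it swaps histc for accumarray (which gives the same counts here), and the names topN, topWords, and topCounts are hypothetical.

```matlab
words_cell_array = {'man' 'goat' 'man' 'woman' 'goat' 'man'};
topN = 2;   % hypothetical: how many high-frequency words to report

% Unique words in order of first appearance, plus an index vector
% mapping every word to its position in Words
[Words, ~, ind1] = unique(words_cell_array, 'stable');

% Count how often each unique word occurs (column vector)
Count = accumarray(ind1(:), 1);

% Sort the counts, most frequent first; ties keep their original order
[sortedCount, order] = sort(Count, 'descend');

% Guard against asking for more words than there are unique words
topN = min(topN, numel(Words));
topWords  = Words(order(1:topN));
topCounts = sortedCount(1:topN);
```

For this input, topWords holds 'man' and 'goat' with counts 3 and 2. Replacing the fixed topN with a threshold on Count would be an equally reasonable choice if "high-frequency" means "occurring at least k times".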