Question

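I have a cell array named reducedWords (n x 1) that contains the words from my document. I need to find the most frequent words. Is there a built-in function I can use for this, or should I write my own?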

reducedWords = allWords;
unneccesaryWords = {'in','on','at','from','with','a','as','if','of',...
                    'that','and','the','or','else','to','an'};
kk = 1;
while kk <= length(reducedWords)
    if any(strcmp(reducedWords{kk}, unneccesaryWords))
        % Delete the stop word in place; the next word shifts into
        % position kk, so kk must not advance here.
        reducedWords(kk) = [];
    else
        kk = kk + 1;
    end
end

Best regards

Answer 1

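You can use tabulate() (part of the Statistics and Machine Learning Toolbox), which builds a frequency table of the values in a vector.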

Example:

words = {'a','a','bb','bb','bb','bb','ccc'};
tabulate(words)   % called without an output argument, it prints the table

Result:

  Value    Count   Percent
      a        2     28.57%
     bb        4     57.14%
    ccc        1     14.29%
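
To rank the words rather than just list them, the table can be captured and sorted by its count column. This is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is installed; the variable names `order` and `sortedTab` are illustrative and not part of the original answer.

```matlab
words = {'a','a','bb','bb','bb','bb','ccc'};

% For a cell array of character vectors, tabulate returns a cell array
% with one row per distinct value: {value, count, percent}.
tab = tabulate(words);

% Sort the rows by the count column, most frequent first.
[~, order] = sort(cell2mat(tab(:,2)), 'descend');
sortedTab = tab(order, :);
```

The first column of `sortedTab` then holds the words from most to least frequent.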

Alternatively, you can use CountMember.m.

Answer 2

Method 1

Code

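words_cell_array = {'cat' 'goat' 'man' 'woman' 'child' 'man'}
% array1: distinct words in order of first appearance
% ind1:   for each input word, its index into array1
[array1, ~, ind1] = unique(words_cell_array,'stable');
% Count each distinct word and take the position of the largest count
[~,max_ind] = max(histc(ind1, 1:numel(array1)));
% max_ind indexes array1 (the distinct words), not the original array
max_occuring_word = array1(max_ind)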

Output

words_cell_array = 

    'cat'    'goat'    'man'    'woman'    'child'    'man'


max_occuring_word = 

    'man'

Method 2

Code

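words_cell_array = {'cat' 'goat' 'man' 'woman' 'child' 'man'}
[~, ~, ind1] = unique(words_cell_array,'stable');
% Compare every word index with every other one; each column sum is the
% number of times the word at that position occurs in the input.
[~,max_ind] = max(sum(bsxfun(@eq,ind1,ind1'),1));
% Here max_ind is a position in the original array
max_occuring_word = words_cell_array(max_ind)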

Method 3: if you are looking for some statistics on the cell array of words

Code

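words_cell_array = {'man' 'goat' 'man' 'woman' 'goat' 'man'};
% Distinct words (in order of first appearance) and per-word indices
[Words, ~, ind1] = unique(words_cell_array,'stable');
% Occurrences of each distinct word
Count = histc(ind1, 1:numel(Words));
% Share of each distinct word in the whole array, in percent
Percent = Count*100/numel(words_cell_array);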

Output

words_cell_array = 
    'man'    'goat'    'man'    'woman'    'goat'    'man'

Words = 
    'man'    'goat'    'woman'

Count =
     3     2     1

Percent =
   50.0000   33.3333   16.6667
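
Since the question asks for the high-frequency words (plural), the counts from Method 3 can also be sorted to list the top few words. The following is a rough sketch rather than part of the original answer: it swaps histc for accumarray, and the cutoff `k` and the names `sortedWords`, `topWords`, and `topCounts` are assumptions made for illustration.

```matlab
% Sample data; in the question this would be reducedWords
words_cell_array = {'man' 'goat' 'man' 'woman' 'goat' 'man'};

% Distinct words and, for every input word, the index of its distinct word
[Words, ~, ind1] = unique(words_cell_array, 'stable');

% Count occurrences of each distinct word (always a column vector)
Count = accumarray(ind1(:), 1);

% Order the distinct words from most to least frequent
[sortedCount, order] = sort(Count, 'descend');
sortedWords = Words(order);

% Keep the k most frequent words (k = 2 is an arbitrary illustrative choice)
k = min(2, numel(sortedWords));
topWords = sortedWords(1:k);
topCounts = sortedCount(1:k);
```

For the sample data, `topWords` holds 'man' and 'goat' with counts 3 and 2. Words with equal counts keep their order of first appearance, because `sort` is stable.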

Finding high-frequency elements in an array in MATLAB
