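I have a dictionary of frequency counts, and I want to be able to look up the frequency count of a given word in it.
For example, if my input word is 'about', the output should be the count for 'about' in my dictionary, which is 139; from that count a probability can then be computed.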
139 about
133 according
163 accusing
244 actually
567 afternoon
175 again
156 ah
167 a-ha
165 ahh
I tried to do this with fopen, but did not get the result I wanted.
fid = fopen('dictionary.txt');
words = textscan(fid, '%s');
fclose(fid);
words = words{1};
I also tried this, but got a different result:
countfunction = @(word) nnz(strcmp(word, words));
count = cellfun(countfunction, words);
tally = [words num2cell(count)];
sortrows(tally, 2);
Answer 0 (score: 0)
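The problem is that you are running countfunction once for every instance of every word in the dictionary, rather than once for each unique word.
Here is how to improve the code step by step: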
words = {'hi' 'hi' 'the' 'hi' 'the' 'a'};
unique_words = unique(words(:));
countfunction = @(word) nnz(strcmp(word, words));
count = cellfun(countfunction, unique_words);
tally = [unique_words, num2cell(count)];
disp(sortrows(tally, 2));
'a' [1]
'the' [2]
'hi' [3]
However, I would suggest using grpstats instead (note that it requires the Statistics Toolbox):
words = {'hi' 'hi' 'the' 'hi' 'the' 'a'};
[unique_words, count] = grpstats(ones(numel(words), 1), words(:), {'gname', 'numel'});
tally = [unique_words, num2cell(count)];
disp(sortrows(tally, 2));
'a' [1]
'the' [2]
'hi' [3]
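If the file already stores one "count word" pair per line, as in the sample from the question, the count for a given word can probably be read off directly without recounting anything. The following is only a minimal sketch under that assumption: the file name dictionary.txt comes from the question, while the variable names target, counts, and p are illustrative, and the probability is taken here to be the word's count divided by the sum of all counts in the file, which may not be the definition the asker has in mind.

```matlab
% Sketch: look up the stored count of one word in a "count word" file.
% Assumes each line of dictionary.txt looks like "139 about".
fid = fopen('dictionary.txt', 'r');
if fid == -1
    error('Could not open dictionary.txt');
end
data = textscan(fid, '%f %s');   % column 1: counts, column 2: words
fclose(fid);

counts = data{1};                % numeric column vector
words  = data{2};                % cell array of strings

target = 'about';                % hypothetical input word
idx = find(strcmp(words, target), 1);   % index of the first exact match

if isempty(idx)
    fprintf('%s was not found in the dictionary\n', target);
else
    n = counts(idx);
    p = n / sum(counts);         % assumed definition of the probability
    fprintf('%s: count = %d, probability = %.4f\n', target, n, p);
end
```

The key difference from the attempt in the question is the '%f %s' format, which splits each line into a number and a word instead of reading every token as a string. If many lookups are needed, building a containers.Map from words to counts once may be more convenient than calling strcmp each time.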