MATLAB character distribution from a text file

Time: 2014-08-28 19:41:18

Tags: matlab probability

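I am trying to plot the character distribution of a text file, but I realized I should also include the digits 0-9 as well as _ and -. This is my current code: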

f = fopen('c:\nouns.txt');
ns = textscan(f, '%s');
fclose(f);
%// Convert everything to chars
letters_char = reshape(char(ns{:}),[],1);

%// Get the case-insensitive count of each letter 
count_lettters = sum(bsxfun(@eq,letters_char,97:122),1) + ...
    sum(bsxfun(@eq,letters_char,65:90),1)

plot(count_lettters./sum(count_lettters))
bar(count_lettters./sum(count_lettters))
set(gca, 'XTickLabel',cellstr(char(97:122)'),'XTick',1:26)
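To see what the "Convert everything to chars" step produces, here is a minimal sketch that runs it on a hypothetical two-word string instead of a file (textscan also accepts a character vector as its first argument). `char` pads shorter words with trailing spaces so that all words fit in one matrix. None of the counted codes is a space, so the padding does not affect the counts.

```matlab
% Hypothetical input: a short string instead of the contents of nouns.txt
ns = textscan('a bcd', '%s');        % ns is a 1x1 cell holding the word list
words = ns{1};                       % 2x1 cell: {'a'; 'bcd'}

% char() stacks the words into a 2x3 char matrix; 'a' is padded to 'a  '
block = char(ns{:});

% reshape reads the matrix column by column into a 6x1 column of characters
letters_char = reshape(block, [], 1);

% The padding spaces are never matched by the letter/digit comparisons
n_pad = sum(letters_char == ' ');    % 2 padding spaces
```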

This counts and plots the distribution of the letters a-z. I would like the distribution to cover a-z, 0-9, - and _. Any suggestions?
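The counting step relies on `bsxfun(@eq, letters_char, 97:122)`: it compares the column of characters against every code from 97 ('a') to 122 ('z') and returns a logical matrix with one column per letter, so the column sums are the per-letter counts. Adding the same count for 65:90 ('A' to 'Z') makes it case-insensitive. A toy run on a hypothetical four-character input, as a sketch:

```matlab
% Hypothetical toy input: a column of characters, as produced by reshape(char(...),[],1)
letters_char = ('abBa')';

% 4x26 logical matrices: entry (i,j) is true when character i equals code j
hits_lower = bsxfun(@eq, letters_char, 97:122);   % 'a'..'z'
hits_upper = bsxfun(@eq, letters_char, 65:90);    % 'A'..'Z'

% Column sums give per-letter counts; adding both merges upper and lower case
count_letters = sum(hits_lower, 1) + sum(hits_upper, 1);

disp(count_letters(1:3))   % counts for a, b, c: 2, 2, 0
```

In MATLAB R2016b and later, implicit expansion lets `letters_char == 97:122` produce the same matrix without `bsxfun`.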

1 answer:

Answer 0 (score: 2)

Code

f = fopen(path_to_text_file);
ns = textscan(f, '%s');
fclose(f);

%// Convert everything to chars
letters_char = reshape(char(ns{:}),[],1);

%// Get the case-insensitive count of each letter 
count_lettters = sum(bsxfun(@eq,letters_char,97:122),1) + ...
    sum(bsxfun(@eq,letters_char,65:90),1);

%// Count the digits 0-9 (ASCII codes 48-57)
count_numbers = sum(bsxfun(@eq,letters_char,48:57),1);

%// Count underscores and hyphens
underscore_c = sum(letters_char=='_');
hyphen_c = sum(letters_char=='-');

%// Bar order: _  -  0-9  a-z (must match the tick labels below)
counts = [underscore_c hyphen_c count_numbers count_lettters];

xtickstr = [{'_'; '-'}; cellstr(num2str((0:9)')); cellstr(char(97:122)')];
bar(counts./sum(counts))
set(gca, 'XTickLabel',xtickstr,'XTick',1:numel(xtickstr))

xlabel('ASCII Characters')
ylabel('Probability Distribution')

Output plot for a typical text file:

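A possible variation on the answer (a sketch, not the answerer's code): keep every character of interest in one string and derive both the counts and the tick labels from it, so the bar order and the labels cannot drift apart. It folds case with `lower` instead of counting 65:90 separately. The nested cell `ns` below is a hypothetical stand-in for the textscan result, and `alphabet` and `probs` are illustrative names.

```matlab
% Hypothetical stand-in for textscan output: a 1x1 cell holding a cell column of words
ns = {{'Foo_1'; 'a-b'}};
letters_char = reshape(char(ns{:}), [], 1);

% Every character to count, in the order the bars should appear
alphabet = ['_-', '0':'9', 'a':'z'];   % 1x38 char

% lower() maps A-Z onto a-z; digits, '_' and '-' are unchanged,
% and the padding spaces match nothing in alphabet
counts = sum(bsxfun(@eq, lower(letters_char), alphabet), 1);

% Tick labels come from the same string, one label per bar
xtickstr = cellstr(alphabet');
probs = counts ./ sum(counts);
```

To plot, pass `probs` to `bar` and `xtickstr` to `set(gca, 'XTickLabel', xtickstr, 'XTick', 1:numel(xtickstr))` as in the answer. Adding another character to the distribution then only means appending it to `alphabet`.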