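Part of my data (a cell array of strings) is shown below. I want to count the occurrences of particular strings (for example 'P0702', 'P0882', and so on) and display the totals in the output format shown further down: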
'1FA' '2012' 'F' '' '' '' '' '' 'P0702' 'P0882'
'1Fc' '2012' 'r' '' '' '' '' '' 'P0702' '' '' ''
'1FA' '2012' 'f' '' '' '' '' '' 'P0702' 'P0882' ''
'1FA' '2012' 'y' '' '' '' 'P0702' '' '' '' '' ''
'1FA' '2012' 'g' '' '' '' '' '' '' '' '' '' ''
'1FA' '2012' 'u' '' 'P0702' 'P0882' '' '' '' '' ''
'1FA' '2012' 'y' '' 'P0702' '' '' '' '' '' '' ''
'1FA' '2012' 'n' '' 'P0702' '' '' '' '' '' '' ''
'1FA' '2012' 'j' '' '' '' '' '' '' '' '' 'P0702'
'1FA' '2012' 'u' 'P0702' '' '' '' '' '' '' '' ''
'1FM' '2013' 'x' '' '' '' '' '' 'P1921' '' '' ''
'1FM' '2013' 'c' '' 'P1711' '' '' '' '' '' '' ''
'1FM' '2013' 'c' '' '' '' '' '' 'P0702' 'P0882' ''
'1FM' '2009' 'E' '' '' '' '' '' '' '' 'P0500'
Output:
sum of counts above
P0702 15
P0500 1
P1711 1
and so on.
I tried sum(strcmp(d,{'P0882'}),2); which tells me how many times 'P0882' occurs in each row, but it is awkward to apply to every string in the data.
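Answer 0 (score: 2)
You can do the following, which basically applies strcmp as you suggested, but inside a loop over the unique strings/data names, which are determined beforehand.
I modified the data you provided so that the dimensions are consistent. The code is commented and fairly easy to follow: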
C = {'1FA' '2012' 'F' '' '' '' '' '' 'P0702' 'P0882' ;
'1Fc' '2012' 'r' '' '' '' '' '' 'P0702' '';
'1FA' '2012' 'f' '' '' '' '' '' 'P0702' 'P0882';
'1FA' '2012' 'y' '' '' '' 'P0702' '' '' '';
'1FA' '2012' 'g' '' '' '' '' '' '' '';
'1FA' '2012' 'u' '' 'P0702' 'P0882' '' '' '' '' ;
'1FA' '2012' 'y' '' 'P0702' '' '' '' '' '' ;
'1FA' '2012' 'n' '' 'P0702' '' '' '' '' '' ;
'1FA' '2012' 'j' '' '' '' '' '' '' 'P0702' ;
'1FA' '2012' 'u' 'P0702' '' '' '' '' '' '' ;
'1FM' '2013' 'x' '' '' '' '' '' 'P1921' '';
'1FM' '2013' 'c' '' 'P1711' '' '' '' '' '';
'1FM' '2013' 'c' '' '' '' '' '' 'P0702' 'P0882';
'1FM' '2009' 'E' '' '' '' '' '' '' 'P0500'}
%// Find the unique strings whose occurrences we want to count.
strings = unique(C(:,4:end));
%// Remove the empty string from the list.
strings = strings(~cellfun(@isempty,strings));
%// Initialize the output cell array.
Output = cell(numel(strings),2);
%// Count occurrences. You can combine the 2 lines into one using concatenation.
for k = 1:numel(strings)
    Output{k,1} = strings{k};
    Output{k,2} = sum(sum(strcmp(C(:,4:end),strings{k})));
end
Let's put the result in a nice table:
T = table(Output(:,2),'RowNames',Output(:,1),'VariableNames',{'TotalOccurences'})
Output:
T =
TotalOccurences
_______________
P0500 [ 1]
P0702 [10]
P0882 [ 4]
P1711 [ 1]
P1921 [ 1]
If you don't have access to the table function, you can build a cell array with a header row instead and adjust the loop:
%// Initialize the output cell array, with one extra row for the header.
Output = cell(numel(strings)+1,2);
%// Count occurrences, writing below the header row.
for k = 1:numel(strings)
    Output{k+1,1} = strings{k};
    Output{k+1,2} = sum(sum(strcmp(C(:,4:end),strings{k})));
end
%// Fill in the header row.
Output(1,:) = {'Data' 'Occurence'}
Output:
Output =
'Data' 'Occurence'
'P0500' [ 1]
'P0702' [ 10]
'P0882' [ 4]
'P1711' [ 1]
'P1921' [ 1]
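The comment in the loop mentions that the two assignments can be merged through concatenation. A minimal sketch of what that might look like follows; the two-row array is a shortened stand-in for the data (an assumption for brevity), and nnz is used here in place of the nested sum, which gives the same count for a logical matrix.

```matlab
%// Shortened stand-in for the data (assumption: same layout, codes from column 4 on)
C = {'1FA' '2012' 'F' ''      'P0702' 'P0882';
     '1Fc' '2012' 'r' 'P0702' ''      ''};
%// Unique non-empty strings in the code columns
strings = unique(C(:,4:end));
strings = strings(~cellfun(@isempty, strings));
Output = cell(numel(strings), 2);
for k = 1:numel(strings)
    %// Write the name and its count for row k in a single assignment
    Output(k,:) = {strings{k}, nnz(strcmp(C(:,4:end), strings{k}))};
end
disp(Output)
```

With this stand-in data, 'P0702' is counted twice and 'P0882' once.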
Answer 1 (score: 2)
If you have the Statistics Toolbox, you can simply use tabulate:
%// get only the relevant part
X = data(:,4:end);
%// tabulate
tabulate(X(:))
It already provides nicely formatted output:
Value Count Percent
P0702 10 58.82%
P1711 1 5.88%
P0882 4 23.53%
P1921 1 5.88%
P0500 1 5.88%
The same counts can also be obtained with standard functions only.
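A minimal sketch of the toolbox-free route, assuming unique and accumarray are acceptable substitutes for tabulate. This is not the code from the original answer, and the small array X is a made-up stand-in for the code columns.

```matlab
%// Small stand-in for the code columns (assumed data, not the full array)
X = {''      'P0702' 'P0882';
     'P0702' ''      '';
     ''      'P0500' 'P0702'};
%// Flatten to a column and drop the empty strings
codes = X(:);
codes = codes(~cellfun(@isempty, codes));
%// idx maps every entry to its position in the sorted list of unique names
[names, ~, idx] = unique(codes);
%// Count how many entries map to each unique name
counts = accumarray(idx(:), 1);
%// Pair each name with its count
result = [names(:) num2cell(counts)]
```

Unlike tabulate, this returns the names in sorted order and does not compute percentages; those could be derived as 100*counts/sum(counts) if needed.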
Answer 2 (score: 1)
You can count the occurrences of all strings at once, without a loop. Let C be your cell array.
[uniqueStrings, ~, v] = unique(C);
counts = histc(v, 1:max(v));
result = [uniqueStrings(:) num2cell(counts(:))];
For your example, this gives
result =
'' [81]
'1FA' [ 9]
'1FM' [ 4]
'1Fc' [ 1]
'2009' [ 1]
'2012' [10]
'2013' [ 3]
'E' [ 1]
'F' [ 1]
'P0500' [ 1]
'P0702' [10]
'P0882' [ 4]
'P1711' [ 1]
'P1921' [ 1]
'c' [ 2]
'f' [ 1]
'g' [ 1]
'j' [ 1]
'n' [ 1]
'r' [ 1]
'u' [ 2]
'x' [ 1]
'y' [ 2]
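Because unique(C) covers the whole array, the result also lists the empty string and the entries of the first three columns. One possible way to keep only the codes is to filter the rows afterwards. The sketch below assumes the codes always look like 'P' followed by four digits, and uses a shortened copy of result for illustration.

```matlab
%// Shortened copy of the result above (a few rows only, for illustration)
result = {''      81;
          '1FA'    9;
          'P0500'  1;
          'P0702' 10;
          'c'      2};
%// Assumed pattern: a code is 'P' followed by exactly four digits
isCode = ~cellfun(@isempty, regexp(result(:,1), '^P\d{4}$', 'once'));
%// Keep only the rows whose label matches the pattern
codesOnly = result(isCode, :)
```

Alternatively, passing C(:,4:end) to unique instead of C restricts the count to the code columns from the start, leaving only the empty string to remove.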