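I am trying to find the most duplicated value in an array, along with the fraction of the sample it accounts for. Here is an example: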
A = [5 5 1 2 3 4 6 6 7 7 7 8 8 8 8];
mostduplicatevalue(A)
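should return 8, with a fraction of 4/length(A).
I am currently doing the following (see below), but it takes about 5/6 seconds to get the result for a 1300 x 5000 matrix. Is there a better way to achieve this?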
function [MostDuplicateValue, MostDuplicatePerc] = mostduplicatevalue(A)
% Which value is duplicated the most, and what fraction of the
% sample does it represent?
% Tabulate the frequency of every distinct value, then pick the largest count
tbl = tabulate(A(:));
[~,bi] = max(tbl(:,2));
MostDuplicateValue = tbl(bi,1);
MostDuplicatePerc = tbl(bi,3)/100;
end
Answer 0: (score: 3)
Here is one possible answer:
function [MostDuplValue, MostDuplPerc, MostDuplCount] = mostduplicatevalue(A)
% Which value is duplicated the most, and what fraction of the
% sample does it represent?
[MostDuplValue,MostDuplCount] = mode(A(:));
% nnz counts the non-NaN elements for arrays of any dimensionality
MostDuplPerc = MostDuplCount / nnz(~isnan(A));
end
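As a quick check, the same computation can be run directly on the example vector from the question. This is only a minimal sketch; the variable names `val`, `cnt`, and `perc` are illustrative and not part of the answer above.

```matlab
% Example vector from the question
A = [5 5 1 2 3 4 6 6 7 7 7 8 8 8 8];

% mode returns the most frequent value and how many times it occurs
[val, cnt] = mode(A(:));

% Fraction of the non-NaN elements taken up by that value
perc = cnt / nnz(~isnan(A));

fprintf('value = %g, count = %d, fraction = %.4f\n', val, cnt, perc);
```

Note that when several values are tied for most frequent, `mode` returns only the smallest of them.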
Answer 1: (score: 3)
This solution first sorts the array (a very costly operation) and then uses diff to find the longest streak of identical values. Empirically it seems to be somewhat faster: at 1300x5000 it takes about 2/3 of the duration of your proposal. As a side benefit, if several values are tied for the most occurrences, it returns all of them.
% sort array and pad it with -inf and inf
B = [-inf; sort(A(:)); inf];
% find the index of the last element of each streak (where the next value differs)
C = find(diff(B));
% count the length of the streaks
D = diff(C);
% extract the numbers with the longest streak
MostDuplValue = B(C(logical([0; D==max(D)])));
% compute the fraction of the sample taken by the most frequent value
MostDuplPerc = max(D)/numel(A);
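To illustrate the tie behaviour, here is the same sequence of steps applied to a small vector in which two values occur equally often. The input is a made-up example, and the sketch assumes A contains no Inf or -Inf entries, since those would collide with the padding values.

```matlab
% Made-up input: 1 and 2 both occur twice
A = [1 1 2 2 3];

% Sort and pad so that the first and last streaks are also delimited
B = [-inf; sort(A(:)); inf];

% Index of the last element of each streak (plus the leading -inf)
C = find(diff(B));

% Streak lengths are the gaps between consecutive boundaries
D = diff(C);

% Values of all streaks that reach the maximum length
MostDuplValue = B(C(logical([0; D == max(D)])));

% Fraction of the sample taken by each of those values
MostDuplPerc = max(D) / numel(A);

disp(MostDuplValue.');   % displays 1 and 2
disp(MostDuplPerc);      % displays 0.4
```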