Question

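I am trying to find the value that is duplicated most often, along with the fraction of the sample it represents. Here is an example: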

A = [5 5 1 2 3 4 6 6 7 7 7 8 8 8 8];

mostduplicatevalue(A) should return 8, with a percentage of 4/length(A).

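I am currently doing the following (see below), but it takes about 5/6 seconds to get the result for a 1300x5000 matrix. Is there a better way to achieve this result?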

function [MostDuplicateValue, MostDuplicatePerc] = mostduplicatevalue(A)
% Which value is duplicated the most, and what fraction of the
% sample does it represent?

% Value that is Most Duplicated
tbl     = tabulate(A(:));
[~,bi]  = max(tbl(:,2));

MostDuplicateValue = tbl(bi,1);
MostDuplicatePerc  = tbl(bi,3)/100;

end

Answer 1

Here is a possible answer:

function [MostDuplValue, MostDuplPerc, MostDuplCount] = mostduplicatevalue(A)
% Which value is duplicated the most, and what fraction of the
% sample does it represent?

[MostDuplValue,MostDuplCount] = mode(A(:));
MostDuplPerc = MostDuplCount / sum(sum(~isnan(A)));

end
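
As a quick check, here is a minimal sketch that applies the same mode-based computation to the example vector from the question. The variable names val, cnt and perc are chosen only for illustration. Note that when several values tie for the highest count, mode returns only the smallest of them.

```matlab
% Example vector from the question
A = [5 5 1 2 3 4 6 6 7 7 7 8 8 8 8];

% mode returns the most frequent value and how often it occurs
[val, cnt] = mode(A(:));

% fraction of the non-NaN elements taken up by that value
perc = cnt / sum(~isnan(A(:)));

fprintf('value = %g, count = %d, fraction = %.4f\n', val, cnt, perc);
% prints: value = 8, count = 4, fraction = 0.2667
```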

Answer 2

This solution first sorts the array (a very costly operation) and then uses diff to find the longest run of identical values. Empirically it seems to be slightly faster (it takes about 2/3 of the duration of your proposal at 1300x5000). As a side benefit, if several values tie for the highest count, it returns all of them.

% sort array and pad it with -inf and inf
B = [-inf; sort(A(:)); inf];
% find the indices where each run of equal values ends
C = find(diff(B));
% count the length of the streaks
D = diff(C);
% extract the numbers with the longest streak
MostDuplValue = B(C(logical([0; D==max(D)])));
% calculate the fraction of the sample taken up by the most frequent value
MostDuplPerc = max(D)/numel(A);
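
To illustrate the tie behaviour described above, here is a small sketch that runs the same steps on a short vector in which two values share the highest count. The input vector is made up for this illustration, and the approach assumes A itself contains no -inf or inf, since those would merge with the padding.

```matlab
% Hypothetical input: 1 and 2 both occur twice
A = [1 1 2 2 3];

% sort and pad, so the first and last runs are also delimited
B = [-inf; sort(A(:)); inf];
% indices where a run of equal values ends: [1; 3; 5; 6]
C = find(diff(B));
% run lengths: [2; 2; 1]
D = diff(C);
% values whose run length equals the maximum
MostDuplValue = B(C(logical([0; D == max(D)])));
% fraction of the sample taken up by each of those values
MostDuplPerc = max(D) / numel(A);

disp(MostDuplValue.');   % displays 1 and 2
disp(MostDuplPerc);      % displays 0.4000
```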

An efficient way to get the most repeated value in a matrix
