Question

假设我有一个矩阵，其中有N行（已知N），每行包含相同的数字，但顺序不同。每行1-9的数字相同，并且同一行中没有重复的数字。每行将具有相同数字的不同顺序。我想检测不同行中数字的模式。 “模式”是指两个或多个数字的任何重复组合。

例如，如果我的矩阵如下：

1   2   3   8   9   7   4   5   6
1   3   2   7   8   9   4   5   6
1   2   3   5   6   7   4   9   8
1   2   3   7   8   6   4   5   9
1   2   3   4   5   6   7   8   9
1   2   3   7   8   9   4   5   6
1   3   2   4   5   6   7   8   9

一个模式是集合1-2-3（在第1、3、4、5、6行中出现5次），另一种模式是集合4-5-6（在第1,2,5,6,7行中出现5次）。

MATLAB中对此有什么要求吗？

我想过要从一个命令开始，该命令为2-9位数字生成1-9的所有可能组合。对于生成的每个组合，我将扫描所有矩阵并计算出现多少次以及出现在哪些行中。然后将出现最多的组合打印出来。听起来可行吗？

Answer 1

下面的答案基于几个假设（自从我在OP的最新澄清评论之前就开始写它）：

可扩展性：这是为与问题中大小相同的数组设计的。不应假定这是适用于任何N的解决方案。
子序列：我假设所有长度为2到N的序列都是理想的，即使它们出现在更长的序列中。
位置：我们不在乎序列出现在什么地方，只是它们出现的频率。如果需要该职位，则可以应用参考the Q&A的by rahnema1中显示的解决方案。
重复行：允许。
输出：默认情况下，仅出现两次多次的组合。

解决方案的想法如下：

初始化一个可以容纳序列的空单元格数组。
从矩阵中排一行。
将其分为长度2至N的所有组合。将所有组合存储在细胞载体中。
如果还有更多未处理的行，请转到2；否则汇总并显示结果。

此外，我们使用GetMD5工具比较不均匀的数据（在这种情况下，是长度不同的向量）。

function varargout = q51521534(isRandomM, N, ignoreOrder, minAppear)
%% Handling inputs:
if nargin < 1
  % Should we generate a random matrix, or use a hardcoded default?
  isRandomM = false;
end
if isRandomM && nargin < 2
  % Number of columns.
  N = 30;
end
if nargin < 3
  % When this flag is true, [1 2 3] is considered the same as [3 1 2] etc.
  ignoreOrder = true;
end
if nargin < 4
  % The minimal frequancy needed to be plotted in the histogram.
  minAppear = 4;
end
%% Definitions:
R = 9;
MIN_LEN = 2;
%% Setup:
if isRandomM
  M = zeros(R,N);
  for ind1 = 1:R
    M(ind1,:) = randperm(N,N);
  end
else % the example from the question:
  M = uint8([ 
      1   2   3   8   9   7   4   5   6
      1   3   2   7   8   9   4   5   6
      1   2   3   5   6   7   4   9   8
      1   2   3   7   8   6   4   5   9
      1   2   3   4   5   6   7   8   9
      1   2   3   7   8   9   4   5   6
      1   3   2   4   5   6   7   8   9]);
  [R,N] = size(M);
end
%% Populate the "row-chopping" indices:
allIdx = cell(N-MIN_LEN+1,1);
for ind1 = MIN_LEN:N
  allIdx{ind1-1} = (1:ind1) + (0:N-ind1).';
end
%% Extract sequences from every row according to the indices:
S = cell((N-1)*R,1);
if ignoreOrder
  for ind1 = 1:R
    idx = (1:N-1) + (N-1)*(ind1-1);
    S(idx) = cellfun(@(x){sort(reshape(M(ind1,x.'), size(x,2),[]).',2)}, allIdx);
  end  
else
  for ind1 = 1:R
    idx = (1:N-1) + (N-1)*(ind1-1);
    S(idx) = cellfun(@(x){reshape(M(ind1,x.'), size(x,2),[]).'}, allIdx);
  end
end
S = cellfun(@(x)num2cell(x,2), S, 'UniformOutput', false); S = vertcat(S{:});
% S now contains all sequences **appearing in the array**.
%% Analyze the output:
md5 = string(cellfun(@GetMD5, S, 'UniformOutput', false));
[~,ia,ic] = unique(md5, 'stable'); uS = S(ia);
N = histcounts(ic,'BinMethod','integers');
%% Show chart:
f = find(N >= minAppear); % ignore combinations that appear less than a threshold
figure(); hB = bar(N(f)); hB.Parent.XTickLabelRotation = 45;
hB.Parent.XTickLabel = string(cellfun(@mat2str, uS(f), 'UniformOutput', false));
%% Assign outputs:
if nargout > 0
  varargout{1} = M;
  varargout{2} = S;
  varargout{3} = ic;
end

对于问题（q51521534(false,[],false,3)）中的数组，结果为：

了解运行需要多长时间：

>> tic; q51521534(true,30,false); toc;
Elapsed time is 0.075122 seconds.

>> tic; q51521534(true,100,false); toc;
Elapsed time is 0.426620 seconds.

>> tic; q51521534(true,200,false); toc;
Elapsed time is 9.765767 seconds.

如何在矩阵的行中查找数字的重复模式

1 个答案: