Question

By default, MATLAB's sort function handles ties/repeated elements by preserving the order of the elements, that is:

>> [srt,idx] = sort([1 0 1])

srt =

    0     1     1


idx =

    2     1     3

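Note that the two elements with value 1 in the input land at positions 2 and 3 of the sorted output, in their original input order (indices 1 and 3). However, swapping them would be an equally valid sort: with srt = in(idx), that alternative is idx = [2 3 1].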
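Under the convention used by sort, where srt = in(idx), a candidate index vector can be checked directly. The following is only a small sketch; the anonymous function isValid is a name made up for illustration.

```matlab
in  = [1 0 1];
srt = sort(in);

% Hypothetical helper: true when idx reorders in into the sorted vector
isValid = @(idx) isequal(in(idx), srt);

a = isValid([2 1 3]);   % the index returned by sort
b = isValid([2 3 1]);   % same sort with the two tied elements swapped
```

Both checks should come out true, which is exactly the ambiguity the question is about.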

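I would like a function [srt, all_idx] = sort_ties(in) that explicitly returns all possible values of idx consistent with the sorted output. Of course, this only matters when there are ties or repeated elements, and all_idx would have dimensions nPossibleSorts x length(in).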

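I started on a recursive algorithm for this, but quickly realized that things were getting out of hand and that someone must have solved this before! Any suggestions?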

Answer 1

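I had a similar idea to what R. M. suggested. However, this solution is generalized to handle any number of repeated elements in the input vector. The code first sorts the input (using the function SORT), then loops over each unique value to generate all permutations of the indices for that value (using the function PERMS), storing the results in a cell array. These per-value index permutations are then combined into the full set of sort-index permutations by replicating them appropriately with the functions KRON and REPMAT: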

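function [srt,all_idx] = sort_ties(in,varargin)
% SORT_TIES  Sort a row vector and return every index ordering consistent
% with the sorted output, one ordering per row of all_idx.

  [srt,idx] = sort(in,varargin{:});
  uniqueValues = srt(logical([1 diff(srt)]));  % first element of each run of equal values
  nValues = numel(uniqueValues);
  if nValues == numel(srt)  % no ties: the sort index is unique
    all_idx = idx;
    return
  end

  % All orderings of the indices that belong to each unique value
  permCell = cell(1,nValues);
  for iValue = 1:nValues
    valueIndex = idx(srt == uniqueValues(iValue));
    if numel(valueIndex) == 1
      permCell{iValue} = valueIndex;
    else
      permCell{iValue} = perms(valueIndex);
    end
  end
  nPerms = cellfun('size',permCell,1);  % number of orderings per value

  % Expand each block to prod(nPerms) rows so every combination occurs once
  for iValue = 1:nValues
    N = prod(nPerms(1:iValue-1));    % each row is repeated N times
    M = prod(nPerms(iValue+1:end));  % the whole block is tiled M times
    permCell{iValue} = repmat(kron(permCell{iValue},ones(N,1)),M,1);
  end
  all_idx = [permCell{:}];

end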

Here are some example results:

>> [srt,all_idx] = sort_ties([0 2 1 2 2 1])

srt =

     0     1     1     2     2     2

all_idx =

     1     6     3     5     4     2
     1     3     6     5     4     2
     1     6     3     5     2     4
     1     3     6     5     2     4
     1     6     3     4     5     2
     1     3     6     4     5     2
     1     6     3     4     2     5
     1     3     6     4     2     5
     1     6     3     2     4     5
     1     3     6     2     4     5
     1     6     3     2     5     4
     1     3     6     2     5     4
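To make the KRON/REPMAT step more concrete, here is a self-contained sketch that builds the same kind of result by hand for in = [0 2 1 2 2 1]. The variable names (P0, P1, P2, B0, B1, B2, rowsOk, nUnique) are made up for this illustration, and the row order may differ from the output above depending on how perms orders its rows in your MATLAB version.

```matlab
in  = [0 2 1 2 2 1];
srt = sort(in);

P0 = 1;               % index of the single 0
P1 = perms([3 6]);    % 2 orderings of the indices holding the value 1
P2 = perms([2 4 5]);  % 6 orderings of the indices holding the value 2

% Expand each block to 1*2*6 = 12 rows:
% kron repeats each row N times, repmat tiles the whole block M times
B0 = repmat(kron(P0, ones(1,1)), 12, 1);  % N = 1,   M = 2*6
B1 = repmat(kron(P1, ones(1,1)),  6, 1);  % N = 1,   M = 6
B2 = repmat(kron(P2, ones(2,1)),  1, 1);  % N = 1*2, M = 1
all_idx = [B0 B1 B2];

% Every row should reproduce srt, and all rows should be distinct
rowsOk  = all(all(in(all_idx) == repmat(srt, size(all_idx,1), 1)));
nUnique = size(unique(all_idx, 'rows'), 1);
```

Because the block for the value 1 changes on every row while the block for the value 2 changes every second row, each pairing of the two appears exactly once.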

Answer 2

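Here is a possible solution that I believe is correct, but it is somewhat inefficient because of the duplicates it initially generates. Apart from that it is quite neat, though I still suspect it could be done better.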

function [srt,idx] = tie_sort(in,order)

L = length(in);
[srt,idx] = sort(in,order);

for j = 1:L-1 % for each position in sorted array, look for repeats following it

  for k = j+1:L

    % if repeat found, add possible permutations to the list of possible sorts
    if srt(j) == srt(k)

       swapped = 1:L; swapped(j) = k; swapped(k) = j;
       add_idx = idx(:,swapped);

       idx = cat(1,idx,add_idx);
       idx = unique(idx,'rows'); % remove identical copies

    else % srt is sorted, so no further repeats of srt(j) can follow

       break;

    end

  end

end
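As a sanity check on the number of rows such a function should return, the count of valid sorts is the product of the factorials of each value's multiplicity. The sketch below assumes nothing beyond built-in functions; grp, counts and nSorts are illustrative names.

```matlab
in = [0 2 1 2 2 1];

% grp maps each element of in to its position in the sorted unique values
[~, ~, grp] = unique(in);

% Number of occurrences of each unique value (here 1, 2 and 3)
counts = accumarray(grp(:), 1);

% Each group of k tied elements can be ordered in k! ways
nSorts = prod(factorial(counts));
```

For this input nSorts is 1! * 2! * 3! = 12, so any extra rows produced before the call to unique(idx,'rows') are duplicates.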

Answer 3

Consider the example A=[1,2,3,2,5,6,2]. You want to find the indices at which 2 occurs and get all possible permutations of those indices.

For the first step, use unique in combination with histc to find the repeated element and how many times it occurs.

uniqA=unique(A);
B=histc(A,uniqA);

You get B=[1 3 1 1 1]. You now know which value in uniqA is repeated and how many times. To get the indices,

repeatIndices=find(A==uniqA(B==max(B)));

The indices are [2, 4, 7]. Finally, to get all possible permutations of these indices, use the perms function.

perms(repeatIndices)
ans =

 7     4     2
 7     2     4
 4     7     2
 4     2     7
 2     4     7
 2     7     4

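I believe this does what you wanted. You can write a wrapper function around all of this so that you end up with something compact like out=sort_ties(in). You should probably include a conditional around the repeatIndices line so that you stop early if B is all ones (i.e., there are no ties).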
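The steps above, together with the suggested no-ties guard, might be wrapped roughly as follows. This is only a sketch: the function name tie_perms is hypothetical, it handles just the single most frequent value (as the steps above do), and it counts occurrences with accumarray in place of histc, which is an assumption of this example and not part of the answer.

```matlab
function repeatPerms = tie_perms(A)
% TIE_PERMS  (hypothetical helper) All permutations of the indices of the
% most frequently repeated value in A; returns [] when A has no repeats.
  [uniqA, ~, grp] = unique(A);
  B = accumarray(grp(:), 1).';   % occurrence counts, same role as histc(A,uniqA)
  if all(B == 1)
    repeatPerms = [];            % no ties, so there is nothing to permute
    return
  end
  [~, iMax] = max(B);            % first value with the highest count
  repeatIndices = find(A == uniqA(iMax));
  repeatPerms = perms(repeatIndices);
end
```

Calling tie_perms([1 2 3 2 5 6 2]) should give the six orderings of [2 4 7] shown above, possibly in a different row order. Inputs with several repeated values would need a loop over every value whose count exceeds one.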

Sorting with explicit tie (repeated element) resolution
