Question

I have a cell array like this:

 A = {'hello'; 2; 3; 4; 'hello';2;'hello'}

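I want to find out whether this array contains duplicates, and to determine their names and indices. For this example, I would like something like: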

names = {'hello';2};
indexes = [1, 5, 7;
         2, 6, 0];

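I set the last element of the second row of indexes to 0 only to avoid a dimension mismatch. My problem is that the cell array contains both char and double values, and I don't know how to handle that.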

Answer 1

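Because the structure you are using contains both strings and numbers, this is not that easy. Assuming you cannot change it at all, a good way to find the unique values and their indices is to loop over the cell array and save its contents into a Map object, which stores the indices at which each unique entry occurs.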

This is quite simple with MATLAB's Map structure (containers.Map), as the code below shows.

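A = {'hello'; 2; 3; 4; 'hello';2;'hello'}

% Map from the string form of each entry to a cell array of its indices
cellMap = containers.Map();
for i = 1 : numel(A)
    mapKey = num2str(A{i});   % keys are char, so numbers are converted
    if cellMap.isKey(mapKey)
        % Seen before: append this index to the stored list
        tempCell = cellMap(mapKey);
        tempCell{numel(tempCell)+1} = i;
        cellMap(mapKey) = tempCell;
    else
        % First occurrence: start a new list
        tempCell = cell(1);
        tempCell{1} = i;
        cellMap(mapKey) = tempCell;
    end
end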

Typing cellMap.keys lists all the unique values; it returns

ans = 
    '2'    '3'    '4'    'hello'

You can then use these keys to find where each value occurs in the original array, for example with cellMap('hello'):

ans = 
    [1]    [5]    [7]

Once all of this is done, you can do some conversion to recover the original values and get closer to the format you want.

uniqueVals = cellMap.keys;
uniqueIndices = cell(1,numel(uniqueVals));
for i = 1:numel(uniqueVals)
    uniqueIndices{i} = cell2mat(cellMap(uniqueVals{i}));
    numEquiv = str2double(uniqueVals{i});
    if ~isnan(numEquiv)
        uniqueVals{i} = numEquiv;
    end
end
uniqueVals{4}
uniqueIndices{4}

This returns:

ans =
    hello
ans = 
    1     5     7
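
To get from the map to the names/indexes layout asked for in the question, one possible sketch is shown below. It rebuilds the map in a slightly shorter form (storing plain index vectors rather than cells), keeps only keys that occur more than once, and zero-pads the index rows. The variable names dupKeys and counts are illustrative, and the sketch assumes A is non-empty. The rows come out in the map's sorted key order, so 2 appears before 'hello'.

```matlab
A = {'hello'; 2; 3; 4; 'hello'; 2; 'hello'};

% Build the map: string form of each entry -> vector of indices.
% Caveat: with num2str keys, the number 2 and the string '2' share a key.
cellMap = containers.Map();
for i = 1:numel(A)
    mapKey = num2str(A{i});
    if isKey(cellMap, mapKey)
        cellMap(mapKey) = [cellMap(mapKey), i];
    else
        cellMap(mapKey) = i;
    end
end

% Keep only the keys that occur more than once
allKeys = keys(cellMap);
counts  = cellfun(@(k) numel(cellMap(k)), allKeys);
dupKeys = allKeys(counts > 1);

% One row per duplicated value, padded with zeros on the right
names   = cell(numel(dupKeys), 1);
indexes = zeros(numel(dupKeys), max(counts));
for r = 1:numel(dupKeys)
    idx = cellMap(dupKeys{r});
    indexes(r, 1:numel(idx)) = idx;
    names{r} = A{idx(1)};   % take the original entry so its type is preserved
end
```

For the example array this should give names = {2; 'hello'} and indexes = [2 6 0; 1 5 7].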

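Another option, possibly simpler and more direct, is to make a copy of the cell array and convert all of its contents to strings. This does not immediately return things in the format you want, but it is a start: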

B = cell(size(A));
for i = 1:numel(A)
    B{i} = num2str(A{i});
end
[C,~,IC] = unique(B)

You can then use the outputs of unique to find the indices, but honestly, the map code I wrote above already does that.
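
As a rough illustration of that last step, the third output of unique can be fed to accumarray to group the positions of each value. This is only a sketch: groups, isDup, dupNames and dupIndices are names introduced here, and the duplicated values come back in their string form ('2' rather than 2).

```matlab
A = {'hello'; 2; 3; 4; 'hello'; 2; 'hello'};

% String copy of A, then the unique strings and the group label of each entry
B = cellfun(@num2str, A, 'UniformOutput', false);
[C, ~, IC] = unique(B);

% Collect, for every unique string, the sorted positions where it occurs
groups = accumarray(IC(:), (1:numel(B)).', [], @(x) {sort(x).'});

% Keep only the values that occur more than once
isDup      = cellfun(@numel, groups) > 1;
dupNames   = C(isDup);
dupIndices = groups(isDup);
```

Here dupNames should be {'2'; 'hello'} and dupIndices should be {[2 6]; [1 5 7]}.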

Answer 2

This is cumbersome, but it can be done:

m = max(cellfun(@length, A));
A2 = cellfun(@(e) [double(e) inf(1,m-length(e)) ischar(e)], A, 'uni' ,false);
A2 = cell2mat(A2);
[~, ~, jj] = unique(A2,'rows');
num = accumarray(jj,1,[],@numel);
[~, kk] = max(bsxfun(@eq, jj, find(num>1).'));
names = A(kk);
indices = arrayfun(@(n) find(jj==jj(kk(n))), 1:length(kk), 'uni', false);

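How this works: A2 is just A converted to a numeric matrix. Each row represents one entry of A; the last column is a flag that distinguishes original numbers from original strings, and inf is used as padding. Then, as usual, unique and accumarray do the real work, and the result is obtained from jj and num with a few comparisons and some indexing.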
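
To make the encoding concrete, here is a small sketch that builds A2 for a shorter, made-up array (not the one from the question). It includes both the string 'h' and the number 104, which is the character code of 'h', to show why the flag column is needed.

```matlab
% Illustrative input: 'h' and 104 have the same numeric code
A = {'hi'; 2; 'hi'; 'h'; 104};

m  = max(cellfun(@length, A));   % longest entry, here 2
A2 = cellfun(@(e) [double(e) inf(1, m-length(e)) ischar(e)], ...
             A, 'UniformOutput', false);
A2 = cell2mat(A2);

% Each row: character codes or the number, Inf padding, then the char flag
expected = [104 105 1;
              2 Inf 0;
            104 105 1;
            104 Inf 1;
            104 Inf 0];
```

Rows 1 and 3 are identical, so unique(A2,'rows') groups them as duplicates, while rows 4 and 5 differ only in the flag and stay separate.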

Answer 3

Here is a more elegant (and simpler!) solution:

%// Find repeating cells
A = A(:);        %// Make sure it's a column array
ia = 1:numel(A);
tf = sortrows(bsxfun(@(m, n)cellfun(@isequal, A(m), A(n)), ia, ia(:)));
tf = tf(any(diff([zeros(size(ia)); tf]), 2) & sum(tf, 2) > 1, :);

%// Extract corresponding indices and values
indices = arrayfun(@(x){find(tf(x, :))}, 1:size(tf, 1));
names = cellfun(@(x)A(x(1)), indices);

This solution should work for any data, not just strings and numbers.

Example

If we run this for:

A = {'hello'; 2; 3; 4; 'hello'; 2; 'hello'; 4}

we get:

indices =
     [4   8]
     [2   6]
     [1   5   7]

names = 
    4
    2
    'hello'
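
For readers who find the bsxfun one-liner hard to follow, the same pairwise isequal idea can be sketched as an explicit loop. This is not the code from the answer above, just an illustrative variant; the seen and matches variables are introduced here, and the groups come out in order of first occurrence rather than in the sorted order shown above.

```matlab
A = {'hello'; 2; 3; 4; 'hello'; 2; 'hello'; 4};
A = A(:);                 % make sure it is a column array
n = numel(A);

seen    = false(n, 1);    % entries already assigned to a group
names   = {};
indices = {};
for i = 1:n
    if seen(i), continue; end
    % Positions of all entries equal to A{i}.
    % Note: isequal compares values, not classes, so 'a' and 97 would match.
    matches = find(cellfun(@(x) isequal(x, A{i}), A)).';
    seen(matches) = true;
    if numel(matches) > 1
        names{end+1, 1}   = A{i};
        indices{end+1, 1} = matches;
    end
end
```

For this input, names should be {'hello'; 2; 4} and indices should be {[1 5 7]; [2 6]; [4 8]}.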
