Question

I have a matrix (A) of the following form (the real one is much larger):

205   204   201
202   208   202

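How can I count the co-occurrences of numbers column by column and output the result to a matrix?

I would like the final matrix to run from min(A):max(A) (or to be able to specify a particular range) across the top and down the side, and to count the co-occurrences of the numbers in each column. Using the example above: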

    200 201 202 203 204 205 206 207 208
200  0   0   0   0   0   0   0   0   0
201  0   0   1   0   0   0   0   0   0
202  0   0   0   0   0   1   0   0   0 
203  0   0   0   0   0   0   0   0   0
204  0   0   0   0   0   0   0   0   1
205  0   0   0   0   0   0   0   0   0
206  0   0   0   0   0   0   0   0   0
207  0   0   0   0   0   0   0   0   0
208  0   0   0   0   0   0   0   0   0

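(The matrix labels are not needed.)

Two important points: each pair must be counted only once, and in numerical order. For example, a column containing: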

205
202

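should be counted as 202 occurring with 205 (as shown in the matrix above) but NOT as 205 occurring with 202, which would be the duplicate reciprocal. When deciding which number to use as the reference (the row), it should be the smaller one.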
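The rule described above can be sketched as a plain loop, which may be useful as a reference to check faster solutions against. This is a minimal, unoptimized sketch that assumes A has exactly two rows; the names lo, hi, n and pair are illustrative and not from the question.

```matlab
% Reference implementation with an explicit loop (a sketch, not optimized).
A = [205 204 201
     202 208 202];            % example data from the question
lo = 200; hi = 208;           % assumed range (illustrative names)
n = hi - lo + 1;
R = zeros(n);                 % rows: smaller value, columns: larger value
for k = 1:size(A, 2)
    pair = sort(A(:, k));     % smaller value first
    if pair(1) >= lo && pair(2) <= hi
        r = pair(1) - lo + 1; % row index from the smaller value
        c = pair(2) - lo + 1; % column index from the larger value
        R(r, c) = R(r, c) + 1;
    end
end
```

Columns with a value outside the range are skipped entirely, which is one possible reading of "specify a particular range".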

Answer 1

sparse to the rescue!

Define your data and the desired range as

A = [ 205   204   201
      202   208   202 ]; %// data. Two-row matrix
limits = [200 208]; %// desired range. It needn't include all values of A

Then

lim1 = limits(1)-1;
s = limits(2)-lim1;
cols = all((A>=limits(1)) & (A<=limits(2)), 1);
B = sort(A(:,cols), 1, 'descend')-lim1;
R = full(sparse(B(2,:), B(1,:), 1, s, s));

gives

R =
     0     0     0     0     0     0     0     0     0
     0     0     1     0     0     0     0     0     0
     0     0     0     0     0     1     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0

Alternatively, you can dispense with sort and use matrix addition followed by triu to obtain the same result (possibly faster):

lim1 = limits(1)-1;
s = limits(2)-lim1;
cols = all( (A>=limits(1)) & (A<=limits(2)) , 1);
R = full(sparse(A(2,cols)-lim1, A(1,cols)-lim1, 1, s, s));
R = triu(R + R.');
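One edge case may be worth checking: a column whose two entries are equal. With the sort-based code such a column adds 1 to the diagonal, whereas R + R.' adds the diagonal to itself and so yields 2. The sketch below compares both on made-up data; S, R_sort, R_triu and R_fixed are illustrative names, and R_fixed is just one possible adjustment, not part of the original answer.

```matlab
A = [203 205
     203 202];                     % first column has two equal entries
limits = [200 208];
lim1 = limits(1)-1;
s = limits(2)-lim1;
cols = all((A>=limits(1)) & (A<=limits(2)), 1);

% Sort-based approach
B = sort(A(:,cols), 1, 'descend') - lim1;
R_sort = full(sparse(B(2,:), B(1,:), 1, s, s));

% triu-based approach as written above
S = full(sparse(A(2,cols)-lim1, A(1,cols)-lim1, 1, s, s));
R_triu = triu(S + S.');

% Variant that counts diagonal entries only once
R_fixed = triu(S + S.', 1) + diag(diag(S));

disp([R_sort(4,4) R_triu(4,4) R_fixed(4,4)])   % diagonal entry for 203
```

If the data never contains a column with two identical values, the two approaches agree and no adjustment is needed.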

Both approaches handle repeated columns (columns that are equal up to sorting), correctly increasing their counts. For example,

A = [205   204   201
     201   208   205]

gives

R =
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     2     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
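The counts accumulate because sparse sums the values supplied for repeated (row, column) subscripts. A minimal illustration, using hand-written subscripts that correspond to the example above (rowIdx and colIdx are illustrative names):

```matlab
% Subscripts for the pairs (201,205), (204,208), (201,205), offset by 199
rowIdx = [2 5 2];
colIdx = [6 9 6];
R = full(sparse(rowIdx, colIdx, 1, 9, 9));
disp(R(2,6))   % the repeated subscript (2,6) is summed
disp(R(5,9))
```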

Answer 2

See if this is what you are after -

range1 = 200:208 %// Set the range

A = A(:,all(A>=min(range1)) & all(A<=max(range1))) %// select A with columns
                                                   %// that fall within range1
A_off = A-range1(1)+1 %// Get the offset indices from A

A_off_sort = sort(A_off,1) %// sort offset indices to satisfy "smallest" criteria

out = zeros(numel(range1)); %// storage for output matrix
idx = sub2ind(size(out),A_off_sort(1,:),A_off_sort(2,:)) %// get the indices to be set

unqidx = unique(idx)
out(unqidx) = histc(idx,unqidx) %// set coincidences

With

A = [205   204   201
     201   208   205]

this gives -

out =
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     2     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
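MathWorks documentation marks histc as not recommended in recent releases. One possible alternative for the counting step, sketched here under the assumption that idx holds the linear indices computed above, is to let accumarray count them directly. The values of idx are written out by hand for the example with duplicates; n and counts are illustrative names.

```matlab
n = 9;                      % numel(range1) for range1 = 200:208
idx = [47 77 47];           % linear indices of (2,6), (5,9), (2,6) in a 9-by-9 matrix
counts = accumarray(idx(:), 1, [n*n 1]);   % one counter per matrix element
out = reshape(counts, n, n);
disp(out(2,6))
disp(out(5,9))
```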

A few performance-oriented tricks can be used here -

I. Replace

out = zeros(numel(range1));

with

out(numel(range1),numel(range1)) = 0;

II. Replace

idx = sub2ind(size(out),A_off_sort(1,:),A_off_sort(2,:))

with

idx = (A_off_sort(2,:)-1)*numel(range1)+A_off_sort(1,:)
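Both tricks can be checked on the example with duplicates. The sketch below writes the sorted offset indices out by hand and uses illustrative names (n, idx_sub, idx_lin). Note that trick I only preallocates as intended when out does not exist yet: if out already holds data, the assignment sets a single element and leaves the rest untouched.

```matlab
range1 = 200:208;
n = numel(range1);
A_off_sort = [2 5 2
              6 9 6];       % sorted offset indices for the example with duplicates

% Trick II: manual column-major linear index versus sub2ind
idx_sub = sub2ind([n n], A_off_sort(1,:), A_off_sort(2,:));
idx_lin = (A_off_sort(2,:)-1)*n + A_off_sort(1,:);
disp(isequal(idx_sub, idx_lin))

% Trick I: implicit preallocation, valid only if out is not yet defined
clear out
out(n, n) = 0;
disp(size(out))
```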

Answer 3

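How about a solution using accumarray? I would first sort each column independently, then use the first row as the first dimension of the final accumulation matrix and the second row as the second dimension. Something like: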

limits = 200:208;
A = A(:,all(A>=min(limits)) & all(A<=max(limits))); %// Borrowed from Divakar

%// Sort the columns individually and bring down to 1-indexing
B = sort(A, 1) - limits(1) + 1;

%// Create co-occurrence matrix
C = accumarray(B.', 1, [numel(limits) numel(limits)]);

Using:

A = [205   204   201
     202   208   202]

this is the output:

C =

     0     0     0     0     0     0     0     0     0
     0     0     1     0     0     0     0     0     0
     0     0     0     0     0     1     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0

With duplicates (borrowed from Luis Mendo):

A = [205   204   201
     201   208   205]

Output:

C =

     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     2     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
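The question also asks for the range to default to min(A):max(A). A minimal sketch of that variant with accumarray follows; since the range covers every value in A, no column filtering is needed. The last two lines, which list the nonzero pairs with their original labels, are an illustrative addition rather than something the question asked for.

```matlab
A = [205 204 201
     202 208 202];
limits = min(A(:)):max(A(:));          % 201:208 for this A

% Sort each column and shift to 1-based indices
B = sort(A, 1) - limits(1) + 1;
C = accumarray(B.', 1, [numel(limits) numel(limits)]);

% List the nonzero entries as [smaller value, larger value, count]
[r, c, v] = find(C);
pairs = [limits(r).' limits(c).' v];
disp(pairs)
```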

Counting co-occurrences of numbers in matrix columns - MATLAB
