Counting co-occurrences of numbers in matrix columns - MATLAB

Time: 2014-10-31 10:53:35

Tags: matlab

I have a matrix (A) of the following form (the real one is much larger):

205   204   201
202   208   202

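How can I count the co-occurrences of numbers column by column and output the counts to a matrix?

I would like the final matrix to run from min(A):max(A) (or over a range I can specify) across the top and down the side, and to count how often each pair of numbers occurs together in a column. Using the example above: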

    200 201 202 203 204 205 206 207 208
200  0   0   0   0   0   0   0   0   0
201  0   0   1   0   0   0   0   0   0
202  0   0   0   0   0   1   0   0   0 
203  0   0   0   0   0   0   0   0   0
204  0   0   0   0   0   0   0   0   1
205  0   0   0   0   0   0   0   0   0
206  0   0   0   0   0   0   0   0   0
207  0   0   0   0   0   0   0   0   0
208  0   0   0   0   0   0   0   0   0

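(The matrix labels are not required.)

Two important points: each pair must be counted only once, and in numerical order. For example, a column containing: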

205
202

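should be counted as 202 occurring with 205 (as shown in the matrix above), but not also as 205 occurring with 202, which would be the duplicate reciprocal. The smaller of the two numbers should always be used as the reference (the row).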
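The rule described above can be sketched as a plain loop, which is useful as a baseline for checking the vectorized answers. This is only an illustrative reference under stated assumptions: the names `lo`, `hi`, `n` and `R` are made up here, and columns with a value outside the range are simply skipped.

```matlab
% Reference sketch: count each column's pair once, smaller value first.
A = [205 204 201
     202 208 202];          % example data from the question
lo = 200; hi = 208;         % assumed range; lo:hi labels both rows and columns
n = hi - lo + 1;
R = zeros(n);               % R(i,j) counts the pair (lo+i-1, lo+j-1), i <= j
for k = 1:size(A, 2)
    a = min(A(:, k));       % smaller value is the reference (row)
    b = max(A(:, k));       % larger value selects the column
    if a >= lo && b <= hi   % skip columns with a value outside the range
        R(a - lo + 1, b - lo + 1) = R(a - lo + 1, b - lo + 1) + 1;
    end
end
disp(R)
```

For the example data this sets R(2,3), R(3,6) and R(5,9) to 1, matching the labelled matrix in the question.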


3 answers:

Answer 0: (score: 4)

sparse to the rescue!

Define your data and the desired range as

A = [ 205   204   201
      202   208   202 ]; %// data. Two-row matrix
limits = [200 208]; %// desired range. It needn't include all values of A

Then

lim1 = limits(1)-1;
s = limits(2)-lim1;
cols = all((A>=limits(1)) & (A<=limits(2)), 1);
B = sort(A(:,cols), 1, 'descend')-lim1;
R = full(sparse(B(2,:), B(1,:), 1, s, s));

gives

R =
     0     0     0     0     0     0     0     0     0
     0     0     1     0     0     0     0     0     0
     0     0     0     0     0     1     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0

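Alternatively, you can skip the sort and instead add the matrix to its transpose, then apply triu, to obtain the same result (possibly faster):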

lim1 = limits(1)-1;
s = limits(2)-lim1;
cols = all( (A>=limits(1)) & (A<=limits(2)) , 1);
R = full(sparse(A(2,cols)-lim1, A(1,cols)-lim1, 1, s, s));
R = triu(R + R.');
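One caveat with the triu variant: if a column holds the same value twice, that pair lands on the diagonal, and adding the transpose doubles it, whereas the sort-based version counts it once. The sketch below uses a hypothetical input with such a column and shows one possible correction, subtracting the extra diagonal copy. The names `S`, `R_plain` and `R_fixed` are illustrative only.

```matlab
A = [202 204
     202 208];                 % hypothetical input: first column pairs 202 with itself
limits = [200 208];
lim1 = limits(1) - 1;
s = limits(2) - lim1;
cols = all((A >= limits(1)) & (A <= limits(2)), 1);
S = full(sparse(A(2,cols) - lim1, A(1,cols) - lim1, 1, s, s));
R_plain = triu(S + S.');                   % diagonal entries are doubled
R_fixed = triu(S + S.') - diag(diag(S));   % remove the extra diagonal copy
% 202 maps to index 3: R_plain(3,3) is 2, R_fixed(3,3) is 1
disp([R_plain(3,3) R_fixed(3,3)])
```

If the data never contains a column with two equal values, the correction is unnecessary.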

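Both approaches handle repeated columns (repeated up to the order of their two entries) and increase the count accordingly. For example,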

A = [205   204   201
     201   208   205]

gives

R =
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     2     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
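The reason repeated columns are counted correctly is that sparse sums the values given for repeated (row, column) subscripts. A minimal sketch with hand-written subscripts (the vectors `i` and `j` are illustrative and correspond to the offset pairs of the example above):

```matlab
i = [2 2 5];                      % row subscripts: smaller value of each pair, 1-based
j = [6 6 9];                      % column subscripts: larger value of each pair
R = full(sparse(i, j, 1, 9, 9));  % values at repeated subscripts are added together
disp(R(2,6))                      % 2, because the pair (2,6) appears twice
disp(R(5,9))                      % 1
```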

Answer 1: (score: 3)

See if this is what you are after:

range1 = 200:208 %// Set the range

A = A(:,all(A>=min(range1)) & all(A<=max(range1))) %// select A with columns
                                                   %// that fall within range1
A_off = A-range1(1)+1 %// Get the offset (1-based) indices from A

A_off_sort = sort(A_off,1) %// sort offset indices to satisfy "smallest" criteria

out = zeros(numel(range1)); %// storage for output matrix
idx = sub2ind(size(out),A_off_sort(1,:),A_off_sort(2,:)) %// get the indices to be set

unqidx = unique(idx)
out(unqidx) = histc(idx,unqidx) %// set coincidences

With

A = [205   204   201
     201   208   205]

this gives:

out =
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     2     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
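Since MathWorks documents histc as not recommended in newer releases, the unique/histc step could also be replaced by counting the linear indices directly with accumarray. This is a swapped-in alternative rather than part of the answer above, and the sketch assumes every value of A already lies inside range1 (the range filter is omitted for brevity).

```matlab
A = [205 204 201
     201 208 205];
range1 = 200:208;
n = numel(range1);
A_off_sort = sort(A - range1(1) + 1, 1);                 % smaller offset index in row 1
idx = sub2ind([n n], A_off_sort(1,:), A_off_sort(2,:));  % linear index of each pair
out = reshape(accumarray(idx(:), 1, [n*n 1]), n, n);     % count each linear index
disp(out(2,6))                                           % 2
```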

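A few performance-oriented tricks can be used here:

I. Replace

out = zeros(numel(range1)); 

with the following (this relies on out not already existing as a variable):

out(numel(range1),numel(range1)) = 0;

II. Replace

idx = sub2ind(size(out),A_off_sort(1,:),A_off_sort(2,:))  

with

idx = (A_off_sort(2,:)-1)*numel(range1)+A_off_sort(1,:)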
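The second trick works because MATLAB stores matrices in column-major order, so the linear index of element (r, c) in an n-by-n matrix is (c-1)*n + r. A quick check with illustrative subscripts (the names `r`, `c`, `idx1` and `idx2` are made up for this sketch):

```matlab
n = 9;                          % numel(range1) for the range 200:208
r = [2 5 2];                    % example row subscripts
c = [6 9 6];                    % example column subscripts
idx1 = sub2ind([n n], r, c);    % library conversion
idx2 = (c - 1) * n + r;         % manual column-major conversion
disp(isequal(idx1, idx2))       % 1
```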

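Answer 2: (score: 3)

How about a solution using accumarray? I would first sort each column independently, then use the first row as the first dimension of the final accumulation matrix and the second row as the second dimension. Something like: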

limits = 200:208;
A = A(:,all(A>=min(limits)) & all(A<=max(limits))); %// Borrowed from Divakar

%// Sort the columns individually and bring down to 1-indexing
B = sort(A, 1) - limits(1) + 1;

%// Create co-occurrence matrix
C = accumarray(B.', 1, [numel(limits) numel(limits)]);

Using:

A = [205   204   201
     202   208   202]

this is the output:

C =

     0     0     0     0     0     0     0     0     0
     0     0     1     0     0     0     0     0     0
     0     0     0     0     0     1     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0

With duplicates (borrowed from Luis Mendo):

A = [205   204   201
     201   208   205]

Output:

C =

     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     2     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
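The question also asks for the matrix to span min(A):max(A) when no range is specified. A minimal sketch of that case with the same accumarray approach, assuming the range is taken from the data itself so no column needs to be filtered out:

```matlab
A = [205 204 201
     202 208 202];
limits = min(A(:)):max(A(:));     % 201:208 for this A
n = numel(limits);
B = sort(A, 1) - limits(1) + 1;   % smaller value in row 1, shifted to 1-based
C = accumarray(B.', 1, [n n]);    % rows: smaller value, columns: larger value
disp(size(C))                     % 8 8
```

Here row and column k of C correspond to the value limits(k), so the pair (202, 205) is counted in C(2,5).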