I have a matrix (A) of the following form (the real one is much bigger):
205 204 201
202 208 202
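How can I count the co-occurrences of the numbers column by column and write the result to a matrix?
I want the final matrix to run from min(A) to max(A) (or over a range I can specify) along both the top and the side, and to count how often two numbers occur together in a column. Using the example above: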
200 201 202 203 204 205 206 207 208
200 0 0 0 0 0 0 0 0 0
201 0 0 1 0 0 0 0 0 0
202 0 0 0 0 0 1 0 0 0
203 0 0 0 0 0 0 0 0 0
204 0 0 0 0 0 0 0 0 1
205 0 0 0 0 0 0 0 0 0
206 0 0 0 0 0 0 0 0 0
207 0 0 0 0 0 0 0 0 0
208 0 0 0 0 0 0 0 0 0
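(The matrix labels are not needed.)
Two important points: the counts must not be duplicated, and they must follow numerical order. For example, a column containing: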
205
202
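should be counted as 202 occurring with 205 (as in the matrix above) but not as 205 occurring with 202, which would be the duplicated reciprocal. The smaller of the two numbers should always be used as the reference (the row).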
Answer 0: (score: 4)
sparse to the rescue!
Define your data and the desired range as
A = [ 205 204 201
202 208 202 ]; %// data. Two-row matrix
limits = [200 208]; %// desired range. It needn't include all values of A
Then
lim1 = limits(1)-1;
s = limits(2)-lim1;
cols = all((A>=limits(1)) & (A<=limits(2)), 1);
B = sort(A(:,cols), 1, 'descend')-lim1;
R = full(sparse(B(2,:), B(1,:), 1, s, s));
gives
R =
0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
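The approach relies on sparse summing the values of entries that share the same subscripts. The following minimal sketch isolates that behaviour; the subscript vectors rows and cols are made up for illustration and are not part of the answer.

```matlab
% Three (row, column) subscript pairs; the pair (2,6) appears twice.
rows = [2 5 2];
cols = [6 9 6];

% sparse(i, j, v, m, n) sums the values v that share the same (i, j),
% so the repeated pair ends up with a count of 2.
S = full(sparse(rows, cols, 1, 9, 9));

disp(S(2,6))   % count for the repeated pair
disp(S(5,9))   % count for the pair that occurs once
```

Subtracting lim1 from the data beforehand is what turns the values 200..208 into valid subscripts 1..9.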
Alternatively, you can dispense with sort and use matrix addition followed by triu to obtain the same result (possibly faster):
lim1 = limits(1)-1;
s = limits(2)-lim1;
cols = all( (A>=limits(1)) & (A<=limits(2)) , 1);
R = full(sparse(A(2,cols)-lim1, A(1,cols)-lim1, 1, s, s));
R = triu(R + R.');
Both approaches handle repeated columns (repeated up to sorting) by correctly increasing their count. For example,
A = [205 204 201
201 208 205]
gives
R =
0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
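One corner case that the answer does not discuss: if a column holds the same value twice, R + R.' in the triu variant counts the corresponding diagonal entry twice, while the sort-based variant counts it once. The sketch below is one possible adjustment, not part of the original answer; it subtracts the diagonal once before applying triu. The names R_sort and R_triu are introduced here for the comparison only.

```matlab
A = [205 203 201
     202 203 202];      % the middle column holds 203 twice
limits = [200 208];

lim1 = limits(1)-1;
s = limits(2)-lim1;
cols = all((A>=limits(1)) & (A<=limits(2)), 1);

% Sort-based variant, used here as the reference
B = sort(A(:,cols), 1, 'descend')-lim1;
R_sort = full(sparse(B(2,:), B(1,:), 1, s, s));

% Addition-based variant with the diagonal subtracted once,
% because R + R.' counts every diagonal entry twice
R = full(sparse(A(2,cols)-lim1, A(1,cols)-lim1, 1, s, s));
R_triu = triu(R + R.' - diag(diag(R)));

disp(isequal(R_sort, R_triu))
```

If the data never contains a column with two equal values, the adjustment changes nothing and the shorter form from the answer is sufficient.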
Answer 1: (score: 3)
See if this is what you are after -
range1 = 200:208 %// Set the range
A = A(:,all(A>=min(range1)) & all(A<=max(range1))) %// select A with columns
%// that fall within range1
A_off = A-range1(1)+1 %// Get the offsetted indices from A
A_off_sort = sort(A_off,1) %// sort offset indices to satisfy "smallest" criteria
out = zeros(numel(range1)); %// storage for output matrix
idx = sub2ind(size(out),A_off_sort(1,:),A_off_sort(2,:)) %// get the indices to be set
unqidx = unique(idx)
out(unqidx) = histc(idx,unqidx) %// set coincidences
With
A = [205 204 201
201 208 205]
this gives -
out =
0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
A few performance-oriented tricks could be used here -
I. Replace
out = zeros(numel(range1));
with (this only preallocates correctly if out does not already exist in the workspace)
out(numel(range1),numel(range1)) = 0;
II. Replace
idx = sub2ind(size(out),A_off_sort(1,:),A_off_sort(2,:))
with
idx = (A_off_sort(2,:)-1)*numel(range1)+A_off_sort(1,:)
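As a further variation, and assuming that avoiding histc is desirable (the MathWorks documentation lists it as not recommended in recent releases), the unique/histc pair could be replaced by a single accumarray call on the linear indices. This is a sketch built on the manual index formula from trick II, not something the answer itself proposes; the variable n is introduced here for brevity.

```matlab
range1 = 200:208;
A = [205 204 201
     201 208 205];

n = numel(range1);
A = A(:, all(A>=min(range1)) & all(A<=max(range1)));  % keep in-range columns
A_off_sort = sort(A-range1(1)+1, 1);                  % smaller value in row 1

% Linear index of each (row, column) pair in an n-by-n matrix
idx = (A_off_sort(2,:)-1)*n + A_off_sort(1,:);

% Count how often each linear index occurs, then reshape to n-by-n
out = reshape(accumarray(idx(:), 1, [n*n 1]), n, n);
```

The count for the pair 201/205 ends up in out(2,6) and the count for 204/208 in out(5,9), matching the layout of the output shown above.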
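Answer 2: (score: 3)
How about a solution using accumarray? I would first sort each column independently, then use the first row as the first dimension of the final accumulation matrix and the second row as the second dimension. Something like: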
limits = 200:208;
A = A(:,all(A>=min(limits)) & all(A<=max(limits))); %// Borrowed from Divakar
%// Sort the columns individually and bring down to 1-indexing
B = sort(A, 1) - limits(1) + 1;
%// Create co-occurrence matrix
C = accumarray(B.', 1, [numel(limits) numel(limits)]);
Using:
A = [205 204 201
202 208 202]
this is the output:
C =
0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
With duplicates (borrowed from Luis Mendo):
A = [205 204 201
201 208 205]
Output:
C =
0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
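To check a vectorized result against the rule stated in the question (the smaller value selects the row, the larger one the column, and out-of-range columns are skipped), a plain loop can serve as a reference. This is only a verification sketch; the names keep, C_ref, lo, hi, r and c are introduced here and do not appear in the answers.

```matlab
A = [205 204 201
     201 208 205];
limits = 200:208;
n = numel(limits);

% Vectorized result, as in the answer above
keep = all(A>=min(limits)) & all(A<=max(limits));
B = sort(A(:,keep), 1) - limits(1) + 1;
C = accumarray(B.', 1, [n n]);

% Loop-based reference: one increment per in-range column
C_ref = zeros(n);
for k = 1:size(A,2)
    lo = min(A(:,k));
    hi = max(A(:,k));
    if lo >= limits(1) && hi <= limits(end)
        r = lo - limits(1) + 1;   % smaller value selects the row
        c = hi - limits(1) + 1;   % larger value selects the column
        C_ref(r,c) = C_ref(r,c) + 1;
    end
end

disp(isequal(C, C_ref))
```

The loop is slow for large inputs, so it is meant for spot checks on small matrices rather than as a replacement for the vectorized versions.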