I have a [sentence*word] matrix, as shown below:
out = 0 1 1 0 1
      1 1 0 0 1
      1 0 1 1 0
      0 0 0 1 0
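I want to process this matrix in a way that tells me that W1 & W2 occur in "sentence number 2" and "sentence number 4" with the same values, i.e. 1 1 and 0 0. The output should be as follows: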
output{1,2}= 2 4
output{1,2} says that word number 1 and word number 2 occur in sentence number 2 and sentence number 4 with the same value.
After comparing W1 & W2, the next candidate should be W1 & W3, which have the same value in sentence 3 & sentence 4:
output{1,3}= 3 4
And so on, until every nth word has been compared with every other word and the results saved.
Answer 0 (score: 2)
This would be a vectorized approach:
%// Get number of columns in input array for later usage
N = size(out,2);
%// Get indices for pairwise combinations between columns of input array
[idx2,idx1] = find(bsxfun(@gt,[1:N]',[1:N])); %//'
%// Get indices for matches between out(:,idx1) and out(:,idx2). The row
%// indices represent the occurrence (sentence) values for the final output,
%// and the column indices identify the word pair each value belongs to.
[R,C] = find(out(:,idx1) == out(:,idx2))
%// Form cells off each unique C (these will be final output values)
output_vals = accumarray(C(:),R(:),[],@(x) {x})
%// Setup output cell array
output = cell(N,N)
%// Indices for places in output cell array where occurrence values are to be put
all_idx = sub2ind(size(output),idx1,idx2)
%// Finally store the output values at appropriate indices
output(all_idx(1:max(C))) = output_vals
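
The pairing step above may be easier to follow with a small check. The sketch below is not part of the original answer; it assumes N = 5 (the number of words in the question's matrix) and only illustrates that `find` on a strictly lower-triangular mask lists the column pairs in the same order as `nchoosek(1:N,2)`. The variable name `pairs` is introduced here purely for illustration.

```matlab
%// Number of words (columns) in the question's example matrix
N = 5;
%// Strictly lower-triangular mask: entry (i,j) is true when i > j
[idx2, idx1] = find(bsxfun(@gt, (1:N)', 1:N));
%// Each row is one word pair [smaller index, larger index]
pairs = [idx1 idx2];
%// Same ordering as the lexicographic list of 2-combinations
same_order = isequal(pairs, nchoosek(1:N, 2));
disp(pairs);
disp(same_order);
```

Because the pairs come out in this order, the k-th column of `out(:,idx1) == out(:,idx2)` corresponds to the k-th row of `pairs`, which is why the column index `C` can be used to group the sentence numbers.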
Answer 1 (score: 1)
You can use bsxfun to easily obtain a logical array of size #words-by-#words-by-#sentences:
coc = bsxfun( @eq, permute( out, [3 2 1]), permute( out, [2 3 1] ) );
This logical array coc( wi, wj, si ) is true iff word wi and word wj occur in sentence si with the same value.
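
As a quick illustration of that indexing, here is a minimal sketch using the matrix from the question. It only inspects the shape of the array and one slice; the variable `slice12` is a name made up for this example.

```matlab
%// Example matrix from the question: rows are sentences, columns are words
out = [0 1 1 0 1
       1 1 0 0 1
       1 0 1 1 0
       0 0 0 1 0];
%// words-by-words-by-sentences logical array
coc = bsxfun(@eq, permute(out, [3 2 1]), permute(out, [2 3 1]));
disp(size(coc));               %// 5 words, 5 words, 4 sentences
%// Slice for the word pair (1,2): true where both words have the same value
slice12 = squeeze(coc(1, 2, :));
disp(slice12.');               %// true at sentences 2 and 4
```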
To get the output cell array from coc:
nw = size( out, 2 ); %// number of words
output = cell(nw,nw);
for wi = 1:(nw-1)
    for wj = (wi+1):nw
        output{wi,wj} = find( coc(wi,wj,:) );
        output{wj,wi} = output{wi,wj}; %// you can force it to be symmetric if you want
    end
end
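
For cross-checking either answer, a plain-loop baseline can be handy. The following is a sketch only, not taken from either answer: it compares each pair of columns directly with `==` and `find`, using the question's matrix, and stores the result in a cell array named `expected` (a name chosen here for illustration).

```matlab
%// Example matrix from the question: rows are sentences, columns are words
out = [0 1 1 0 1
       1 1 0 0 1
       1 0 1 1 0
       0 0 0 1 0];
nw = size(out, 2);            %// number of words
expected = cell(nw, nw);
for wi = 1:(nw-1)
    for wj = (wi+1):nw
        %// Sentences in which words wi and wj have the same value
        expected{wi,wj} = find(out(:,wi) == out(:,wj));
    end
end
disp(expected{1,2}.');        %// sentences 2 and 4, as in the question
disp(expected{1,3}.');        %// sentences 3 and 4, as in the question
```

Pairs whose columns never agree, such as words 2 and 4 in this matrix, end up with an empty entry.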