Question

I have a vector of information, for example:

Info = [10, 20, 10, 30, 500, 400, 67, 350, 20, 105, 15];

and another vector of IDs, say:

Info_IDs = [1, 2, 1, 4, 2, 3, 4, 1, 3, 1, 2];

I would like to obtain a matrix defined as follows:

Result =
    10    10   350   105
    20   500    15     0
   400    20     0     0
    30    67     0     0

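Each row holds the Info values that belong to one ID. As this short example shows, the number of values per ID differs from row to row, so shorter rows are padded with zeros.

I am working with large amounts of data (Info is 1x1000000 and Info_IDs is 1x25000), so I would like to build this Result matrix preferably without loops. One way I thought of is to compute a histogram per ID and store that instead (so Result would contain the binned information rather than the original values).

Thanks in advance for your input.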
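A minimal sketch of the per-ID histogram idea, assuming the IDs are positive integers and using hypothetical bin edges (`edges` is not part of the original data and would need to be chosen to cover every value in Info):

```matlab
Info     = [10 20 10 30 500 400 67 350 20 105 15];
Info_IDs = [1 2 1 4 2 3 4 1 3 1 2];

% Hypothetical bin edges (an assumption): bins [0,100), [100,200), [200,600)
edges = [0 100 200 600];

% Second output of histc is the bin index of every value
% (it would be 0 for values outside the edges, which accumarray rejects)
[~, bin] = histc(Info, edges);

% One row per ID, one column per bin, each entry a count
Counts = accumarray([Info_IDs(:), bin(:)], 1, [max(Info_IDs), numel(edges) - 1]);
```

With these edges, row 1 of Counts is [2 1 1]: ID 1 has two values below 100, one in [100,200) and one in [200,600).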

Answer 1

Here is a vectorized solution that should be memory-efficient and fast even on large matrices:

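%// Pad data with zero values and add matching IDs
len = histc(Info_IDs, 1:max(Info_IDs));      %// number of values per ID
padlen = max(len) - len;                     %// zeros needed per ID
%// Count how many IDs begin their padding at each position; accumarray
%// handles several IDs sharing a start position (i.e. IDs needing no padding)
padstart = cumsum([1, padlen(1:end - 1)]);
padval = accumarray(padstart(:), 1, [sum(padlen) + 1, 1]).';
Info = [Info, zeros(1, sum(padlen))];
Info_IDs = [Info_IDs, cumsum(padval(1:end - 1))];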

%// Group data into rows
Result = accumarray(Info_IDs(:), Info, [], @(x){x}).';
Result = [Result{:}].';

The second step can also be done as follows:

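%// Group data into rows (sort is stable, so the order within each ID is kept)
[sorted_IDs, sorted_idx] = sort(Info_IDs);
%// After padding, every ID has max(len) values: one column per ID, then transpose
Result = reshape(Info(sorted_idx), [], numel(len)).';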
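The same stable-sort idea can also be used without padding at all. The following is a sketch of a different technique, not the code from this answer: it computes, for every value, its column position within its ID group and writes the values straight into a preallocated matrix. It assumes the IDs are positive integers; the names `first` and `col` are made up for illustration.

```matlab
Info     = [10 20 10 30 500 400 67 350 20 105 15];
Info_IDs = [1 2 1 4 2 3 4 1 3 1 2];

% Stable sort: values of the same ID become contiguous, original order kept
[sorted_IDs, idx] = sort(Info_IDs);

% Number of values per ID and the start of each ID's block in the sorted list
len   = accumarray(sorted_IDs(:), 1).';
first = cumsum([1, len(1:end - 1)]);

% Column of each sorted value = its offset inside its own block
col = (1:numel(sorted_IDs)) - first(sorted_IDs) + 1;

% Preallocate with zeros and scatter the values into place
Result = zeros(numel(len), max(len));
Result(sub2ind(size(Result), sorted_IDs, col)) = Info(idx);
```

Because no padding IDs are generated, this variant does not depend on which ID has the most values.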

Example

%// Sample input data
Info = [10 20 10 30 500 400 67 350 20 105 15];
Info_IDs = [1 2 1 4 2 3 4 1 3 1 2];

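%// Pad data with zero values and add matching IDs
len = histc(Info_IDs, 1:max(Info_IDs));      %// number of values per ID
padlen = max(len) - len;                     %// zeros needed per ID
%// Count how many IDs begin their padding at each position; accumarray
%// handles several IDs sharing a start position (i.e. IDs needing no padding)
padstart = cumsum([1, padlen(1:end - 1)]);
padval = accumarray(padstart(:), 1, [sum(padlen) + 1, 1]).';
Info = [Info, zeros(1, sum(padlen))]
Info_IDs = [Info_IDs, cumsum(padval(1:end - 1))]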

%// Group data into rows
Result = accumarray(Info_IDs(:), Info, [], @(x){x}).';
Result = [Result{:}].';

The result is:

Result =
    10    10   350   105
    20   500    15     0
   400    20     0     0
    30    67     0     0

Answer 2

I don't know of a way to do this without a loop, but this is pretty fast:

Result = [];
n = 4; % i.e. number of classes
for c = 1:n
    row = Info(Info_IDs == c);
    Result(c, 1:size(row, 2)) = row;
end

If speed really is an issue, you can preallocate with Result = zeros(4, sum(Info_IDs == mode(Info_IDs)))
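Put together, the preallocated loop might look like the sketch below. It assumes the IDs are the integers 1..n and derives n and the row width from the data instead of hard-coding 4; the variable `counts` is introduced here for illustration.

```matlab
Info     = [10 20 10 30 500 400 67 350 20 105 15];
Info_IDs = [1 2 1 4 2 3 4 1 3 1 2];

n      = max(Info_IDs);                 % number of classes (assumes IDs are 1..n)
counts = accumarray(Info_IDs(:), 1);    % number of values per ID

% Preallocate so the matrix never has to grow inside the loop
Result = zeros(n, max(counts));
for c = 1:n
    row = Info(Info_IDs == c);          % values of class c, in original order
    Result(c, 1:numel(row)) = row;
end
```

The width max(counts) is the same number as sum(Info_IDs == mode(Info_IDs)), the count of the most frequent ID.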

Answer 3

If you don't mind having zeros in between the values:

number_Ids = 4; % set as required
aux = (bsxfun(@eq,Info_IDs,(1:number_Ids).'));
sol = bsxfun(@(x,y) x.*y,Info,aux)

For your example, this gives:

10     0    10     0     0     0     0   350     0   105     0
 0    20     0     0   500     0     0     0     0     0    15
 0     0     0     0     0   400     0     0    20     0     0
 0     0     0    30     0     0    67     0     0     0     0

Or, if you do mind the zeros but not the order, you can sort the result row-wise:

sol2 = sort(sol,2,'descend')

which gives

350   105    10    10     0     0     0     0     0     0     0
500    20    15     0     0     0     0     0     0     0     0
400    20     0     0     0     0     0     0     0     0     0
 67    30     0     0     0     0     0     0     0     0     0

Edit: the order of the non-zero entries can be preserved using the same trick as here
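One way to keep the original order, which may or may not be the trick referred to above, is to run a stable sort on the membership mask rather than on the values. This is only a sketch under that assumption; `order`, `rows` and `Result` are names introduced here.

```matlab
Info     = [10 20 10 30 500 400 67 350 20 105 15];
Info_IDs = [1 2 1 4 2 3 4 1 3 1 2];

number_Ids = 4; % set as required
aux = bsxfun(@eq, Info_IDs, (1:number_Ids).');   % aux(k, j) is true if value j has ID k
sol = bsxfun(@times, Info, double(aux));         % values with zeros in between

% Stable sort of the negated mask: members (0) move to the front of each row
% and keep their original left-to-right order
[~, order] = sort(double(~aux), 2);

% Reorder every row of sol by its own permutation, then drop the all-padding columns
rows   = repmat((1:number_Ids).', 1, size(sol, 2));
Result = sol(sub2ind(size(sol), rows, order));
Result = Result(:, 1:max(sum(aux, 2)));
```

Sorting the mask instead of the values also keeps genuine zeros in Info in place. Note that aux has one row per ID and one column per value, so this approach is only practical for small inputs.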
