Question

I have the following matrix (M×N, where M ≤ N):

0.8147    0.9134    0.2785    0.9649
0.9058    0.6324    0.5469    0.1576
0.1270    0.0975    0.9575    0.9706

I want to select one entry from each row, at the following column indices:

idx = [ 3  1  4 ];

This means we keep the elements at (1,3), (2,1) and (3,4), and the rest of the array should be zero.

For the example above, I would get the following output:

     0         0    0.2785         0
0.9058         0         0         0
     0         0         0    0.9706

I currently generate this with a loop, which becomes slow when the matrix is large.

Can anyone suggest a more performant approach?

Answer 1

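You can use the sub2ind function to convert the (row, column) subscripts of the entries into linear indices.

When you use linear indexing, MATLAB treats the matrix as one long column vector (the columns stacked on top of each other).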

org_mat=[0.8147    0.9134    0.2785    0.9649
0.9058    0.6324    0.5469    0.1576
0.1270    0.0975    0.9575    0.9706];
entries=[3,1,4];

linear_entries=sub2ind(size(org_mat),1:length(entries),entries);
new_mat=zeros(size(org_mat));
new_mat(linear_entries)=org_mat(linear_entries);
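
To make the linear-indexing idea concrete, here is a small sketch on the question's matrix. MATLAB stores arrays column by column, so in a matrix with M rows the element at row r, column c sits at linear index r + (c-1)*M. The variable names as_column and picked are only illustrative and do not appear in the answer.

```matlab
org_mat = [0.8147 0.9134 0.2785 0.9649
           0.9058 0.6324 0.5469 0.1576
           0.1270 0.0975 0.9575 0.9706];
entries = [3 1 4];

% Subscripts (1,3), (2,1), (3,4) -> positions in the column-major vector.
% With 3 rows: 1+(3-1)*3 = 7, 2+(1-1)*3 = 2, 3+(4-1)*3 = 12.
linear_entries = sub2ind(size(org_mat), 1:numel(entries), entries);
disp(linear_entries)    % values: 7, 2, 12

% Read the same positions from the matrix viewed as one long column.
as_column = org_mat(:);
picked = as_column(linear_entries).';
disp(picked)            % values: 0.2785, 0.9058, 0.9706
```

Indexing org_mat(linear_entries) directly gives the same three values; the explicit org_mat(:) step is only there to show the "long column vector" view.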

Answer 2

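There is some discussion about performance in the other answers and comments. This is one of those cases where a simple (well-constructed) for loop does the job nicely, with essentially no performance penalty.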

% For some original matrix 'm', and column indexing array 'idx':
x = zeros( size(m) ); % Initialise output of zeros
for ii = 1:numel(idx) % Loop over indices
    % Assign the value at the column index for this row
    x( ii, idx(ii) ) = m( ii, idx(ii) );     
end

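This code is highly readable and fast. To justify "fast", I wrote the benchmarking code below for the methods of all four current answers, run on MATLAB R2017b. Here are the output plots.

For "small" matrices, up to 2^5 columns and 2^4 rows:
For "large" matrices, up to 2^15 columns and 2^14 rows (the same plot with and without the bsxfun solution, because it ruins the scaling):

The first plot can be misleading. Although the results are consistent (the ranking from slowest to fastest is bsxfun, then sub2ind, then manual linear indexing, then the loop), the y-axis is in units of 10^(-5) seconds, so it basically does not matter which method you use!

The second plot shows that, for large matrices, the methods are basically equivalent, except that bsxfun is terrible (not shown here, but it also requires much more memory).

I would opt for the clearer loop: it gives you more flexibility, and you will still remember exactly what it does in your code two years from now.

Benchmarking code: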

function benchie() 
    K = 5;                      % Max loop variable
    T = zeros( K, 4 );          % Timing results
    for k = 1:K
        M = 2^(k-1); N = 2^k;   % size of matrix
        m = rand( M, N );       % random matrix
        idx = randperm( N, M ); % column indices

        % Define anonymous functions with no inputs for timeit, and run
        f1 = @() f_sub2ind( m, idx ); T(k,1) = timeit(f1);
        f2 = @() f_linear( m, idx );  T(k,2) = timeit(f2);
        f3 = @() f_loop( m, idx );    T(k,3) = timeit(f3);   
        f4 = @() f_bsxfun( m, idx );  T(k,4) = timeit(f4);   
    end
    % Plot results
    plot( (1:K)', T, 'linewidth', 2 );
    legend( {'sub2ind', 'linear', 'loop', 'bsxfun'} );
    xlabel( 'k, where matrix had 2^{(k-1)} rows and 2^k columns' );
    ylabel( 'function time (s)' )
end

function f_sub2ind( m, idx )
    % Using the in-built sub2ind to generate linear indices, then indexing
    lin_idx = sub2ind( size(m), 1:numel(idx), idx );
    x = zeros( size(m) );
    x( lin_idx ) = m( lin_idx );
end
function f_linear( m, idx )
    % Manually calculating linear indices, then indexing
    lin_idx = (1:numel(idx)) + (idx-1)*size(m,1);
    x = zeros( size(m) );
    x( lin_idx ) = m( lin_idx );
end
function f_loop( m, idx )
    % Directly indexing in a simple loop
    x = zeros( size(m) );
    for ii = 1:numel(idx)
        x( ii, idx(ii) ) = m( ii, idx(ii) );
    end
end
function f_bsxfun( m, idx )
    % Using bsxfun to create a logical matrix of desired elements, then masking
    % Since R2016b, can use 'x = ( (1:size(m,2)) == idx(:) ) .* m;'
    x = bsxfun(@eq, 1:size(m,2), idx(:)).*m;
end

Answer 3

TL;DR - this is what I suggest:

nI = numel(idx);
sz = size(m); 
x = sparse( 1:nI, idx, m(sub2ind( size(m), 1:numel(idx), idx )), sz(1), sz(2), nI);

The rest of this post discusses why it works better.

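Since the desired output matrix consists mostly of zeros, this practically begs for the use of sparse matrices! Not only should this improve performance (especially for larger matrices), it should also be much more memory-friendly.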
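
As a rough illustration of the memory claim, the sketch below builds the same result once as a full array and once as a sparse array and compares their sizes with whos. The matrix size (256×512) and the names x_dense, x_sparse, info_dense and info_sparse are assumptions chosen for the example, not part of the benchmark.

```matlab
% Illustrative size (assumption): 256 rows, 512 columns
M = 256; N = 512;
m = rand(M, N);
idx = randperm(N, M);                      % one column index per row

lin  = sub2ind(size(m), 1:M, idx);         % linear indices of the kept entries
vals = m(lin);                             % the M values to keep

x_dense = zeros(M, N);                     % stores all M*N doubles
x_dense(lin) = vals;

x_sparse = sparse(1:M, idx, vals, M, N);   % stores only the nonzeros plus index data

info_dense  = whos('x_dense');
info_sparse = whos('x_sparse');
fprintf('dense: %d bytes, sparse: %d bytes\n', info_dense.bytes, info_sparse.bytes);
```

The full array always occupies 8 bytes per element, whereas the sparse array's footprint grows with the number of stored nonzeros (and the number of columns), so the gap widens as the matrix gets larger.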

I am going to add two functions to Wolfie's benchmark:

function x = f_sp_loop( m, idx )
  nI = numel(idx);
  sz = size(m); 
  x = spalloc( sz(1), sz(2), nI ); % Initialize a sparse array.
  for indI = 1:nI
      x( indI, idx(indI) ) = m( indI, idx(indI) ); % This generates a warning (inefficient)
  end
end

function x = f_sp_sub2ind( m, idx )
  nI = numel(idx);
  sz = size(m); 
  x = sparse( 1:nI, idx, m(sub2ind( size(m), 1:numel(idx), idx )), sz(1), sz(2), nI);
end

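The essential difference is that we do not preallocate the output as an array of zeros, but as a sparse array. The results of the benchmark¹ are as follows:

...which raises the question: why is the sparse approach about an order of magnitude faster?

To answer this, we should look at how the runtime is actually distributed inside the benchmarking functions, which we can get from the profiler. To get even more information, we can make the profiler report memory consumption by using profile('-memory','on'). After running a slightly shorter version of the benchmark², which only runs for the largest value of k, we get: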

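So we can conclude several things:

The vast majority of the runtime is spent on allocating and freeing memory, which is why the algorithms appear to have almost identical performance. Hence, if we cut down on memory allocations, we directly save time (a big plus for sparse!).
Even though the sub2ind and loop approaches appear the same, we can still establish a "winner" between the two (see the purple box in the image below): sub2ind! It takes 32 ms versus 41 ms for the loop.
The sparse loop approach is unsurprisingly slow, as mlint warned us: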
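Explanation

Code Analyzer detects an indexing pattern for a sparse array that is likely to be slow. An assignment that changes the nonzero pattern of a sparse array can cause this error because such assignments result in considerable overhead.

Suggested Action

If possible, build sparse arrays using sparse as follows, and do not use indexed assignments (such as C(4) = B) to build them:
1. Create separate index and value arrays.
2. Call sparse to assemble the index and value arrays.
If you must use indexed assignments to build sparse arrays, you can optimize performance by first preallocating the sparse array with spalloc.

If the code changes only array elements that are already nonzero, then the overhead is reasonable. Suppress this message as described in Adjust Code Analyzer Message Indicators and Messages.

For more information, see "Constructing Sparse Matrices".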
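The approach that combines the best of both worlds, meaning the memory savings of sparse and the vectorization of sub2ind, appears to be the best one, with a runtime of only 3 ms!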
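
The two construction styles mentioned in the Code Analyzer message can be sketched side by side on the question's example. This is only an illustration under the assumption that both are meant to produce the same matrix; the names rows, cols, vals, x_triplet and x_indexed are made up for the example.

```matlab
m = [0.8147 0.9134 0.2785 0.9649
     0.9058 0.6324 0.5469 0.1576
     0.1270 0.0975 0.9575 0.9706];
idx = [3 1 4];
[nRows, nCols] = size(m);

% Step 1: create separate index and value arrays.
rows = 1:numel(idx);
cols = idx;
vals = m(sub2ind(size(m), rows, cols));

% Step 2: let sparse assemble them in a single call.
x_triplet = sparse(rows, cols, vals, nRows, nCols);

% Fallback style: indexed assignment into a preallocated sparse array.
% Each assignment changes the nonzero pattern, which is what the
% Code Analyzer message warns about.
x_indexed = spalloc(nRows, nCols, numel(idx));
for r = rows
    x_indexed(r, idx(r)) = m(r, idx(r));
end

same = isequal(x_triplet, x_indexed);
disp(full(x_triplet))
```

Both variants hold the same three nonzeros; the difference is that the first one hands all indices and values to sparse at once instead of growing the nonzero pattern one element at a time.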

¹ Code for making the chart:

function q51605093()
    K = 15;                     % Max loop variable
    T = zeros( K, 4 );          % Timing results
    for k = 1:K
        M = 2^(k-1); N = 2^k;   % size of matrix
        m = rand( M, N );       % random matrix
        idx = randperm( N, M ); % column indices

        % Define anonymous functions with no inputs, for timeit, and run
        f = cell(4,1);
        f{1} = @() f_sub2ind( m, idx ); 
        f{2} = @() f_loop( m, idx );   
        f{3} = @() f_sp_loop( m, idx );
        f{4} = @() f_sp_sub2ind( m, idx );
        T(k,:) = cellfun(@timeit, f);

        if k == 5 % test equality during one of the runs
          R = cellfun(@feval, f, 'UniformOutput', false);
          assert(isequal(R{:}));
        end
    end
    % Plot results
    figure();
    semilogy( (1:K).', T, 'linewidth', 2 ); grid on; xticks(0:K);
    legend( {'sub2ind', 'loop', 'sp\_loop', 'sp\_sub2ind'}, 'Location', 'NorthWest' );
    xlabel( 'k, where matrix had 2^{(k-1)} rows and 2^k columns' );
    ylabel( 'function time (s)' )    
end

function x = f_sub2ind( m, idx )
    % Using the in-built sub2ind to generate linear indices, then indexing
    lin_idx = sub2ind( size(m), 1:numel(idx), idx );
    x = zeros( size(m) );
    x( lin_idx ) = m( lin_idx );
end

function x = f_loop( m, idx )
    % Directly indexing in a simple loop
    x = zeros( size(m) );
    for ii = 1:numel(idx)
        x( ii, idx(ii) ) = m( ii, idx(ii) );
    end
end

function x = f_sp_loop( m, idx )
  nI = numel(idx);
  sz = size(m); 
  x = spalloc( sz(1), sz(2), nI ); % Initialize a sparse array.
  for indI = 1:nI
      x( indI, idx(indI) ) = m( indI, idx(indI) ); % This generates a warning (inefficient)
  end
end

function x = f_sp_sub2ind( m, idx )
  nI = numel(idx);
  sz = size(m); 
  x = sparse( 1:nI, idx, m(sub2ind( size(m), 1:numel(idx), idx )), sz(1), sz(2), nI);
end

² Code for profiling:

function q51605093_MB()
    K = 15;                 % Max loop variable
    M = 2^(K-1); N = 2^K;   % size of matrix
    m = rand( M, N );       % random matrix
    idx = randperm( N, M ); % column indices

    % Call each function once, directly (no timeit), so the profiler records it
    f = cell(4,1);
    f{1} = f_sub2ind( m, idx ); 
    f{2} = f_loop( m, idx );   
    f{3} = f_sp_loop( m, idx );
    f{4} = f_sp_sub2ind( m, idx );

%     assert(isequal(f{:}));
end

... the rest is the same as above

Answer 4

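No bsxfun, no party

Let m be the input matrix and idx a vector with the column indices. You can build a logical mask from idx and multiply it element-wise by m, as follows: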

result = bsxfun(@eq, 1:size(m,2), idx(:)).*m;
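
For the question's example, the mask and the result look as sketched below. The second half illustrates one caveat of masking by multiplication that the answer does not discuss: 0*NaN (and 0*Inf) is NaN, so non-finite values outside the selected positions leak into the output. The names mask, m_nan, leaky and safe are illustrative only.

```matlab
m = [0.8147 0.9134 0.2785 0.9649
     0.9058 0.6324 0.5469 0.1576
     0.1270 0.0975 0.9575 0.9706];
idx = [3 1 4];

% Compare the row of column numbers 1:4 against the column vector idx(:);
% the result is a 3x4 logical matrix with one true per row.
mask = bsxfun(@eq, 1:size(m,2), idx(:));
result = mask .* m;
disp(mask)
disp(result)

% Caveat: a NaN at an unselected position survives the multiplication.
m_nan = m;
m_nan(1,1) = NaN;
leaky = mask .* m_nan;          % leaky(1,1) is NaN, not 0

% Assigning through the mask avoids the multiplication entirely.
safe = zeros(size(m_nan));
safe(mask) = m_nan(mask);       % safe(1,1) is 0
```

If the input is known to contain only finite values, the multiplication form and the masked assignment give the same result.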

Answer 5

This should be faster than sub2ind:

m = [0.8147, 0.9134, 0.2785, 0.9649;
0.9058, 0.6324, 0.5469, 0.1576;
0.1270, 0.0975, 0.9575,   0.9706];

n=[3,1,4];

linear = (1:length(n)) + (n-1)*size(m,1);
new_m = zeros(size(m));
new_m(linear) = m(linear);
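
A quick way to convince yourself that the hand-written formula matches sub2ind is to compare the two directly. The sketch below does that on an arbitrary 5×8 matrix (the size and the index values are assumptions for the example) and also shows that sub2ind checks its subscripts, which the manual formula does not.

```matlab
m = rand(5, 8);
idx = [8 2 2 5 1];                  % one column index per row
rows = 1:numel(idx);

lin_builtin = sub2ind(size(m), rows, idx);
lin_manual  = rows + (idx - 1) * size(m, 1);
disp(isequal(lin_builtin, lin_manual))

% sub2ind rejects a column subscript outside 1..8 ...
try
    sub2ind(size(m), 1, 9);
    rejected = false;
catch
    rejected = true;
end

% ... while the formula simply returns a number (here beyond numel(m)).
unchecked = 1 + (9 - 1) * size(m, 1);
```

Skipping that validation is a plausible reason for the manual version being slightly faster, at the cost of less helpful errors when idx contains a bad column.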

Select matrix entries based on given positions
