I want to compute the norm of each column of a large matrix (tens of rows by thousands of columns) that contains NaNs. Before computing the norm, each column's mean is subtracted. All NaN values are treated as 0. So I did it as:
nanix = isnan(X);
nx = sum(~nanix); % count the number of non-NaN values in each column for calculating mean
X(nanix) = 0;
X = bsxfun(@minus, X, sum(X)./nx);
X(nanix) = 0;
xnorm = sqrt(sum(X.^2));
I think this works, except for the two lines that assign all NaN values to 0. The profiler shows these two lines account for more than 50% of the total computation. With a 70-by-2000 data matrix, over 10 s is spent on the assignments across 10,000 runs. Any suggestions?
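For readers outside MATLAB, the whole computation (treat NaNs as 0, subtract each column's mean over its non-NaN entries, then take the column norm) can be sketched in NumPy. This is an illustrative port of the snippet above, not the benchmarked MATLAB code:

```python
import numpy as np

def nan_column_norms(X):
    """Column norms after subtracting each column's non-NaN mean,
    with NaN entries treated as 0 (port of the MATLAB snippet)."""
    nanix = np.isnan(X)
    nx = (~nanix).sum(axis=0)          # non-NaN count per column
    X = np.where(nanix, 0.0, X)        # NaN -> 0 (copy, original untouched)
    X = X - X.sum(axis=0) / nx         # subtract per-column mean
    X[nanix] = 0.0                     # zero the (former) NaN slots again
    return np.sqrt((X ** 2).sum(axis=0))

# tiny check on a 3x2 matrix with one NaN
A = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, 6.0]])
print(nan_column_norms(A))
```

Column 0 keeps the values 1 and 3 (mean 2), column 1 keeps 2, 4, 6 (mean 4), so the result is `[sqrt(2), sqrt(8)]`.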
===============
As requested, a test function can be:
%%
function test
a = randn(80,3000);
[r,c] = size(a);
b = randperm(r*c);
nanix = b(1:round(numel(b)*0.3)); % randomly select 30% of values to be NaN
a(nanix) = NaN;
nnx = sum(~isnan(a));
tic;
for i = 1:1000
t=a;
t(nanix)=0;
tm = sum(t)./nnx;
t = bsxfun(@minus, t, tm);
t(nanix) = 0;
tnorm = sqrt(sum(t.^2));
end
tt = toc;
fprintf('time: %.4f',tt);
end
Output:
>> test
time: 3.4734
The profiler shows that the first t(nanix)=0; costs 42% and the second one 16.3% of the total run time.
Answer 0 (score: 2)
Update: I have included the OP's approach in the comparison and found some mistakes.
You are looking for the norm of the vector obtained by subtracting from it another special vector (one with a constant value on the elements corresponding to the non-NaN elements of the original vector).
Here I use the expansion (a-b).^2 = a.^2 - 2*a.*b + b.^2, where a and b are vectors. This avoids one zero-assignment, as well as the singleton expansion.
Also, per @Elkan, for a vector a and its mean b: sum((a-b).^2) = sum(a.^2) - 2*sum(a)*b + n*b^2 = sum(a.^2) - 2*n*b*b + n*b^2 = sum(a.^2) - n*b^2, where n is the number of non-NaN points used to compute the mean.
The key to both methods is to avoid evaluating the centered vector a-b, which is what requires the second X(nanix) = 0;
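The identity above is easy to check numerically; here is a minimal NumPy sketch (my own illustration, not part of the benchmark) for a single column vector:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(100)      # one column vector
n = a.size
b = a.mean()

lhs = np.sum((a - b) ** 2)        # centered sum of squares
rhs = np.sum(a ** 2) - n * b ** 2 # sum(a.^2) - n*b^2, no centered vector formed
assert np.isclose(lhs, rhs)
print(lhs, rhs)
```

The right-hand side never materializes the centered vector a-b, which is exactly what lets the MATLAB versions skip the second zero-assignment.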
According to the profiler, the most time-consuming lines are:
X(nanix) = 0; (~30%)
X2 = sum(X.^2); (~10%)
bsxfun (surprisingly, ~10%)
As @Jon mentioned in the comments, X(~nanix) pulls out all the non-NaN numbers as the desired input. However, this operation needs a memory copy, which takes a lot of time. More importantly, since the number of NaNs is not consistent across the column vectors, it is hard to vectorize the process (one has to handle each column in a for loop, or resort to something even slower such as cellfun).
Full test code:
clear;clc;close all
a = randn(80,3000);
[r,c] = size(a);
b = randperm(r*c);
nanix = b(1:round(numel(b)*0.3)); % randomly select 30% of values to be NaN
a(nanix) = NaN;
nnx = sum(~isnan(a));
clearvars -except a
tic
for i = 1:1e3
X = a;
nanix = isnan(X);
nx = sum(~nanix); % count the number of non-NaN values in each column for calculating mean
X(nanix) = 0;
bsxminus = sum(X)./nx;
X = bsxfun(@minus, X, bsxminus);
X(nanix) = 0;
xnorm = sqrt(sum(X.^2));
end
toc
clearvars -except a xnorm
tic
for i = 1:1e3
X = a;
nanix = isnan(X);
nx = sum(~nanix);
X(nanix) = 0;
Xsum = sum(X);
Xmean = Xsum./nx;
X2 = sum(X.^2);
Xmean2 = Xmean.^2.*nx;
XXmean = Xsum.*Xmean;
xnorm2 = sqrt( X2+Xmean2-XXmean-XXmean ); % avoid bsx
end
toc
norm(abs(xnorm-xnorm2)./xnorm) % relative error
clearvars -except a xnorm
tic
for i = 1:1e3
X = a;
nanix = isnan(X);
nx = sum(~nanix);
X(nanix) = 0;
Xsum = sum(X);
X2sum = sum(X.^2); % 50% time consumed here
xnorm3 = sqrt( X2sum - Xsum.^2./nx ); % avoid bsx
end
toc
norm(abs(xnorm-xnorm3)./xnorm) % relative error
clearvars -except a xnorm
tic
for i = 1:1e3
X = a;
nanix = isnan(X);
X = X(~nanix);
bsxminus = mean(X);
X = bsxfun(@minus, X, bsxminus);
xnorm4 = sqrt(sum(X.^2));
end
toc
norm(abs(xnorm-xnorm4)./xnorm) % I can't think of a working way
Output:
Elapsed time is 6.326877 seconds.
Elapsed time is 3.780087 seconds.
ans =
   8.8214e-15
Elapsed time is 3.690037 seconds.
ans =
   8.8283e-15
Elapsed time is 3.632071 seconds.
ans =
   3.0369e+03
As can be seen, the first two methods have similar speed, and I did observe consistently shorter times for the second one; skipping the bsxfun-related computation saves quite some time.
Meanwhile, the third method does not give the correct answer. The problem lies in the indexing X = X(~nanix), which flattens the matrix into a single vector, so mean(X) returns one overall mean rather than per-column means.
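That failure mode is easy to reproduce: logical (boolean) indexing of a matrix returns a single flat vector, so any subsequent mean is a grand mean, not per-column means. NumPy behaves the same way; a small illustration (not the MATLAB code itself, where the flattening order is column-major rather than row-major):

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, 30.0]])
mask = ~np.isnan(X)
flat = X[mask]            # boolean indexing flattens the 2-D array
print(flat.shape)         # (5,) -- the column structure is lost
print(flat.mean())        # one grand mean, not two column means
```

Once the column boundaries are gone, there is no way for a plain `mean` call to recover per-column statistics, which is why the third edition's result is wildly off.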
Answer 1 (score: 1)
Well, this is what I meant, but @Yvon, you're right, it is slower than assigning the NaNs to 0. Also, use the TIMEIT function for better testing (Elkan's timing seems off). I'm sure there are small improvements and other approaches, but I figured I'd just add this.
Test Results:
func4 %# Jon potential improves on func3
runTime: 0.00374437,
error:5.32907e-15
func3 %# Elkan's comment on Yvon's answer
runTime: 0.00375083,
error:5.32907e-15
func2 %# Expanded Norm Yvon
runTime: 0.00386379,
error:5.32907e-15
func1 %# Original
runTime: 0.00515395,
error:0
func5 %# Jon's no zero-assign version of func3
runTime: 0.00884188,
error:5.32907e-15
func6 %# Jon's no zero-assign version of Original (norm-mean directly)
runTime: 0.0105508,
error:2.66454e-15
Code:
function test()
a = randn(80,3000);
[r,c] = size(a);
b = randperm(r*c);
nanix = b(1:round(numel(b)*0.3)); % randomly select 30% of values to be NaN
a(nanix) = NaN;
% nnx = sum(~isnan(a));
f = {...
@() func1(a); %# Original
@() func2(a); %# Expanded Yvon
@() func3(a); %# Elkan's comment
@() func4(a); %# Jon potential improves on 3
@() func5(a); %# Jon's no zero-assign Elkan's
@() func6(a); %# Jon's no zero-assign on Original
};
%Time each function/method
timings = cellfun(@timeit, f);
validity = cellfun(@feval, f, 'UniformOutput',false);
err=max( abs( bsxfun(@minus,validity{1},vertcat(validity{:})) ),[],2 );
%Display in order of speed:
idxTime=1:size(timings,1);
[sortedTime,sortMap]=sort(timings);
fprintf('Test Results:\n\n');
fprintf('func%d\r runTime: %g,\r error:%g\n',[idxTime(sortMap);sortedTime.';err(sortMap).'])
end
%% Functions below
function xnorm = func1(X)
nanix = isnan(X);
nx = sum(~nanix); % count the number of non-NaN values in each column for calculating mean
X(nanix) = 0;
bsxminus = sum(X)./nx;
X = bsxfun(@minus, X, bsxminus);
X(nanix) = 0;
xnorm = sqrt(sum(X.^2));
end
function xnorm2 = func2(X)
nanix = isnan(X);
nx = sum(~nanix);
X(nanix) = 0;
Xsum = sum(X);
Xmean = Xsum./nx;
X2 = sum(X.^2);
Xmean2 = Xmean.^2.*nx;
XXmean = Xsum.*Xmean;
xnorm2 = sqrt( X2+Xmean2-XXmean-XXmean ); % avoid bsx
end
function xnorm3 = func3(X)
nanix = isnan(X);
nx = sum(~nanix);
X(nanix) = 0;
Xsum = sum(X);
X2sum = sum(X.^2); % 50% time consumed here
xnorm3 = sqrt( X2sum - Xsum.^2./nx ); % avoid bsx
% relative error
end
%Try column version of setting to zero, Replace element-wise powers with multiplication, add realsqrt
function xnorm4 = func4(X)
[r,c] = size(X);
nanix = isnan(X);
nx = sum(~nanix);%prob can't speedup without Mex
Xtemp=X(:);
Xtemp(nanix(:))=0;
X=reshape(Xtemp,r,c); %Sometimes faster if linear?
Xsum = sum(X);
X2sum = sum(X.*X);
xnorm4 = realsqrt( X2sum - (Xsum.*Xsum)./nx ); % avoid bsx
end
%Now using indexing instead of setting to zero
function xnorm5 = func5(X)
nnanix = ~isnan(X);
nx = sum(nnanix);
% Xtemp=X(nnanix); %40% of time lost
Xtemp=X(:);
Xtemp=Xtemp(nnanix(:)); %drops by 20%, still slower than =0
%Now have two vectors, one of values, and one of # of values per
%original column... avoid cells!
ind(1,numel(Xtemp)) = 0;%hack pre-allocation
ind(cumsum(nx(end:-1:1))) = 1;
ind = cumsum(ind(end:-1:1)); %grouping index for accumarray
Xsum = accumarray(ind(:), Xtemp(:)); %sum each "original" columns
X2sum = accumarray(ind(:), Xtemp(:).*Xtemp(:));%./nx(:) for mean directly
xnorm5 = realsqrt( X2sum - (Xsum.*Xsum)./nx(:) ).'; % avoid bsx
end
%Now using indexing instead of setting to zero
function xnorm6 = func6(X)
nnanix = ~isnan(X);
nx = sum(nnanix); %only Mex could speed up
% Xtemp=X(nnanix); %40% of time lost
Xtemp=X(:);
Xtemp=Xtemp(nnanix(:)); %drops by 20%, still slower than =0
numNoNans=numel(Xtemp);
%Now have two vectors, one of values, and one of # of values per
%original column... avoid cells! It's like a "run length encoding"
%Almost like "run-length encoding" FEX: RunLength, for MEX
ind(1,numNoNans) = 0;%hack pre-allocation
ind(cumsum(nx(end:-1:1))) = 1;
ind = cumsum(ind(end:-1:1)); %// generate grouping index for accumarray
Xmean = (accumarray(ind(:), Xtemp(:))./nx(:)).'; %means of each col
%"Run-length decoding" %Repelem is fastest if 2015b>
idx(1,numNoNans)=0;
idx([1 cumsum(nx(1:end-1))+1]) = diff([0 Xmean]);
XmeanRepeated = cumsum(idx);
XminusMean = Xtemp(:).'-XmeanRepeated; %Each col subtracted by col mean
XSumOfSquares = accumarray(ind(:), XminusMean.*XminusMean); %sum each "original" columns
xnorm6 = realsqrt(XSumOfSquares).';
end
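The accumarray grouping trick in func5/func6 (keep the ragged non-NaN values as one flat vector, tag each value with its column id, then do grouped sums) has a direct NumPy analog using np.bincount. This sketch builds the group index with np.repeat instead of the cumsum hack, and reuses the closed-form norm from func3; it is an illustration, not a benchmark:

```python
import numpy as np

X = np.array([[1.0, 4.0, 7.0],
              [2.0, np.nan, 8.0],
              [np.nan, 6.0, 9.0]])
mask = ~np.isnan(X)
nx = mask.sum(axis=0)                  # non-NaN count per column: [2, 2, 3]

# Flatten column-major (like MATLAB's X(~nanix)) so each column's
# surviving values stay contiguous in the flat vector
vals = X.flatten(order='F')[mask.flatten(order='F')]
group = np.repeat(np.arange(X.shape[1]), nx)   # column id of each kept value

col_sum = np.bincount(group, weights=vals)           # grouped sums (accumarray analog)
col_sum2 = np.bincount(group, weights=vals * vals)   # grouped sums of squares
norms = np.sqrt(col_sum2 - col_sum**2 / nx)          # same closed form as func3
print(norms)
```

The grouped-sum calls replace both `accumarray` lines; everything else is the same arithmetic as the MATLAB version, just without cells or per-column loops.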
Answer 2 (score: 0)
I was surprised to see that the method I proposed took more time when implemented by @Yvon, so I did some experiments and am posting them as a new answer.
clear;clc;close all
a = randn(80,3000);
[r,c] = size(a);
b = randperm(r*c);
nanix = b(1:round(numel(b)*0.3)); % randomly select 30% of values to be NaN
a(nanix) = NaN;
% nan values
nanix = isnan(a);
% count the number of non-NaN values in each column for calculating mean
nx = sum(~nanix);
clearvars -except a nanix nx
tic
for i = 1:1e3
X = a;
X(nanix) = 0;
bsxminus = sum(X)./nx;
X = bsxfun(@minus, X, bsxminus);
X(nanix) = 0;
xnorm = sqrt(sum(X.^2));
end
t = toc;
fprintf('Reference edition: time elapsed: %.4f\n',t)
clearvars -except a xnorm nanix nx
tic
for i = 1:1e3
X = a;
X(nanix) = 0;
Xsum = sum(X);
Xmean = Xsum./nx;
X2 = sum(X.^2);
Xmean2 = Xmean.^2.*nx;
XXmean = Xsum.*Xmean;
xnorm2 = sqrt( X2+Xmean2-XXmean-XXmean ); % avoid bsx
end
t = toc;
fprintf('Yvon''s first edition: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm2))) % relative error
clearvars -except a xnorm nanix nx
tic
for i = 1:1e3
X = a;
X(nanix) = 0;
Xsum = sum(X);
X2sum = sum(X.^2); % 50% time consumed here
xnorm3 = sqrt( X2sum - Xsum.^2./nx ); % avoid bsx
end
t = toc;
fprintf('Yvon''s second edition: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm3))) % relative error
clearvars -except a xnorm nanix nx
tic
for i = 1:1e3
X = a;
X = X(~nanix);
bsxminus = mean(X);
X = bsxfun(@minus, X, bsxminus);
xnorm4 = sqrt(sum(X.^2));
end
t = toc;
fprintf('Yvon''s third edition: time elapsed: %.4fs, max error: %e\n', t, max(abs(xnorm-xnorm4)))
clearvars -except a xnorm nanix nx
tic
X = a;
X(nanix) = 0;
for i = 1:1e3
xsum = sum(X);
x2sum = sum(X.^2);
xnorm5 = sqrt( x2sum - xsum.^2./nx ); % avoid bsx
end
t = toc;
fprintf('My simplified edition: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm5)))
clearvars -except a xnorm nanix nx
tic
nanix = find(isnan(a));
for i = 1:1e3
X = a;
X(nanix) = 0;
xsum = sum(X);
x2sum = sum(X.^2);
xnorm6 = sqrt( x2sum - xsum.^2./nx ); % avoid bsx
end
t = toc;
fprintf('Yvon''s edition with linear indices: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm6)))
clearvars -except a xnorm nanix nx
tic
nanix = find(isnan(a));
for i = 1:1e3
X = a;
X(nanix) = 0;
bsxminus = sum(X)./nx;
X = bsxfun(@minus, X, bsxminus);
X(nanix) = 0;
xnorm7 = sqrt(sum(X.^2));
end
t = toc;
fprintf('Reference edition with linear indices: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm7)))
The output is:
Reference edition: time elapsed: 6.1678
Yvon's first edition: time elapsed: 3.6075s, max error: 4.440892e-15
Yvon's second edition: time elapsed: 3.5760s, max error: 3.552714e-15
Yvon's third edition: time elapsed: 4.0059s, max error: 4.043873e+02
My simplified edition: time elapsed: 0.8743s, max error: 3.552714e-15
Yvon's edition with linear indices: time elapsed: 2.3531s, max error: 3.552714e-15
Reference edition with linear indices: time elapsed: 3.5527s, max error: 0.000000e+00
Converting the logical indices into linear indices with find gives a further significant speed-up. Since MATLAB recommends logical indexing for performance, I wonder why that advice contradicts my results here.
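For comparison outside MATLAB: NumPy can likewise precompute flat integer indices from a boolean mask with np.flatnonzero (the analog of find). Whether that beats the mask depends on the NaN density and library version, so this only sketches the conversion and verifies the two assignments agree:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((80, 3000))
a[rng.random(a.shape) < 0.3] = np.nan  # ~30% NaNs

mask = np.isnan(a)                     # logical (boolean) index
lin = np.flatnonzero(mask)             # precomputed linear indices, like find()

X1 = a.copy(); X1[mask] = 0.0          # boolean-mask assignment
X2 = a.copy(); X2.flat[lin] = 0.0      # linear-index assignment
assert np.array_equal(X1, X2)          # both zero exactly the NaN slots
print(lin.size, "NaNs zeroed")
```

As in the MATLAB experiments, the indices only need to be computed once outside the timing loop, so the conversion cost can be amortized across many iterations.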