I want to compute the norm of each column of a large matrix (tens of rows by thousands of columns) that contains NaNs. Before computing the norm, each column's mean is subtracted. All NaN values are treated as 0. So I did it as:
nanix = isnan(X);
nx = sum(~nanix); % count the number of non-NaN values in each column for calculating mean
X(nanix) = 0;
X = bsxfun(@minus, X, sum(X)./nx);
X(nanix) = 0;
xnorm = sqrt(sum(X.^2));
I think this works, except for the two lines that assign all NaN values to 0. The profiler shows these two lines account for more than 50% of the total computation. With a 70-by-2000 data matrix, over 10 s is spent on the assignments across 10,000 runs. Any suggestions?
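For readers outside MATLAB, the whole computation (treat NaNs as 0, subtract each column's mean over its non-NaN entries, then take the column norm) can be sketched in NumPy. This is an illustrative port of the snippet above, not the benchmarked MATLAB code:

```python
import numpy as np

def nan_column_norms(X):
    """Column norms after subtracting each column's non-NaN mean,
    with NaN entries treated as 0 (port of the MATLAB snippet)."""
    nanix = np.isnan(X)
    nx = (~nanix).sum(axis=0)          # non-NaN count per column
    X = np.where(nanix, 0.0, X)        # NaN -> 0 (copy, original untouched)
    X = X - X.sum(axis=0) / nx         # subtract per-column mean
    X[nanix] = 0.0                     # zero the (former) NaN slots again
    return np.sqrt((X ** 2).sum(axis=0))

# tiny check on a 3x2 matrix with one NaN
A = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, 6.0]])
print(nan_column_norms(A))
```

Column 0 keeps the values 1 and 3 (mean 2), column 1 keeps 2, 4, 6 (mean 4), so the result is `[sqrt(2), sqrt(8)]`.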
===============
As requested, a test function can be:
%%
function test
a = randn(80,3000);
[r,c] = size(a);
b = randperm(r*c);
nanix = b(1:round(numel(b)*0.3)); % randomly select 30% of values to be NaN
a(nanix) = NaN;
nnx = sum(~isnan(a));
tic;
for i = 1:1000
t=a;
t(nanix)=0;
tm = sum(t)./nnx;
t = bsxfun(@minus, t, tm);
t(nanix) = 0;
tnorm = sqrt(sum(t.^2));
end
tt = toc;
fprintf('time: %.4f',tt);
end
Output:
>> test
time: 3.4734
The profiler shows that the first t(nanix)=0; costs 42% and the second one 16.3% of the total run time.
Answer 0 (score: 2)
Update: I have included the OP's approach in the comparison and found some mistakes.
You are looking for the norm of the vector obtained by subtracting from it another special vector (one with a constant value on the elements corresponding to the non-NaN elements of the original vector).
Here I use the expansion (a-b).^2 = a.^2 - 2*a.*b + b.^2, where a and b are vectors. This avoids one zero-assignment, as well as the singleton expansion.
Also, per @Elkan, for a vector a and its mean b: sum((a-b).^2) = sum(a.^2) - 2*sum(a)*b + n*b^2 = sum(a.^2) - 2*n*b*b + n*b^2 = sum(a.^2) - n*b^2, where n is the number of non-NaN points used to compute the mean.
The key to both methods is to avoid evaluating the centered vector a-b, which is what requires the second X(nanix) = 0;
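The identity above is easy to check numerically; here is a minimal NumPy sketch (my own illustration, not part of the benchmark) for a single column vector:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(100)      # one column vector
n = a.size
b = a.mean()

lhs = np.sum((a - b) ** 2)        # centered sum of squares
rhs = np.sum(a ** 2) - n * b ** 2 # sum(a.^2) - n*b^2, no centered vector formed
assert np.isclose(lhs, rhs)
print(lhs, rhs)
```

The right-hand side never materializes the centered vector a-b, which is exactly what lets the MATLAB versions skip the second zero-assignment.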
According to the profiler, the most time-consuming lines are:
X(nanix) = 0; (~30%)
X2 = sum(X.^2); (~10%)
bsxfun (surprisingly, ~10%)
As @Jon mentioned in the comments, X(~nanix) pulls out all the non-NaN numbers as the desired input. However, this operation needs a memory copy, which takes a lot of time. More importantly, since the number of NaNs is not consistent across the column vectors, it is hard to vectorize the process (one has to handle each column in a for loop, or resort to something even slower such as cellfun).
Full test code:
clear;clc;close all
a = randn(80,3000);
[r,c] = size(a);
b = randperm(r*c);
nanix = b(1:round(numel(b)*0.3)); % randomly select 30% of values to be NaN
a(nanix) = NaN;
nnx = sum(~isnan(a));
clearvars -except a
tic
for i = 1:1e3
X = a;
nanix = isnan(X);
nx = sum(~nanix); % count the number of non-NaN values in each column for calculating mean
X(nanix) = 0;
bsxminus = sum(X)./nx;
X = bsxfun(@minus, X, bsxminus);
X(nanix) = 0;
xnorm = sqrt(sum(X.^2));
end
toc
clearvars -except a xnorm
tic
for i = 1:1e3
X = a;
nanix = isnan(X);
nx = sum(~nanix);
X(nanix) = 0;
Xsum = sum(X);
Xmean = Xsum./nx;
X2 = sum(X.^2);
Xmean2 = Xmean.^2.*nx;
XXmean = Xsum.*Xmean;
xnorm2 = sqrt( X2+Xmean2-XXmean-XXmean ); % avoid bsx
end
toc
norm(abs(xnorm-xnorm2)./xnorm) % relative error
clearvars -except a xnorm
tic
for i = 1:1e3
X = a;
nanix = isnan(X);
nx = sum(~nanix);
X(nanix) = 0;
Xsum = sum(X);
X2sum = sum(X.^2); % 50% time consumed here
xnorm3 = sqrt( X2sum - Xsum.^2./nx ); % avoid bsx
end
toc
norm(abs(xnorm-xnorm3)./xnorm) % relative error
clearvars -except a xnorm
tic
for i = 1:1e3
X = a;
nanix = isnan(X);
X = X(~nanix);
bsxminus = mean(X);
X = bsxfun(@minus, X, bsxminus);
xnorm4 = sqrt(sum(X.^2));
end
toc
norm(abs(xnorm-xnorm4)./xnorm) % I can't think of a working way
Output:
Elapsed time is 6.326877 seconds.
Elapsed time is 3.780087 seconds.
ans =
   8.8214e-15
Elapsed time is 3.690037 seconds.
ans =
   8.8283e-15
Elapsed time is 3.632071 seconds.
ans =
   3.0369e+03
As can be seen, the first two methods have similar speed, and I did observe consistently shorter times for the second one; skipping the bsxfun-related computation saves quite some time.
Meanwhile, the third method does not give the correct answer. The problem lies in the indexing X = X(~nanix), which flattens the matrix into a single vector, so mean(X) returns one overall mean rather than per-column means.
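That failure mode is easy to reproduce: logical (boolean) indexing of a matrix returns a single flat vector, so any subsequent mean is a grand mean, not per-column means. NumPy behaves the same way; a small illustration (not the MATLAB code itself, where the flattening order is column-major rather than row-major):

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, 30.0]])
mask = ~np.isnan(X)
flat = X[mask]            # boolean indexing flattens the 2-D array
print(flat.shape)         # (5,) -- the column structure is lost
print(flat.mean())        # one grand mean, not two column means
```

Once the column boundaries are gone, there is no way for a plain `mean` call to recover per-column statistics, which is why the third edition's result is wildly off.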
Answer 1 (score: 1)
Well, this is what I meant, but @Yvon, you're right, it is slower than assigning the NaNs to 0. Also, use the TIMEIT function for better testing (Elkan's timing seems off). I'm sure there are small improvements and other approaches, but I figured I'd just add this.
Test Results:
func4 %# Jon potential improves on func3
runTime: 0.00374437,
error:5.32907e-15
func3 %# Elkan's comment on Yvon's answer
runTime: 0.00375083,
error:5.32907e-15
func2 %# Expanded Norm Yvon
runTime: 0.00386379,
error:5.32907e-15
func1 %# Original
runTime: 0.00515395,
error:0
func5 %# Jon's no zero-assign version of func3
runTime: 0.00884188,
error:5.32907e-15
func6 %# Jon's no zero-assign version of Original (norm-mean directly)
runTime: 0.0105508,
error:2.66454e-15
Code:
function test()
a = randn(80,3000);
[r,c] = size(a);
b = randperm(r*c);
nanix = b(1:round(numel(b)*0.3)); % randomly select 30% of values to be NaN
a(nanix) = NaN;
% nnx = sum(~isnan(a));
f = {...
@() func1(a); %# Original
@() func2(a); %# Expanded Yvon
@() func3(a); %# Elkan's comment
@() func4(a); %# Jon potential improves on 3
@() func5(a); %# Jon's no zero-assign Elkan's
@() func6(a); %# Jon's no zero-assign on Original
};
%Time each function/method
timings = cellfun(@timeit, f);
validity = cellfun(@feval, f, 'UniformOutput',false);
err=max( abs( bsxfun(@minus,validity{1},vertcat(validity{:})) ),[],2 );
%Display in order of speed:
idxTime=1:size(timings,1);
[sortedTime,sortMap]=sort(timings);
fprintf('Test Results:\n\n');
fprintf('func%d\r runTime: %g,\r error:%g\n',[idxTime(sortMap);sortedTime.';err(sortMap).'])
end
%% Functions below
function xnorm = func1(X)
nanix = isnan(X);
nx = sum(~nanix); % count the number of non-NaN values in each column for calculating mean
X(nanix) = 0;
bsxminus = sum(X)./nx;
X = bsxfun(@minus, X, bsxminus);
X(nanix) = 0;
xnorm = sqrt(sum(X.^2));
end
function xnorm2 = func2(X)
nanix = isnan(X);
nx = sum(~nanix);
X(nanix) = 0;
Xsum = sum(X);
Xmean = Xsum./nx;
X2 = sum(X.^2);
Xmean2 = Xmean.^2.*nx;
XXmean = Xsum.*Xmean;
xnorm2 = sqrt( X2+Xmean2-XXmean-XXmean ); % avoid bsx
end
function xnorm3 = func3(X)
nanix = isnan(X);
nx = sum(~nanix);
X(nanix) = 0;
Xsum = sum(X);
X2sum = sum(X.^2); % 50% time consumed here
xnorm3 = sqrt( X2sum - Xsum.^2./nx ); % avoid bsx
% relative error
end
%Try column version of setting to zero, Replace element-wise powers with multiplication, add realsqrt
function xnorm4 = func4(X)
[r,c] = size(X);
nanix = isnan(X);
nx = sum(~nanix);%prob can't speedup without Mex
Xtemp=X(:);
Xtemp(nanix(:))=0;
X=reshape(Xtemp,r,c); %Sometimes faster if linear?
Xsum = sum(X);
X2sum = sum(X.*X);
xnorm4 = realsqrt( X2sum - (Xsum.*Xsum)./nx ); % avoid bsx
end
%Now using indexing instead of setting to zero
function xnorm5 = func5(X)
nnanix = ~isnan(X);
nx = sum(nnanix);
% Xtemp=X(nnanix); %40% of time lost
Xtemp=X(:);
Xtemp=Xtemp(nnanix(:)); %drops by 20%, still slower than =0
%Now have two vectors, one of values, and one of # of values per
%original column... avoid cells!
ind(1,numel(Xtemp)) = 0;%hack pre-allocation
ind(cumsum(nx(end:-1:1))) = 1;
ind = cumsum(ind(end:-1:1)); %grouping index for accumarray
Xsum = accumarray(ind(:), Xtemp(:)); %sum each "original" columns
X2sum = accumarray(ind(:), Xtemp(:).*Xtemp(:));%./nx(:) for mean directly
xnorm5 = realsqrt( X2sum - (Xsum.*Xsum)./nx(:) ).'; % avoid bsx
end
%Now using indexing instead of setting to zero
function xnorm6 = func6(X)
nnanix = ~isnan(X);
nx = sum(nnanix); %only Mex could speed up
% Xtemp=X(nnanix); %40% of time lost
Xtemp=X(:);
Xtemp=Xtemp(nnanix(:)); %drops by 20%, still slower than =0
numNoNans=numel(Xtemp);
%Now have two vectors, one of values, and one of # of values per
%original column... avoid cells! It's like a "run length encoding"
%Almost like "run-length encoding" FEX: RunLength, for MEX
ind(1,numNoNans) = 0;%hack pre-allocation
ind(cumsum(nx(end:-1:1))) = 1;
ind = cumsum(ind(end:-1:1)); %// generate grouping index for accumarray
Xmean = (accumarray(ind(:), Xtemp(:))./nx(:)).'; %means of each col
%"Run-length decoding" %Repelem is fastest if 2015b>
idx(1,numNoNans)=0;
idx([1 cumsum(nx(1:end-1))+1]) = diff([0 Xmean]);
XmeanRepeated = cumsum(idx);
XminusMean = Xtemp(:).'-XmeanRepeated; %Each col subtracted by col mean
XSumOfSquares = accumarray(ind(:), XminusMean.*XminusMean); %sum each "original" columns
xnorm6 = realsqrt(XSumOfSquares).';
end
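The accumarray grouping trick in func5/func6 (keep the ragged non-NaN values as one flat vector, tag each value with its column id, then do grouped sums) has a direct NumPy analog using np.bincount. This sketch builds the group index with np.repeat instead of the cumsum hack, and reuses the closed-form norm from func3; it is an illustration, not a benchmark:

```python
import numpy as np

X = np.array([[1.0, 4.0, 7.0],
              [2.0, np.nan, 8.0],
              [np.nan, 6.0, 9.0]])
mask = ~np.isnan(X)
nx = mask.sum(axis=0)                  # non-NaN count per column: [2, 2, 3]

# Flatten column-major (like MATLAB's X(~nanix)) so each column's
# surviving values stay contiguous in the flat vector
vals = X.flatten(order='F')[mask.flatten(order='F')]
group = np.repeat(np.arange(X.shape[1]), nx)   # column id of each kept value

col_sum = np.bincount(group, weights=vals)           # grouped sums (accumarray analog)
col_sum2 = np.bincount(group, weights=vals * vals)   # grouped sums of squares
norms = np.sqrt(col_sum2 - col_sum**2 / nx)          # same closed form as func3
print(norms)
```

The grouped-sum calls replace both `accumarray` lines; everything else is the same arithmetic as the MATLAB version, just without cells or per-column loops.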
Answer 2 (score: 0)
I was surprised to see that the method I proposed took more time when implemented by @Yvon, so I did some experiments and am posting them as a new answer.
clear;clc;close all
a = randn(80,3000);
[r,c] = size(a);
b = randperm(r*c);
nanix = b(1:round(numel(b)*0.3)); % randomly select 30% of values to be NaN
a(nanix) = NaN;
% nan values
nanix = isnan(a);
% count the number of non-NaN values in each column for calculating mean
nx = sum(~nanix);
clearvars -except a nanix nx
tic
for i = 1:1e3
X = a;
X(nanix) = 0;
bsxminus = sum(X)./nx;
X = bsxfun(@minus, X, bsxminus);
X(nanix) = 0;
xnorm = sqrt(sum(X.^2));
end
t = toc;
fprintf('Reference edition: time elapsed: %.4f\n',t)
clearvars -except a xnorm nanix nx
tic
for i = 1:1e3
X = a;
X(nanix) = 0;
Xsum = sum(X);
Xmean = Xsum./nx;
X2 = sum(X.^2);
Xmean2 = Xmean.^2.*nx;
XXmean = Xsum.*Xmean;
xnorm2 = sqrt( X2+Xmean2-XXmean-XXmean ); % avoid bsx
end
t = toc;
fprintf('Yvon''s first edition: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm2))) % relative error
clearvars -except a xnorm nanix nx
tic
for i = 1:1e3
X = a;
X(nanix) = 0;
Xsum = sum(X);
X2sum = sum(X.^2); % 50% time consumed here
xnorm3 = sqrt( X2sum - Xsum.^2./nx ); % avoid bsx
end
t = toc;
fprintf('Yvon''s second edition: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm3))) % relative error
clearvars -except a xnorm nanix nx
tic
for i = 1:1e3
X = a;
X = X(~nanix);
bsxminus = mean(X);
X = bsxfun(@minus, X, bsxminus);
xnorm4 = sqrt(sum(X.^2));
end
t = toc;
fprintf('Yvon''s third edition: time elapsed: %.4fs, max error: %e\n', t, max(abs(xnorm-xnorm4)))
clearvars -except a xnorm nanix nx
tic
X = a;
X(nanix) = 0;
for i = 1:1e3
xsum = sum(X);
x2sum = sum(X.^2);
xnorm5 = sqrt( x2sum - xsum.^2./nx ); % avoid bsx
end
t = toc;
fprintf('My simplified edition: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm5)))
clearvars -except a xnorm nanix nx
tic
nanix = find(isnan(a));
for i = 1:1e3
X = a;
X(nanix) = 0;
xsum = sum(X);
x2sum = sum(X.^2);
xnorm6 = sqrt( x2sum - xsum.^2./nx ); % avoid bsx
end
t = toc;
fprintf('Yvon''s edition with linear indices: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm6)))
clearvars -except a xnorm nanix nx
tic
nanix = find(isnan(a));
for i = 1:1e3
X = a;
X(nanix) = 0;
bsxminus = sum(X)./nx;
X = bsxfun(@minus, X, bsxminus);
X(nanix) = 0;
xnorm7 = sqrt(sum(X.^2));
end
t = toc;
fprintf('Reference edition with linear indices: time elapsed: %.4fs, max error: %e\n',t, max(abs(xnorm-xnorm7)))
The output is:
Reference edition: time elapsed: 6.1678
Yvon's first edition: time elapsed: 3.6075s, max error: 4.440892e-15
Yvon's second edition: time elapsed: 3.5760s, max error: 3.552714e-15
Yvon's third edition: time elapsed: 4.0059s, max error: 4.043873e+02
My simplified edition: time elapsed: 0.8743s, max error: 3.552714e-15
Yvon's edition with linear indices: time elapsed: 2.3531s, max error: 3.552714e-15
Reference edition with linear indices: time elapsed: 3.5527s, max error: 0.000000e+00
Converting the logical indices into linear indices with find gives a further significant speed-up. Since MATLAB recommends logical indexing for performance, I wonder why that advice contradicts my results here.
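For comparison outside MATLAB: NumPy can likewise precompute flat integer indices from a boolean mask with np.flatnonzero (the analog of find). Whether that beats the mask depends on the NaN density and library version, so this only sketches the conversion and verifies the two assignments agree:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((80, 3000))
a[rng.random(a.shape) < 0.3] = np.nan  # ~30% NaNs

mask = np.isnan(a)                     # logical (boolean) index
lin = np.flatnonzero(mask)             # precomputed linear indices, like find()

X1 = a.copy(); X1[mask] = 0.0          # boolean-mask assignment
X2 = a.copy(); X2.flat[lin] = 0.0      # linear-index assignment
assert np.array_equal(X1, X2)          # both zero exactly the NaN slots
print(lin.size, "NaNs zeroed")
```

As in the MATLAB experiments, the indices only need to be computed once outside the timing loop, so the conversion cost can be amortized across many iterations.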