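I am using MATLAB and I have a (60x882) matrix for which I need to compute the pairwise correlations between columns. However, I want to ignore every column that contains one or more NaNs (i.e. the result for any pair of columns in which at least one entry is NaN should be NaN).
Here is my code so far: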
for i=1:size(auxret,2)
for j=1:size(auxret,2)
rho(i,j)=corr(auxret(:,i),auxret(:,j));
end
end
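But this is very inefficient. I considered using this function:
corr(auxret,'rows','pairwise');
but it does not produce the same result (it ignores the NaNs but still computes the correlation, so unless every entry of a column is NaN it still returns a value).
Any suggestions on how to improve the efficiency?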
Answer 0 (score: 2)
To get the same output as your code with corr(auxret, 'rows','pairwise'), first set every column that contains a NaN entirely to NaN:
auxret(:,any(isnan(auxret))) = NaN;
r = corr(auxret, 'rows','pairwise');
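To see why this works, here is a minimal sketch on a small matrix (the 5x3 example values are borrowed from another answer on this page; the variable name X is only illustrative). Once a column is entirely NaN, the pairwise option has no complete rows left for any pair involving that column, so those entries come out as NaN. This assumes the Statistics and Machine Learning Toolbox version of corr.

```matlab
% Small input: column 3 contains a single NaN.
X = [8 2 3
     3 5 NaN
     7 10 3
     7 4 6
     2 6 7];

% Overwrite every column that has at least one NaN with NaNs throughout.
X(:, any(isnan(X), 1)) = NaN;

% Pairwise deletion now finds no usable rows for pairs involving column 3,
% so row 3 and column 3 of r are NaN; the other entries are ordinary
% correlations computed from all five rows.
r = corr(X, 'rows', 'pairwise');
disp(r)
```

Note that this overwrites the input, so work on a copy if the original values are still needed.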
Answer 1 (score: 0)
Here is an efficient approach, especially when the input contains NaNs:
%// Get mask of invalid columns and thus extract columns without any NaN
mask = any(isnan(auxret),1);
A = auxret(:,~mask);
%// Use correlation formula to get correlation outputs for valid columns
n = size(A,1);
sum_cols = sum(A,1);
sumsq_sqcolsum = n*sum(A.^2,1) - sum_cols.^2;
val1 = n.*(A.'*A) - bsxfun(@times,sum_cols.',sum_cols); %//'
val2 = sqrt(bsxfun(@times,sumsq_sqcolsum.',sumsq_sqcolsum)); %//'
valid_outvals = val1./val2;
%// Setup output array and store the valid outputs in it
ncols = size(auxret,2);
valid_idx = find(~mask);
out = nan(ncols);
out(valid_idx,valid_idx) = valid_outvals;
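Basically, as a preprocessing step, it removes every column that contains one or more NaNs and computes the correlation outputs for the remaining columns. We then initialize an output array of the appropriate size with NaNs and put the valid outputs back into it at the appropriate positions.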
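For reference, the vectorized code corresponds to the standard single-pass formula for the Pearson correlation coefficient. As a sketch, writing $x_{ki}$ for the entry in row $k$, column $i$ of A (this notation is introduced here and is not part of the original answer), with $n$ rows:

```latex
r_{ij} =
\frac{n \sum_{k=1}^{n} x_{ki} x_{kj}
      - \left( \sum_{k=1}^{n} x_{ki} \right) \left( \sum_{k=1}^{n} x_{kj} \right)}
     {\sqrt{\left[ n \sum_{k=1}^{n} x_{ki}^{2} - \left( \sum_{k=1}^{n} x_{ki} \right)^{2} \right]
            \left[ n \sum_{k=1}^{n} x_{kj}^{2} - \left( \sum_{k=1}^{n} x_{kj} \right)^{2} \right]}}
```

The numerator is val1 (A.'*A supplies all the cross-product sums at once), the two bracketed terms are entries of sumsq_sqcolsum, and the denominator is val2. Because this form subtracts two potentially large quantities, it can lose some precision when the column means are large relative to the spread of the data.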
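Whether you use the loopy approach or corr(auxret, 'rows','pairwise'), the results appear to be valid. However, there is one big issue: even a single NaN in any column degrades performance considerably. The degradation is huge for the original loopy approach and still large for the rows + pairwise option, as the benchmark results below show.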
Benchmarking code
nrows = 60;
ncols = 882;
percent_nans = 1; %// decides the percentage of NaNs in input
auxret = rand(nrows,ncols);
auxret(randperm(numel(auxret),round((percent_nans/100)*numel(auxret))))=nan;
disp('------------------------------- With Proposed Approach')
tic
%// Solution code from earlier
toc
disp('------------------------------- With ROWS + PAIRWISE Approach')
tic
auxret(:,any(isnan(auxret))) = NaN;
out1 = corr(auxret, 'rows','pairwise');
toc
disp('------------------------------- With Original Loopy Approach')
tic
out2 = zeros(size(auxret,2));
for i=1:size(auxret,2)
for j=1:size(auxret,2)
out2(i,j)=corr(auxret(:,i),auxret(:,j));
end
end
toc
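Before comparing timings, it may be worth confirming that the vectorized approach agrees with corr. The following is a minimal sketch under stated assumptions: the input X (magic(6) with two NaNs planted) and the names ok_pattern and max_diff are purely illustrative, and plain outer products are used in place of bsxfun, which gives the same matrices.

```matlab
% Deterministic test input with NaNs in columns 3 and 6 (illustrative data).
X = magic(6);
X(2,3) = NaN;
X(5,6) = NaN;

% Vectorized approach: drop columns with NaNs, apply the correlation formula.
mask = any(isnan(X), 1);
A = X(:, ~mask);
n = size(A, 1);
sum_cols = sum(A, 1);
sumsq = n*sum(A.^2, 1) - sum_cols.^2;
val1 = n*(A.'*A) - sum_cols.'*sum_cols;   % numerator for all column pairs
val2 = sqrt(sumsq.'*sumsq);               % denominator for all column pairs
out = nan(size(X, 2));
out(~mask, ~mask) = val1 ./ val2;

% Reference: corr with default options propagates NaNs to affected columns.
ref = corr(X);

% Same NaN positions, and matching values wherever both are defined.
ok_pattern = isequal(isnan(out), isnan(ref));
valid = ~isnan(out) & ~isnan(ref);
max_diff = max(abs(out(valid) - ref(valid)));
fprintf('NaN pattern matches: %d, max abs difference: %g\n', ok_pattern, max_diff);
```

A small tolerance is used instead of exact equality because the two computations round differently.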
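Depending on the input size and the percentage of NaNs, there are a few possible cases. The corresponding runtime results are:
Case 1: Input is 6 x 88 and the percentage of NaNs is 10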
------------------------------- With Proposed Approach
Elapsed time is 0.006371 seconds.
------------------------------- With ROWS + PAIRWISE Approach
Elapsed time is 0.052563 seconds.
------------------------------- With Original Loopy Approach
Elapsed time is 0.875620 seconds.
Case 2: Input is 6 x 88 and the percentage of NaNs is 1
------------------------------- With Proposed Approach
Elapsed time is 0.006303 seconds.
------------------------------- With ROWS + PAIRWISE Approach
Elapsed time is 0.049194 seconds.
------------------------------- With Original Loopy Approach
Elapsed time is 0.871369 seconds.
Case 3: Input is 6 x 88 and the percentage of NaNs is 0.001
------------------------------- With Proposed Approach
Elapsed time is 0.006738 seconds.
------------------------------- With ROWS + PAIRWISE Approach
Elapsed time is 0.025754 seconds.
------------------------------- With Original Loopy Approach
Elapsed time is 0.867647 seconds.
Case 4: Input is 60 x 882 and the percentage of NaNs is 10
------------------------------- With Proposed Approach
Elapsed time is 0.007766 seconds.
------------------------------- With ROWS + PAIRWISE Approach
Elapsed time is 2.479645 seconds.
------------------------------- With Original Loopy Approach
...... Taken Too long ...
Case 5: Input is 60 x 882 and the percentage of NaNs is 1
------------------------------- With Proposed Approach
Elapsed time is 0.014144 seconds.
------------------------------- With ROWS + PAIRWISE Approach
Elapsed time is 2.324878 seconds.
------------------------------- With Original Loopy Approach
...... Taken Too long ...
Case 6: Input is 60 x 882 and the percentage of NaNs is 0.001
------------------------------- With Proposed Approach
Elapsed time is 0.020410 seconds.
------------------------------- With ROWS + PAIRWISE Approach
Elapsed time is 1.830632 seconds.
------------------------------- With Original Loopy Approach
...... Taken Too long ...
Answer 2 (score: 0)
What you describe is the default behavior of corr, without any special options. For example,
auxret = [8 2 3
3 5 NaN
7 10 3
7 4 6
2 6 7];
rho = corr(auxret)
results in
rho =
1.0000 -0.1497 NaN
-0.1497 1.0000 NaN
NaN NaN NaN