Question

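Basically, I have a data matrix with many "holes" represented by NaN, and I want to retrieve the indices of all NaN clusters within a single column that are fewer than 4 long.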

E.g., for the matrix:

A = 
    23    12    NaN   56    60    21    NaN
    60    56    94    22    45    NaN   NaN
    23    55    19    83    NaN   NaN   NaN
    NaN   NaN   NaN   NaN   NaN   NaN   NaN
    NaN   NaN   NaN   NaN   NaN   NaN   NaN
    NaN   NaN   NaN   NaN   NaN   NaN   NaN
    84    99    43    32    89    12    NaN
    76    92    73    47    22    12    10
    23    55    12    93    61    94    20
    NaN   NaN   NaN   NaN   NaN   NaN   NaN
    41    16    83    39    82    37    43
    14    78    92    40    81    29    60

it would return:

ans = 
    [4; 5; 6; 10; 16; 17; 18; 22; 25; 28; 29; 30; 34; 40; 41; 42; 46; 58; 70; 82]

So far, I have a vector containing the indices of all NaN values:

nan_list=find(isnan(A(:)))

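But I don't know how to extract the runs of consecutive indices from that vector without using a loop, which would be too expensive. I also tried something similar to the answer posted by b3 here, switching all NaNs to a value that does not occur in the matrix, but that code does not transfer to other data sets.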

Thanks for any suggestions!

Answer 1

Code

N = 4; %// Clusters of fewer than N NaNs are to be detected
nan_pos = isnan(A); %// Find NaN positions as a binary array
conv_res = conv2(double(nan_pos),[0 ones(1,N)]')==N; %//' Convolve down each column; true where the N preceding rows are all NaN
start_ind = find(conv_res(N+1:end,:)); %// Find positions where a window of N consecutive NaNs starts
nan_pos(unique(bsxfun(@plus,start_ind,[0:N-1])))=0; %// Clear every NaN that belongs to a run of N or more in the NaN position array
out = find(nan_pos) %// The remaining NaNs are the desired output
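
To make the convolution step easier to follow, here is a minimal sketch of the same steps on a single column. The column `col` and the choice `N = 3` are made up for illustration and are not part of the original answer; the variable names `kernel` and `covered` are likewise hypothetical.

```matlab
% Hypothetical single-column input: a run of 2 NaNs, then a run of 4 NaNs
col = [1; NaN; NaN; 5; NaN; NaN; NaN; NaN; 9];
N = 3;

nan_pos  = isnan(col);                    % logical column, true where NaN
kernel   = [0 ones(1,N)]';                % leading zero, then N ones
conv_res = conv2(double(nan_pos), kernel) == N;   % full convolution, numel(col)+N rows

% Dropping the first N rows aligns each hit with the row where
% a window of N consecutive NaNs begins.
starts = find(conv_res(N+1:end));         % windows 5:7 and 6:8 are all NaN

% Expand every window start to the N rows it covers and clear those NaNs.
covered = unique(bsxfun(@plus, starts, 0:N-1));
nan_pos(covered) = 0;
out = find(nan_pos);                      % only the short run (rows 2 and 3) remains
```

Overlapping windows inside a long run are harmless here, because `unique` collapses the repeated indices before they are cleared.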

Example

As an example, let's try the code on a slightly different input that will hopefully test the various aspects of the problem:

A = [
    23    12    NaN   56    60    21    NaN
    60    56    94    22    45    NaN   NaN
    23    55    19    83    NaN   NaN   NaN
    NaN   NaN   NaN   NaN   NaN   NaN   NaN
    NaN   NaN   NaN   NaN   NaN   NaN   NaN
    NaN   NaN   NaN   NaN   NaN   NaN   NaN
    84    99    43    32    89    12    NaN
    76    92    73    47    22    12    10
    23    55    12    93    61    94    20
    NaN   NaN   NaN   NaN   NaN   NaN   NaN
    41    NaN   NaN   39    82    37    43
    14    78    NaN   40    81    NaN   60]

Now, let's assume we are looking for the indices of clusters of fewer than 3 NaNs. So, with N edited to 3 in the code, the output is:

out = 10 22 23 25 46 58 70 72 82

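Looking at the input, this makes sense: for example, the run of two NaNs in column 2 (rows 10 and 11, linear indices 22 and 23) is kept, while the run of three in column 3 (rows 10 to 12) is dropped.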
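
As an independent cross-check of that output, the same result can be reached with a run-length approach based on `diff` and `cumsum` instead of convolution. This is a sketch of a different technique, not the method used in this answer; the variable names (`padded`, `marks`, `in_short`, and so on) are hypothetical.

```matlab
A = [ 23  12  NaN 56  60  21  NaN
      60  56  94  22  45  NaN NaN
      23  55  19  83  NaN NaN NaN
      NaN NaN NaN NaN NaN NaN NaN
      NaN NaN NaN NaN NaN NaN NaN
      NaN NaN NaN NaN NaN NaN NaN
      84  99  43  32  89  12  NaN
      76  92  73  47  22  12  10
      23  55  12  93  61  94  20
      NaN NaN NaN NaN NaN NaN NaN
      41  NaN NaN 39  82  37  43
      14  78  NaN 40  81  NaN 60 ];
N = 3;                                    % keep runs shorter than N

[rows, cols] = size(A);
nan_pos = isnan(A);

% Pad with a zero row above and below so every run has a start and an end
% inside its own column.
padded = [zeros(1,cols); nan_pos; zeros(1,cols)];
d = diff(padded);                         % (rows+1)-by-cols

s = find(d == 1);                         % run starts (linear indices into d)
e = find(d == -1);                        % first row after each run
short = (e - s) < N;                      % run length is e - s

% Mark +1 at the start and -1 just past the end of each short run;
% a cumulative sum down the columns is then 1 exactly inside those runs.
marks = zeros(rows+1, cols);
marks(s(short)) = 1;
marks(e(short)) = -1;
in_short = cumsum(marks, 1);

out = find(in_short(1:rows, :));
```

Because `find` walks the matrix column by column, the k-th entry of `s` and the k-th entry of `e` always belong to the same run, which is why the two vectors can be subtracted directly.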

Answer 2

This should work:

[rows, ~] = size(A);
maxNansPerCol = 4;

% find which columns have few enough NaNs
Anans = isnan(A);
nansInCols = sum(Anans);
qualifyingCols = nansInCols <= maxNansPerCol;

% zero the other columns
mask = repmat(qualifyingCols,rows,1);
B = Anans .* mask;

% get the NaN locations
indices = find(B(:));

(Apologies if something is slightly off; I don't have MATLAB installed on this computer to test it.)
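
To see what the column mask selects, here is a small sketch that runs the same steps on a made-up 3-by-3 matrix with a threshold of 2; both the matrix and the threshold are assumptions chosen for illustration. The criterion is the total number of NaNs in each column, so column 3 qualifies with two separated NaNs, while column 2 is dropped as a whole.

```matlab
% Hypothetical input: columns contain 1, 3 and 2 NaNs respectively
A = [ 1    NaN  NaN
      NaN  NaN  3
      4    NaN  NaN ];
maxNansPerCol = 2;

[rows, ~] = size(A);

% Count NaNs per column and keep the columns at or below the threshold
Anans = isnan(A);
nansInCols = sum(Anans);                  % [1 3 2]
qualifyingCols = nansInCols <= maxNansPerCol;

% Zero out the NaN flags of every non-qualifying column
mask = repmat(qualifyingCols, rows, 1);
B = Anans .* mask;

% Linear indices of the NaNs that remain
indices = find(B(:));
```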

Matlab: Finding contiguous instances of NaN without loops
