Question

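I have a sample matrix in the following format. The first column holds the year and the second column holds the value for that year. Each year has one value per month, so 1926 occupies 12 rows. There are 1000 such text files in a directory, each containing values for the years 1926 through 2013. For the purposes of this example I have only shown values up to 1927. In matrix A, I need to check each year's values (second column) for NaN. If a given year has more than 6 NaNs, I need to reject that station; otherwise I accept it. Can anyone suggest a simple algorithm for checking whether each year contains the complete 12 months, or whether it has more than 6 missing values? For example, in matrix A the year 1926 has 7 missing values; the check should then move on to 1927, and so on up to 2013.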

A = [1926   NaN
     1926   NaN
     1926   NaN
     1926   90.424
     1926   127.762
     1926   172.212
     1926   NaN
     1926   NaN
     1926   NaN
     1926   NaN
     1926   82.296
     1926   89.916
     1927   25.146
     1927   233.68
     1927   127.254
     1927   22.606
     1927   57.15
     1927   185.674
     1927   112.776
     1927   178.562
     1927   110.998
     1927   80.264
     1927   142.24
     1927   237.998
      :        :
     2013     : ]

Answer 1

You can use histc, unique and accumarray to solve your problem:

%// Id each year and find unique year entries
[unqA,~,year_id] = unique(A(:,1)); 

%// Find the two outputs of whether there are 12 months data and 
%// more than 6 NaNs per year as per the problem requirements
out1 = histc(A(:,1),unqA)==12
out2 = accumarray(year_id,isnan(A(:,2)))>6

How to interpret the outputs:

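out1 and out2 are logical vectors (1s and 0s) whose length equals the number of unique years in the first column. A 1 in out1 means that year has entries for all 12 months. A 1 in out2 means that year has more than 6 NaN entries, so you need to reject that station.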
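To make the interpretation concrete, here is a minimal sketch that runs the same counting on the two sample years from the question. The names vals1926, vals1927, months_per_year and nans_per_year are illustrative and not part of the original answer. accumarray(year_id,1) is used here as an assumed stand-in for the histc call to count rows per year.

```matlab
%// Sample data from the question: 1926 has 7 NaNs, 1927 is complete
vals1926 = [NaN NaN NaN 90.424 127.762 172.212 NaN NaN NaN NaN 82.296 89.916]';
vals1927 = [25.146 233.68 127.254 22.606 57.15 185.674 ...
            112.776 178.562 110.998 80.264 142.24 237.998]';
A = [repmat(1926,12,1) vals1926; repmat(1927,12,1) vals1927];

%// Id each year and find unique year entries
[unqA,~,year_id] = unique(A(:,1));
year_id = year_id(:); %// force a column so each entry is one subscript

%// Intermediate counts (illustrative names, not in the original answer)
months_per_year = accumarray(year_id,1);                    %// rows per year
nans_per_year   = accumarray(year_id,double(isnan(A(:,2)))); %// NaNs per year

out1 = months_per_year==12; %// 1 where the year has all 12 monthly rows
out2 = nans_per_year>6;     %// 1 where the year has more than 6 NaNs
```

For this sample, nans_per_year is [7; 0], so out2 flags 1926 and leaves 1927 unflagged, while out1 is 1 for both years.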

So, if you want to consider or select only the row entries corresponding to years that have 12 months of data and no more than 6 NaNs, you can do this:

Aout = A(ismember(year_id,find(out1)) & ismember(year_id,find(~out2)),:)
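The question ultimately asks for an accept/reject decision per station rather than per year. One possible way to reduce the per-year counts to a single flag is sketched below. It assumes a station is rejected as soon as any one year has more than 6 NaNs, which is one reading of the question and not something the answer states. accept_station and vals are hypothetical names.

```matlab
%// Sample station data from the question (1926 has 7 NaNs, 1927 has none)
vals = [NaN NaN NaN 90.424 127.762 172.212 NaN NaN NaN NaN 82.296 89.916 ...
        25.146 233.68 127.254 22.606 57.15 185.674 ...
        112.776 178.562 110.998 80.264 142.24 237.998]';
A = [[repmat(1926,12,1); repmat(1927,12,1)] vals];

%// Count NaNs per year, as in the answer above
[~,~,year_id] = unique(A(:,1));
nans_per_year = accumarray(year_id(:),double(isnan(A(:,2))));

%// Assumption: one year with more than 6 NaNs is enough to reject the station
accept_station = all(nans_per_year<=6);
```

With the sample matrix, accept_station comes out false because 1926 has 7 NaNs. The same lines could be run once per file after loading each station's matrix into A.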

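If you want to fill in the NaNs for years that have fewer than 6 NaNs with imputed values from knnimpute, you can do this:

A_filled = knnimpute(A) %// knnimputed values for the entire array, A (knnimpute is in the Bioinformatics Toolbox)
out3 = accumarray(year_id,isnan(A(:,2)))<6
fillpos = ismember(year_id,find(out3))
A(fillpos,:) = A_filled(fillpos,:)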

Checking for data gaps in MATLAB
