Question

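Hi all, I have a large dataset with many consecutive NAs. Is there a quick way to replace them, column by column, with the average of the previous and next non-missing values? Thanks a lot, Lou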

Answer 1

Interesting question... if only you had explained clearly what you want. Perhaps this?

data = [1 3 NaN 7 6 NaN NaN 2].'; % example data: column vector
isn = isnan(data); % determine which values are NaN
inum = find(~isn); % indices of numbers
inan = find(isn); % indices of NaNs
comp = bsxfun(@lt,inan.',inum); % for each (number,NaN): 1 if NaN precedes num
[~, upper] = max(comp); % next number to each NaN (max finds *first* maximum)
data(isn) = (data(inum(upper))+data(inum(upper-1)))/2; % fill with average

In this example, the original data:

>> data.'
ans =
     1     3   NaN     7     6   NaN   NaN     2

Result:

>> data.'
ans =
     1     3     5     7     6     4     4     2

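If you have a 2D array and want to work column by column, a for loop over the columns is probably the best option.

Of course, if a column can start or end with NaN, the problem is undefined: such a NaN has no previous or no next value to average.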
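The column loop mentioned above could look something like the following. This is only a sketch: the matrix `A` is a made-up example, and the choice to leave leading or trailing NaNs untouched is an assumption, since the question does not say what should happen to them.

```matlab
% Hypothetical example matrix; every NaN here has a number above and below it.
A = [1 10; NaN 20; 3 NaN; NaN NaN; 7 50];

for c = 1:size(A, 2)
    col  = A(:, c);
    isn  = isnan(col);
    inum = find(~isn);                  % row indices of numbers in this column
    inan = find(isn);                   % row indices of NaNs in this column
    if isempty(inan) || isempty(inum)
        continue;                       % nothing to fill, or nothing to fill from
    end
    % Assumption: only fill NaNs that have a number on both sides;
    % leading and trailing NaNs are left as NaN.
    inan = inan(inan > inum(1) & inan < inum(end));
    if isempty(inan)
        continue;
    end
    comp = bsxfun(@lt, inan.', inum);   % comp(m,k) is true if NaN k precedes number m
    [~, upper] = max(comp, [], 1);      % first number after each NaN
    col(inan) = (col(inum(upper)) + col(inum(upper - 1))) / 2;
    A(:, c) = col;
end
```

For this example matrix the first column becomes 1, 2, 3, 5, 7 and the second becomes 10, 20, 35, 35, 50.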

Answer 2

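Assuming no NaN sits in the first or last row of any column, this is how I would do it:

(If there are several consecutive NaNs, it searches for the previous and the next non-missing value and averages those two.)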

% Creating A

A=magic(7);
newA=A;  %Result will be in newA
A(3,4)=NaN;
A(2,1)=NaN;
A(5,6)=NaN;
A(6,6)=NaN;
A(4,6)=NaN;

% Finding NaN position and calculating positions where we have to average numbers
ind=find(isnan(A));
otherInd=setdiff(1:numel(A(:)),ind);
for i=1:size(ind,1)
   temp=otherInd(otherInd<ind(i));
   prevInd(i,1)=temp(end);
   temp=otherInd(otherInd>ind(i));
   nextInd(i,1)=temp(1);
end

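% For faster processing purposes: interleave previous/next indices so that
% each consecutive pair can be averaged as one block
% (blockproc requires the Image Processing Toolbox)

allInd(1:2:2*length(prevInd))=prevInd;
allInd(2:2:2*length(prevInd))=nextInd;
fun=@(block_struct) mean(block_struct.data);
prevNextNums=A(allInd);
A
newA(ind)=blockproc(prevNextNums,[1 2],fun)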

%-----------------------Answer--------------------------
A =

    30    39    48     1    10    19    28
   NaN    47     7     9    18    27    29
    46     6     8   NaN    26    35    37
     5    14    16    25    34   NaN    45
    13    15    24    33    42   NaN     4
    21    23    32    41    43   NaN    12
    22    31    40    49     2    11    20

newA =

    30    39    48     1    10    19    28
    38    47     7     9    18    27    29
    46     6     8    17    26    35    37
     5    14    16    25    34    23    45
    13    15    24    33    42    23     4
    21    23    32    41    43    23    12
    22    31    40    49     2    11    20
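
As a loop-free variant of the same previous/next averaging, the sketch below uses `fillmissing`, which neither answer uses. It assumes a MATLAB release that provides `fillmissing` (R2016b or later) and reuses the same example matrix. Unlike the linear-index search above, it works strictly within each column, so a NaN in the first or last row of a column would simply stay NaN.

```matlab
% Same example data as above
A = magic(7);
A(3,4) = NaN;
A(2,1) = NaN;
A(4:6,6) = NaN;

prevVals = fillmissing(A, 'previous');  % each NaN takes the last number above it in its column
nextVals = fillmissing(A, 'next');      % each NaN takes the first number below it in its column

% Non-missing entries are unchanged because prevVals and nextVals agree there
newA = (prevVals + nextVals) / 2;
```

Whether this is actually faster than the index-based approach on a large dataset is not something the thread establishes; it would need to be timed on the real data.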

MATLAB: replace NaN with the average of the previous and next non-missing values
