全部 我有一个包含大量连续NA的大型数据集,是否有任何快速的方法可以用列替换前一个和下一个非缺失值的平均值? 非常感谢 娄
答案 0 :(得分:3)
有趣的问题......如果你只清楚地解释了你想要的东西。也许是这个?
data = [1 3 NaN 7 6 NaN NaN 2].'; %'// example data: column vector
isn = isnan(data); %// determine which values are NaN
inum = find(~isn); %// indices of numbers
inan = find(isn); %// indices of NaNs
comp = bsxfun(@lt,inan.',inum); %'// for each (number,NaN): 1 if NaN precedes num
[~, upper] = max(comp); %// next number to each NaN (max finds *first* maximum)
data(isn) = (data(inum(upper))+data(inum(upper-1)))/2; %// fill with average
在此示例中:原始数据:
>> data.'
ans =
1 3 NaN 7 6 NaN NaN 2
结果:
>> data.'
ans =
1 3 5 7 6 4 4 2
如果您有一个2D数组并希望按列工作,那么for
循环列可能是最佳选择。
当然,如果列的开头或结尾可能有NaN
,则问题未定义。
答案 1 :(得分:2)
假设NaN不在任何列的第一行/最后一行,我就是这样做的:
(如果有多个连续的NaN,它会搜索之前的下一个非缺失值并对它们求平均值。)
% Creating A
A=magic(7);
newA=A; %Result will be in newA
A(3,4)=NaN;
A(2,1)=NaN;
A(5,6)=NaN;
A(6,6)=NaN;
A(4,6)=NaN;
% Finding NaN position and calculating positions where we have to average numbers
ind=find(isnan(A));
otherInd=setdiff(1:numel(A(:)),ind);
for i=1:size(ind,1)
temp=otherInd(otherInd<ind(i));
prevInd(i,1)=temp(end);
temp=otherInd(otherInd>ind(i));
nextInd(i,1)=temp(1);
end
% For faster processing purposes
allInd(1:2:2*length(prevInd))=prevInd;
allInd(2:2:2*length(prevInd))=nextInd;
fun=@(block_struct) mean(block_struct.data)
prevNextNums=A(allInd);
A
newA(ind)=blockproc(prevNextNums,[1 2],fun)
%-----------------------Answer--------------------------
A =
30 39 48 1 10 19 28
NaN 47 7 9 18 27 29
46 6 8 NaN 26 35 37
5 14 16 25 34 NaN 45
13 15 24 33 42 NaN 4
21 23 32 41 43 NaN 12
22 31 40 49 2 11 20
newA =
30 39 48 1 10 19 28
38 47 7 9 18 27 29
46 6 8 17 26 35 37
5 14 16 25 34 23 45
13 15 24 33 42 23 4
21 23 32 41 43 23 12
22 31 40 49 2 11 20