MATLAB: delete data if the percentage of NaN values is too high

Time: 2016-04-16 07:03:09

Tags: matlab for-loop nan percentage

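I have a dataset that contains chronologically ordered data in 24 consecutive batches of rows. I have this code to find the row index where each batch starts: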

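%Recalculate the batch indices due to deletion
%A new batch starts wherever the time stamp in column 2 decreases
indbatch = 1;
for it = 2:size(Data,1)
    if Data(it,2) < Data(it-1,2)
        indbatch = [indbatch, it];
    end
end
indbatch = [indbatch, size(Data,1)+1]; %start index of each batch, plus one past the last row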
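The same batch starts can also be found without a loop. The sketch below uses `diff` and `find` on column 2. It assumes, as the loop does, that a batch boundary is exactly where the time stamp decreases, and the toy `Data` matrix is made up purely for illustration.

```matlab
% Hypothetical toy data: column 2 is a time stamp that restarts in each batch
Data = [(1:7)', [1 2 3 1 2 1 2]'];

% A new batch starts one row after each decrease in column 2
starts = find(diff(Data(:,2)) < 0)' + 1;

% Same layout as indbatch from the loop: batch starts, then one past the last row
indbatch = [1, starts, size(Data,1) + 1];
disp(indbatch)   % 1 4 6 8
```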

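I am trying to check whether the percentage of missing (NaN) values in each batch is too high, so that I can delete that batch. To simplify, say the first batch covers rows 1:250 and the second batch rows 251:510. So I want to write a loop that computes the percentage of NaNs in each batch and, if the percentage is greater than 80%, records the batch number so that the batch can be deleted. This is what I have done so far, but it does not work, I think because of the Ind part, and the percentage part only uses length when it should use rows*length...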

Delete = []; %initialize the list before appending to it
for ib = 1:length(indbatch)-1       %each batch (24 batches)
    tspan = indbatch(ib):indbatch(ib+1)-1; % gives the time span of each batch

    for iv = 1:49
        Ind = find(isnan(Data(tspan,iv)));
        Check = isempty(Ind);
        if Check == 1
            continue
        else
            Percentage_Missing = (length(Ind)/ length(Data)) * 100;
            if Percentage_Missing >= 80
                Delete = [Delete, iv];
            else
                continue
            end
        end
    end
end
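One way to do what the question describes is sketched below. It counts NaNs over all rows *and* columns of a batch (the rows*length denominator), records the batch number `ib` rather than the column `iv`, and deletes all flagged batches in one step at the end. This runs on made-up toy data; the 0.8 threshold and the names `threshold`, `block` and `keepRow` are illustrative choices, not taken from the original.

```matlab
% Hypothetical toy data: 2 batches of 3 rows, 6 columns; column 2 is the time stamp
Data = [NaN(3,1), (1:3)', NaN(3,4);    % batch 1: every non-time cell is NaN
        (1:3)',   (1:3)', ones(3,4)];  % batch 2: no NaNs

% Batch start indices (plus one past the last row), as in the question
indbatch = [1, find(diff(Data(:,2)) < 0)' + 1, size(Data,1) + 1];

threshold = 0.8;   % delete a batch if at least 80% of its cells are NaN
Delete = [];       % batch numbers to delete
for ib = 1:numel(indbatch)-1
    tspan = indbatch(ib):indbatch(ib+1)-1;           % rows of this batch
    block = Data(tspan, :);                          % all columns of this batch
    fracMissing = nnz(isnan(block)) / numel(block);  % NaNs / (rows*columns)
    if fracMissing >= threshold
        Delete = [Delete, ib];
    end
end

% Build one row mask for all flagged batches, then delete them together,
% so that removing one batch does not shift the row indices of the others
keepRow = true(size(Data,1), 1);
for ib = Delete
    keepRow(indbatch(ib):indbatch(ib+1)-1) = false;
end
Data = Data(keepRow, :);
```

Here batch 1 has 15 NaNs out of 18 cells (about 83%), so only batch 1 is removed and the three rows of batch 2 remain.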

1 answer:

Answer 0 (score: 0)

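If you want to remove the columns of Data (within one batch) in which 80% or more of the values are NaN, you can do this with some logical matrix operations.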

%// Find all NaN values in the current batch (tspan as defined in the question)
isMissing = isnan(Data(tspan, :));

%// Compute the number of NaNs for each column of Data
nMissing = sum(isMissing, 1);

%// Convert this to a fraction of the batch's rows (0 to 1)
percentMissing = nMissing ./ size(isMissing, 1);

%// Now figure out which columns to remove (80% or more missing)
toRemove = percentMissing >= 0.80;

%// And use this to index into the data
notMissingData = Data(tspan, ~toRemove);
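To see what this column test does, it can be run on a small made-up batch. The sketch assumes the batch occupies rows `tspan = 1:5` and uses `mean` of the logical mask, which gives the same per-column fraction as `sum(isMissing, 1) ./ size(isMissing, 1)`.

```matlab
% Hypothetical toy batch: 5 rows, 3 columns
Data = [1 NaN NaN;
        2 NaN NaN;
        3 NaN  6;
        4 NaN NaN;
        5 NaN  9];
tspan = 1:5;                               % rows of this batch

isMissing = isnan(Data(tspan, :));         % logical NaN mask
fracMissing = mean(isMissing, 1);          % fraction of NaNs per column: [0 1 0.6]
toRemove = fracMissing >= 0.80;            % only column 2 is flagged
notMissingData = Data(tspan, ~toRemove);   % keeps columns 1 and 3
```

Column 3 is 60% NaN, so it stays; only the all-NaN column 2 is dropped. This works per column inside one batch; dropping a whole batch instead needs the NaN count taken over all of the batch's cells at once.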