Question

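I have a data set that contains 24 batches of chronologically ordered data stored in consecutive rows. I have this code to find the row index at which each batch starts: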

%Re calculate the batch indices due to deletion
indbatch =1;
for it=2:size(Data,1)
    if Data(it,2)<Data(it-1,2)
        indbatch=[indbatch,it];
    end
end
indbatch=[indbatch,it+1]; %gives the ind of start of each batch

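I am trying to check whether the percentage of missing (NaN) values in each batch is too high, so that I can delete that batch. To keep it simple: the first batch is rows 1:250, the second batch is rows 251:510, and so on. I would like to write a loop that computes the percentage of NaN values in each batch and, if it is greater than 80%, records the batch number for deletion. This is what I have so far, but it does not work. I think the Ind part is wrong, and the percentage part only uses the length when it should use rows*length...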

for ib=1:length(indbatch)-1       %each batch (24 batches)
tspan=[indbatch(ib):indbatch(ib+1)-1]; % gives the time span of each batch

for iv = 1:49
    Ind = find(isnan(Data(tspan,iv)));
    Check = isempty(Ind);
    if Check == 1 
        continue
    else
        Percentage_Missing = (length(Ind)/ length(Data)) * 100;
        if Percentage_Missing >= 80
            Delete = [Delete, iv];
        else
            continue
        end
    end
end
end

Answer 1

If you want to remove the columns of Data in which more than 80% of the values are NaN, you can do this with a few logical matrix operations.

%// Find all NaN values in the current batch
%// (tspan is the batch's row range from the loop in the question)
isMissing = isnan(Data(tspan, :));

%// Count the NaNs in each column of the batch
nMissing = sum(isMissing, 1);

%// Convert the counts to a fraction of the batch's rows (0 to 1)
percentMissing = nMissing ./ size(isMissing, 1);

%// Flag the columns where at least 80% of the values are missing
toRemove = percentMissing >= 0.80;

%// Keep only the remaining columns of the batch
notMissingData = Data(tspan, ~toRemove);
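If the goal is instead what the question describes, namely rejecting whole batches whose overall NaN share is too high, the same idea can be applied once per batch. The following is only a minimal sketch under some assumptions: the toy Data matrix and the hard-coded indbatch stand in for the real data and for the output of the question's first loop, and threshold, fracMissing, rowsToDelete and DataClean are names made up for illustration. Delete is reused from the question to hold the batch numbers.

```matlab
% Toy input (hypothetical): column 2 is a time stamp that restarts at each batch
Data = [NaN 1 NaN NaN NaN NaN;
        NaN 2 NaN NaN NaN NaN;
        3   1 4   5   6   7;
        3   2 NaN 5   6   7;
        3   3 4   5   6   7];
indbatch = [1 3 6];   % batch start rows plus size(Data,1)+1, as in the question

threshold = 0.80;                         % assumed cut-off, taken from the question
nBatch = numel(indbatch) - 1;
fracMissing = zeros(nBatch, 1);           % NaN fraction of each batch
rowsToDelete = false(size(Data, 1), 1);   % marks the rows of rejected batches

for ib = 1:nBatch
    tspan = indbatch(ib):indbatch(ib+1)-1;   % rows belonging to this batch
    block = Data(tspan, :);
    % numel(block) is rows*columns, so the fraction covers the whole batch
    fracMissing(ib) = sum(isnan(block(:))) / numel(block);
    if fracMissing(ib) >= threshold
        rowsToDelete(tspan) = true;
    end
end

Delete = find(fracMissing >= threshold);  % batch numbers to delete
DataClean = Data(~rowsToDelete, :);       % data with those batches removed
```

Dividing by numel(block) rather than length(Data) is what addresses the "rows*length" concern in the question. Collecting the rows in a logical mask and deleting them once after the loop also avoids shifting the indices in indbatch while the loop is still running.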
