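I have a dataset that contains 24 batches of chronologically ordered data, stacked in consecutive rows. I use this code to find the row index where each batch starts: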
%Recalculate the batch indices due to deletion
indbatch = 1;
for it = 2:size(Data,1)
    if Data(it,2) < Data(it-1,2)   % time in column 2 drops: a new batch starts
        indbatch = [indbatch, it];
    end
end
indbatch = [indbatch, size(Data,1)+1]; % start row of each batch, plus a sentinel one past the last row
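The same batch boundaries can also be found without a loop, because a new batch starts wherever `diff` of the time column is negative. A minimal sketch, assuming (as the loop does) that column 2 holds a time stamp that restarts for each batch; the small `t`/`Data` matrix is made-up example data:

```matlab
% Hypothetical toy data: column 2 is a time stamp that restarts at 1
% whenever a new batch begins (three batches of 4, 3 and 5 rows).
t = [1 2 3 4, 1 2 3, 1 2 3 4 5]';
Data = [rand(numel(t), 1), t];

% A new batch starts wherever the time stamp drops below the previous row.
starts = find(diff(Data(:, 2)) < 0) + 1;

% Row 1 always starts a batch; size(Data,1)+1 is a sentinel one past the
% last row, so batch ib spans indbatch(ib):indbatch(ib+1)-1.
indbatch = [1; starts; size(Data, 1) + 1]';

disp(indbatch)   % 1 5 8 13
```

This produces the same row vector as the loop version, including the sentinel at the end.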
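I am trying to check whether the percentage of missing (NaN) values in each batch is too high, so that I can delete that batch. To simplify: the first batch is rows 1:250, the second batch is rows 251:510, and so on. I would like to write a loop that calculates the percentage of NaNs in each batch and, if the percentage is above 80%, records the batch number so the batch can be deleted. This is what I have so far, but I think the `Ind` part does not work, and the percentage part only divides by `length` when it should use rows * length (i.e. the number of elements in the batch)...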
Delete = [];   % must exist before it is appended to below
for ib = 1:length(indbatch)-1 % each batch (24 batches)
    tspan = [indbatch(ib):indbatch(ib+1)-1]; % gives the time span of each batch
    for iv = 1:49
        Ind = find(isnan(Data(tspan,iv)));
        Check = isempty(Ind);
        if Check == 1
            continue
        else
            Percentage_Missing = (length(Ind)/ length(Data)) * 100;
            if Percentage_Missing >= 80
                Delete = [Delete, iv];
            else
                continue
            end
        end
    end
end
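The denominator is where the percentage goes wrong: `length(Data)` returns the largest dimension of the whole matrix, not the size of the current batch. A minimal sketch comparing the denominators, assuming a made-up 510-by-49 `Data` with NaNs injected into one column of the second batch:

```matlab
% Hypothetical data: 510 rows x 49 variables; batch 2 is rows 251:510.
Data = rand(510, 49);
tspan = 251:510;
Data(251:470, 3) = NaN;          % 220 of the 260 rows in batch 2, column 3

nNaN = nnz(isnan(Data(tspan, 3)));        % 220

% Denominator used in the question: length(Data) is the largest dimension
% of the WHOLE matrix (510 here), not the size of the batch.
wrongPct = nNaN / length(Data) * 100;     % about 43.1

% Per-column percentage within the batch: divide by the batch's row count.
colPct = nNaN / numel(tspan) * 100;       % about 84.6

% Whole-batch percentage ("rows * columns"): divide by the element count.
batchPct = nnz(isnan(Data(tspan, :))) / numel(Data(tspan, :)) * 100;
```

With the question's denominator this column would stay under the 80% threshold even though about 85% of its values in the batch are missing.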
Answer (score: 0)
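If you want to remove the columns (variables) of `Data` in which 80% or more of a batch's values are NaN, you can do it with a few logical-matrix operations instead of looping over the columns: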
%// Find all NaN values in this batch (tspan as defined in the question's loop)
isMissing = isnan(Data(tspan, :));
%// Count the NaNs in each column of the batch
nMissing = sum(isMissing, 1);
%// Express this as a fraction of the batch's rows (0.80 = 80%)
percentMissing = nMissing ./ size(isMissing, 1);
%// Flag the columns that are 80% or more missing
toRemove = percentMissing >= 0.80;
%// Keep only the remaining columns of this batch
notMissingData = Data(tspan, ~toRemove);