Use only the last 5 observations instead of all observations

Time: 2017-10-31 12:41:32

Tags: matlab

I am using MATLAB's fillmissing function to fill in missing values.

Suppose your matrix looks as follows:

A = rand(10,2);
A(end-5:end,1) = NaN;

% this gives: 
A =

    0.8147    0.1576
    0.9058    0.9706
    0.1270    0.9572
    0.9134    0.4854
       NaN    0.8003
       NaN    0.1419
       NaN    0.4218
       NaN    0.9157
       NaN    0.7922
       NaN    0.9595

You can apply fillmissing as follows:

Afilled = fillmissing(A, 'previous')

The resulting matrix then looks like this:

Afilled =

    0.8147    0.1576
    0.9058    0.9706
    0.1270    0.9572
    0.9134    0.4854
    0.9134    0.8003
    0.9134    0.1419
    0.9134    0.4218
    0.9134    0.9157
    0.9134    0.7922
    0.9134    0.9595

However, as it stands, the function does not take into account how many observations are actually missing (6 in this case).

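I am looking for a way to take the number of missing observations into account before carrying the last value forward. For example, fill at most 5 consecutive missing observations with the last known value, and leave any further ones as NaN: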

Afilled2 =

    i=1               0.8147    0.1576
    i=2               0.9058    0.9706
    i=3               0.1270    0.9572
    i=4               0.9134    0.4854
    i=5 % missing 1   0.9134    0.8003
    i=6 % missing 2   0.9134    0.1419
    i=7 % missing 3   0.9134    0.4218
    i=8 % missing 4   0.9134    0.9157
    i=9 % missing 5   0.9134    0.7922
    i=10              NaN       0.9595

1 Answer:

Answer 0: (score: 1)

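MATLAB's fillmissing function does not offer this functionality. Here is some simple code that does what you want (filling along dimension 1 using the 'previous' method):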

% parameter: maximum number of observations to fill with a given value
max_fill_obs = 5;

% loop over columns
for col = 1 : size(A, 2)

    % initialize a counter (the number of previously filled values) to 0
    counter = 0;

    % loop over rows within column col, starting from the second row
    for row = 2 : size(A, 1)

        % if the current element is known, reset the counter to 0
        if ~isnan(A(row, col))
            counter = 0;

        % otherwise, if we haven't already filled in max_fill_obs values,
        % fill in the value and increment the counter
        elseif counter < max_fill_obs
            A(row, col) = A(row - 1, col);
            counter = counter + 1;
        end

    end
end

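This approach also works when there are multiple blocks of NaN values: only the first max_fill_obs values in each block are filled. For example, try running it on the matrix defined by:
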
A = rand(20,2);
A(5:10,1) = NaN;
A(13:19,1) = NaN;
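
To make the multi-block case easy to check by eye, here is a minimal sketch that swaps rand for fixed values (an assumption made only for reproducibility; the names Afilled and still_missing are illustrative, not from the original answer) and runs the same loop on a copy of the matrix:

```matlab
% Deterministic stand-in for rand(20,2), so the result is reproducible
A = [(1:20)', (101:120)'];
A(5:10, 1)  = NaN;   % first block: 6 missing values
A(13:19, 1) = NaN;   % second block: 7 missing values

max_fill_obs = 5;
Afilled = A;         % work on a copy so A keeps its NaNs

for col = 1 : size(Afilled, 2)
    counter = 0;     % number of values filled in the current NaN block
    for row = 2 : size(Afilled, 1)
        if ~isnan(Afilled(row, col))
            counter = 0;
        elseif counter < max_fill_obs
            Afilled(row, col) = Afilled(row - 1, col);
            counter = counter + 1;
        end
    end
end

% Rows of column 1 that are still missing after filling
still_missing = find(isnan(Afilled(:, 1)))'   % rows 10, 18 and 19
```

In each block only the first 5 missing rows receive the last known value (4 for rows 5 to 9, 12 for rows 13 to 17). The rest of each block stays NaN.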

Here is a vectorized version of the loop above:

Afilled = fillmissing(A, 'previous');
Afilled(movsum(isnan(A), [max_fill_obs, 0]) > max_fill_obs) = NaN;
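
To see why the movsum mask picks out exactly the values that should stay missing, it may help to trace it on a small column. The sketch below uses a toy vector x and max_fill_obs = 2. Both are illustrative assumptions, not values from the original post, and the intermediate names nan_count and too_far are hypothetical.

```matlab
% Toy column: one run of 3 NaNs, then a single trailing NaN
x = [1; NaN; NaN; NaN; 5; NaN];
max_fill_obs = 2;   % smaller limit than in the answer, to keep the trace short

% Trailing window: the current element plus the max_fill_obs elements before it.
% By default movsum shrinks the window at the start of the column.
nan_count = movsum(isnan(x), [max_fill_obs, 0]);   % [0; 1; 2; 3; 2; 2]

% A count above max_fill_obs means every element of the full window is NaN,
% so this element is at least the (max_fill_obs + 1)-th NaN in a row.
too_far = nan_count > max_fill_obs;                % true only at row 4

x_filled = fillmissing(x, 'previous');             % [1; 1; 1; 1; 5; 5]
x_filled(too_far) = NaN;                           % [1; 1; 1; NaN; 5; 5]
```

The window holds max_fill_obs + 1 elements, so the sum can exceed max_fill_obs only when all of them are NaN. That is the same condition under which the loop stops filling.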