Question

I am using MATLAB's fillmissing function to fill in missing values.

Suppose you have a matrix that looks like this:

A = rand(10,2);
A(end-5:end,1) = NaN;

% this gives: 
A =

    0.8147    0.1576
    0.9058    0.9706
    0.1270    0.9572
    0.9134    0.4854
       NaN    0.8003
       NaN    0.1419
       NaN    0.4218
       NaN    0.9157
       NaN    0.7922
       NaN    0.9595

You can apply fillmissing as follows:

Afilled = fillmissing(A, 'previous')

The resulting matrix then looks like this:

Afilled =

    0.8147    0.1576
    0.9058    0.9706
    0.1270    0.9572
    0.9134    0.4854
    0.9134    0.8003
    0.9134    0.1419
    0.9134    0.4218
    0.9134    0.9157
    0.9134    0.7922
    0.9134    0.9595

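However, the function does not take into account how many observations are actually missing (6 in this case).

I am looking for a way to limit how many consecutive missing observations are filled with the last known value. For example, fill at most 5 missing observations after the last known value and leave the rest as NaN: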

Afilled2 =

    i=1               0.8147    0.1576
    i=2               0.9058    0.9706
    i=3               0.1270    0.9572
    i=4               0.9134    0.4854
    i=5 % missing 1   0.9134    0.8003
    i=6 % missing 2   0.9134    0.1419
    i=7 % missing 3   0.9134    0.4218
    i=8 % missing 4   0.9134    0.9157
    i=9 % missing 5   0.9134    0.7922
    i=10              NaN       0.9595

Answer 1

MATLAB's fillmissing function does not have this capability. Here is some simple code that does what you want (filling along dimension 1 with the 'previous' method):

% parameter: maximum number of observations to fill with a given value
max_fill_obs = 5;

% loop over columns
for col = 1 : size(A, 2)

    % initialize a counter (the number of previously filled values) to 0
    counter = 0;

    % loop over rows within column col, starting from the second row
    for row = 2 : size(A, 1)

        % if the current element is known, reset the counter to 0
        if ~isnan(A(row, col))
            counter = 0;

        % otherwise, if we haven't already filled in max_fill_obs values,
        % fill in the value and increment the counter
        elseif counter < max_fill_obs
            A(row, col) = A(row - 1, col);
            counter = counter + 1;
        end

    end
end

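This approach also works when there are multiple blocks of NaN values: only the first max_fill_obs values in each block are filled. For example, try running it on the matrix defined by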

A = rand(20,2);
A(5:10,1) = NaN;
A(13:19,1) = NaN;
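
As a rough check of the claim above, the sketch below applies the same loop to a copy of that matrix and lists which rows of column 1 are still missing afterwards. The names B and still_missing and the fixed seed are illustrative assumptions, not part of the original answer.

```matlab
% Illustrative check (B, still_missing and the seed are assumed names/values)
rng(0);                          % fix the seed so the run is repeatable
A = rand(20, 2);
A(5:10, 1)  = NaN;               % block of 6 missing values
A(13:19, 1) = NaN;               % block of 7 missing values

max_fill_obs = 5;
B = A;                           % fill a copy so the original A is kept

for col = 1 : size(B, 2)
    counter = 0;                 % values filled so far in the current block
    for row = 2 : size(B, 1)
        if ~isnan(B(row, col))
            counter = 0;         % a known value ends the block
        elseif counter < max_fill_obs
            B(row, col) = B(row - 1, col);
            counter = counter + 1;
        end
    end
end

% Rows of column 1 that were left unfilled
still_missing = find(isnan(B(:, 1)))';
disp(still_missing)
```

With max_fill_obs = 5, the first block (rows 5 to 10) should keep only row 10 as NaN, and the second block (rows 13 to 19) should keep rows 18 and 19.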

Here is a vectorized version of the code above:

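% fill everything first, then undo the fills that go beyond max_fill_obs
% (A must be the original matrix, before any filling)
Afilled = fillmissing(A, 'previous');
Afilled(movsum(isnan(A), [max_fill_obs, 0]) > max_fill_obs) = NaN;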
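
To see why the movsum mask works, here is a small sketch on a made-up column vector. The names x, run_len, mask and y are hypothetical, and max_fill_obs is set to 2 only to keep the example short.

```matlab
% Made-up data: one known value, a run of 3 missing values, one known value
x = [1; NaN; NaN; NaN; 5];
max_fill_obs = 2;

% For each row, count the NaNs in that row plus the max_fill_obs rows before it
% (the window is shortened automatically at the top of the column)
run_len = movsum(isnan(x), [max_fill_obs, 0]);

% The count exceeds max_fill_obs only if the whole window of
% max_fill_obs + 1 rows is NaN, i.e. this row is past the fill limit
mask = run_len > max_fill_obs;

y = fillmissing(x, 'previous');  % fill every gap first
y(mask) = NaN;                   % then restore NaN where the limit is exceeded

disp([x, run_len, y])
```

Here only the fourth row has a full window of NaNs, so it is the only filled value that is reset to NaN.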
