How can I fill in missing information using the previous value in each column?

Time: 2012-11-22 19:18:33

Tags: r

  

Possible duplicate:
  Replacing NAs with latest non-NA value

How can I fill in the missing information using the previous value in each column?

Date.end   Date.beg   Pollster Serra.PSDB
2012-06-26 2012-06-25  Datafolha       31.0
2012-06-27       <NA>       <NA>         NA
2012-06-28       <NA>       <NA>         NA
2012-06-29       <NA>       <NA>         NA 
2012-06-30       <NA>       <NA>         NA
2012-07-01       <NA>       <NA>         NA
2012-07-02       <NA>       <NA>         NA
2012-07-03       <NA>       <NA>         NA
2012-07-04       <NA>       Ibope        22
2012-07-05       <NA>       <NA>         NA
2012-07-06       <NA>       <NA>         NA
2012-07-07       <NA>       <NA>         NA
2012-07-08       <NA>       <NA>         NA
2012-07-09       <NA>       <NA>         NA
2012-07-10       <NA>       <NA>         NA
2012-07-11       <NA>       <NA>         NA
2012-07-12 2012-07-09     Veritá       31.4

1 answer:

Answer 0: (score: 2)

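I'm not sure this is the best way to do it; there may well be a package that offers exactly this functionality. The following approach is probably not the fastest, but it works and should be fine for small and medium-sized data sets. I would be cautious about applying it to very large data sets (more than a million rows or so).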

fillNAByPreviousData <- function(column) {
    # First, find the positions of the entries that are NA
    navals <- which(is.na(column))
    # and the positions of the entries that hold data.
    filledvals <- which(!is.na(column))

    # If NAs never followed one another, navals - 1 would give the positions
    # we need. Here, however, we have to find the last filled position before
    # each NA. A leading NA has no predecessor, so it maps to NA_integer_
    # and stays NA instead of triggering max() on an empty vector.
    fillup <- vapply(navals, function(x) {
        previous <- filledvals[filledvals < x]
        if (length(previous) > 0) max(previous) else NA_integer_
    }, integer(1))

    # Finally, replace the NAs with the data found above.
    column[navals] <- column[fillup]
    column
}
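
Since the question asks about every column, here is a minimal sketch of how the function could be applied to a whole data frame. The `polls` data frame is a hypothetical five-row subset of the question's data, built only for illustration, and the function is repeated in compact form so that the block runs on its own.

```r
# Compact copy of the function above so this block is self-contained.
fillNAByPreviousData <- function(column) {
    navals <- which(is.na(column))
    filledvals <- which(!is.na(column))
    fillup <- vapply(navals, function(x) {
        previous <- filledvals[filledvals < x]
        if (length(previous) > 0) max(previous) else NA_integer_
    }, integer(1))
    column[navals] <- column[fillup]
    column
}

# Hypothetical subset of the question's data (rows chosen for illustration).
polls <- data.frame(
    Date.end   = as.Date(c("2012-06-26", "2012-06-27", "2012-06-28",
                           "2012-07-04", "2012-07-05")),
    Date.beg   = as.Date(c("2012-06-25", NA, NA, NA, NA)),
    Pollster   = c("Datafolha", NA, NA, "Ibope", NA),
    Serra.PSDB = c(31.0, NA, NA, 22, NA),
    stringsAsFactors = FALSE
)

# Assigning into polls[] keeps the data.frame structure while lapply
# replaces each column with its filled version.
polls[] <- lapply(polls, fillNAByPreviousData)
print(polls)
```

Note that each column is filled independently, so a row such as 2012-07-04 (new pollster, missing Date.beg) ends up with the Date.beg of the earlier poll. Whether that is acceptable depends on the data.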

Here are some examples using a test data set:

set.seed(123)
test <- 1:20
test[floor(runif(5,1, 20))] <- NA

> test
 [1]  1  2  3  4  5 NA  7 NA  9 10 11 12 13 14 NA 16 NA NA 19 20

> fillNAByPreviousData(test)
 [1]  1  2  3  4  5  5  7  7  9 10 11 12 13 14 14 16 16 16 19 20
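
For larger data sets, the per-NA lookup above can get slow, because every NA triggers a scan over all filled positions. The following is a sketch of a vectorized base R alternative, not the method used above; the name `fillNAVectorized` is made up for this example. It relies on `cumsum` counting the non-NA entries seen so far, which gives, for every position, the rank of the last filled entry. Packages such as zoo also offer this kind of fill (`na.locf`), if an extra dependency is acceptable.

```r
# Hypothetical vectorized alternative (name and approach are assumptions,
# not part of the original answer).
fillNAVectorized <- function(column) {
    filled <- !is.na(column)
    # group[i] is the number of non-NA entries up to position i: 0 for
    # leading NAs, k at and after the k-th non-NA entry.
    group <- cumsum(filled)
    # Map each group to the position of its non-NA entry; group 0 maps to
    # NA so that leading NAs stay NA.
    lastFilled <- c(NA_integer_, which(filled))[group + 1]
    # Indexing the original vector preserves its class (Date, factor, ...).
    column[lastFilled]
}

# Same values as the test vector above, written out explicitly.
test <- c(1:5, NA, 7L, NA, 9:14, NA, 16L, NA, NA, 19:20)
print(fillNAVectorized(test))
```

On this test vector it should give the same result as fillNAByPreviousData, and it can be applied per column in the same way.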