How do I fill in missing information using the previous value in each column?
Date.end Date.beg Pollster Serra.PSDB
2012-06-26 2012-06-25 Datafolha 31.0
2012-06-27 <NA> <NA> NA
2012-06-28 <NA> <NA> NA
2012-06-29 <NA> <NA> NA
2012-06-30 <NA> <NA> NA
2012-07-01 <NA> <NA> NA
2012-07-02 <NA> <NA> NA
2012-07-03 <NA> <NA> NA
2012-07-04 <NA> Ibope 22
2012-07-05 <NA> <NA> NA
2012-07-06 <NA> <NA> NA
2012-07-07 <NA> <NA> NA
2012-07-08 <NA> <NA> NA
2012-07-09 <NA> <NA> NA
2012-07-10 <NA> <NA> NA
2012-07-11 <NA> <NA> NA
2012-07-12 2012-07-09 Veritá 31.4
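Answer 0 (score: 2)
I'm not sure this is the best way to do it; there may well be a package that offers exactly this functionality. The approach below is probably not the fastest, but it works and should be fine for small and medium-sized data sets. I would be careful about applying it to very large data sets (more than a million rows or so).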
fillNAByPreviousData <- function(column) {
  # First we find out which entries are NA
  navals <- which(is.na(column))
  # and which entries are filled with data.
  filledvals <- which(!is.na(column))
  # If no two NAs followed each other, navals - 1 would give the
  # entries we need. Here, however, we have to find the last filled entry
  # before each NA. Leading NAs have no previous value, so they map to
  # NA_integer_ and stay NA. vapply also copes with a column without NAs.
  fillup <- vapply(navals, function(x) {
    prev <- filledvals[filledvals < x]
    if (length(prev) == 0L) NA_integer_ else max(prev)
  }, integer(1))
  # Finally, replace the NAs with the previous values.
  column[navals] <- column[fillup]
  column
}
Here is an example using a test data set:
set.seed(123)
test <- 1:20
test[floor(runif(5,1, 20))] <- NA
> test
[1] 1 2 3 4 5 NA 7 NA 9 10 11 12 13 14 NA 16 NA NA 19 20
> fillNAByPreviousData(test)
[1] 1 2 3 4 5 5 7 7 9 10 11 12 13 14 14 16 16 16 19 20
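The question asks for every column of a data frame, and the answer raises a performance concern, since the per-NA lookup rescans the filled positions for each missing entry. As a rough sketch of one alternative, the same "carry the last value forward" idea can be written in vectorized base R with `cummax`. The helper name `fillForward` and the small `polls` data frame below are made up for illustration and are not part of the original answer; the data only mimics the shape of the table in the question.

```r
# Hypothetical helper (not from the original answer): vectorized
# "last observation carried forward" using only base R.
fillForward <- function(x) {
  # Position of each non-NA entry, 0 where the entry is NA.
  idx <- seq_along(x) * (!is.na(x))
  # Running maximum = position of the most recent non-NA entry.
  idx <- cummax(idx)
  # Leading NAs have no previous value, so they stay NA.
  idx[idx == 0] <- NA
  x[idx]
}

# Small made-up data frame shaped like the one in the question.
polls <- data.frame(
  Date.end   = as.Date("2012-06-26") + 0:4,
  Pollster   = c("Datafolha", NA, NA, "Ibope", NA),
  Serra.PSDB = c(31.0, NA, NA, 22, NA),
  stringsAsFactors = FALSE
)

# Apply the helper to every column; `filled[] <-` keeps the data frame structure.
filled <- polls
filled[] <- lapply(polls, fillForward)
print(filled)
```

The same `lapply` pattern should work with `fillNAByPreviousData` in place of `fillForward`. If adding a dependency is acceptable, the zoo package's `na.locf` is a commonly used packaged option for this task.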