R: filling a timeSeries / xts with na.locf while respecting leading and trailing NAs

Time: 2017-01-26 16:05:01

Tags: r xts fill na locf

First, the solution I came up with. I am a beginner, so any help improving my function is welcome:

library(xts)
library(timeSeries)



#' Fills the gaps of a time series (timeSeries, xts) while leaving leading and trailing NAs untouched.
#'
#' Suppose a time series object looks as follows:
#'
#' 2017-01-01    1   10   NA    NA     NA  100000  1000000        NA
#' 2017-01-02    2   NA  200  2000  20000  200000  2000000        NA
#' 2017-01-03    3   NA  300  3000     NA      NA  3000000  30000000
#' 2017-01-04    4   40  400  4000  40000  400000  4000000        NA
#' 2017-01-05    5   50  500    NA     NA      NA       NA        NA
#'
#' Leading and trailing NAs stay in place, whereas NAs
#' within the "data section" are carried forward (na.locf).
#' The result of the function call would be:
#'
#' 2017-01-01    1   10   NA    NA     NA  100000  1000000        NA
#' 2017-01-02    2   10  200  2000  20000  200000  2000000        NA
#' 2017-01-03    3   10  300  3000  20000  200000  3000000  30000000
#' 2017-01-04    4   40  400  4000  40000  400000  4000000        NA
#' 2017-01-05    5   50  500    NA     NA      NA       NA        NA
#'
#' @param ts_obj xts or timeSeries object to fill
#'
#' @return timeSeries or xts, matching the type of the input
#' @export
#'
#' @examples
#' library(xts)
#' library(timeSeries)
#' test_matrix <- cbind(
#' c(       1,        2,        3,        4,        5),
#' c(      10,       NA,       NA,       40,       50),
#' c(      NA,      200,      300,      400,      500),
#' c(      NA,     2000,     3000,     4000,       NA),
#' c(      NA,    20000,       NA,    40000,       NA),
#' c(  100000,   200000,       NA,   400000,       NA),
#' c( 1000000,  2000000,  3000000,  4000000,       NA),
#' c(      NA,       NA, 30000000,       NA,       NA)
#' )
#' dates <- as.Date('2017-01-01') + 0:4
#' test_xts <- xts(test_matrix, dates)
#' print(test_xts)
#' print(fill_ts(test_xts))
#'
#' test_ts <- as.timeSeries(test_xts)
#' print(fill_ts(test_ts))
fill_ts <- function(ts_obj) {

  # Fill from first date --> the FIRST dates will remain NA (if they were NA)
  filled_from_first_date <- na.locf(ts_obj, fromLast=FALSE)
  # Fill from last date --> the LAST dates will remain NA (if they were NA)
  filled_from_last_date <- na.locf(ts_obj, fromLast=TRUE)

  # replace value with NA if NA is found in one of the filled timeseries
  filled_from_first_date[is.na(filled_from_first_date) | is.na(filled_from_last_date)] <- NA
  return(filled_from_first_date)
}

test_matrix <- cbind(
  c(       1,        2,        3,        4,        5),
  c(      10,       NA,       NA,       40,       50),
  c(      NA,      200,      300,      400,      500),
  c(      NA,     2000,     3000,     4000,       NA),
  c(      NA,    20000,       NA,    40000,       NA),
  c(  100000,   200000,       NA,   400000,       NA),
  c( 1000000,  2000000,  3000000,  4000000,       NA),
  c(      NA,       NA, 30000000,       NA,       NA)
)
dates <- as.Date('2017-01-01') + 0:4
test_xts <- xts(test_matrix, dates)
print(test_xts)
result <- fill_ts(test_xts)
print(result)

test_ts <- as.timeSeries(test_xts)
result <- fill_ts(test_ts)
print(result)
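
The masking step inside fill_ts is easier to follow on a single column. The sketch below is only an illustration, not part of the original function: it uses a plain numeric vector (one column of the test matrix) and calls na.locf from zoo, which library(xts) attaches, with na.rm = FALSE so the vector length is preserved.

```r
library(zoo)  # provides na.locf; attached automatically by library(xts)

# One column of the test matrix: leading NA, interior gap, trailing NA.
x <- c(NA, 20000, NA, 40000, NA)

# Forward fill: the leading NA survives, the trailing NA gets filled.
fwd <- na.locf(x, na.rm = FALSE)                   # NA 20000 20000 40000 40000

# Backward fill: the trailing NA survives, the leading NA gets filled.
bwd <- na.locf(x, na.rm = FALSE, fromLast = TRUE)  # 20000 20000 40000 40000 NA

# A position is a leading or trailing NA exactly when one of the two fills
# could not reach it, so those positions are reset to NA.
out <- fwd
out[is.na(fwd) | is.na(bwd)] <- NA
print(out)                                         # NA 20000 20000 40000 NA
```

Only the interior gap at position 3 is filled, and it takes the forward-filled value, which is what the expected output in the documentation comment shows for that column.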

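This function fills (xts, timeSeries) time series and leaves trailing and leading NAs untouched. It is even reasonably fast. Still, this is a standard problem, and I believe there must be a standard (and hopefully more efficient) solution that I have not found.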
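
For comparison, here is one possible alternative formulation, offered as a sketch rather than as the standard solution: forward-fill each column only between its first and last observed value. The helper names fill_interior and fill_ts_by_column are hypothetical, the wrapper assumes xts input (a timeSeries object would need its own conversion), and no claim is made that it is faster than fill_ts.

```r
library(xts)

# Hypothetical helper: carry values forward only inside the observed span
# of a numeric vector, so leading and trailing NAs are never touched.
fill_interior <- function(v) {
  obs <- which(!is.na(v))
  if (length(obs) < 2L) return(v)          # no interior gap possible
  span <- obs[1L]:obs[length(obs)]
  v[span] <- zoo::na.locf(v[span], na.rm = FALSE)
  v
}

# Hypothetical wrapper: apply the helper column by column and rebuild
# an xts object on the original index.
fill_ts_by_column <- function(x) {
  m <- coredata(x)
  filled <- apply(m, 2L, fill_interior)
  dim(filled) <- dim(m)                    # keep matrix shape even for one row
  dimnames(filled) <- dimnames(m)
  xts(filled, order.by = index(x))
}

dates <- as.Date("2017-01-01") + 0:4
x <- xts(cbind(a = c(10, NA, NA, 40, 50),
               b = c(NA, 2000, 3000, 4000, NA),
               c = c(NA, 20000, NA, 40000, NA)),
         dates)
res <- fill_ts_by_column(x)
print(res)
```

Because the span starts and ends on an observed value, the fill inside it can never reach a leading or trailing NA, so no second pass and no masking step are needed.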

Apologies if this has already been asked and answered a thousand times. I could not find a suitable entry on Stack Overflow.

0 answers:

No answers