修改动物园库中的.fill_short_gaps函数

时间:2014-12-03 15:50:28

标签: r zoo

我想修改R zoo库中的.fill_short_gaps函数的maxgaps参数的效果(在na.locf和na.approx中使用),如uday&comment; {{{{{{ 3}}

以下示例说明了na.locf上下文中的现有行为。

x <- c(rep(NA, 2), 1:4, rep(NA, 4), 7:8, rep(NA, 2), 9:10)
y <- na.locf(x, na.rm=FALSE, maxgap=2)
rbind(x, y)

结果是

   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]

x    NA   NA    1    2    3    4   NA   NA   NA    NA     7     8    NA    NA     9    10

y    NA   NA    1    2    3    4   NA   NA   NA    NA     7     8     8     8     9    10

但是,我希望7:10中的四个内部NAs组合填充maxgap值,其余为NA。 E.g:

    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
x   NA   NA    1    2    3    4   NA   NA   NA    NA     7     8    NA    NA     9    10
z   NA   NA    1    2    3    4    4    4   NA    NA     7     8     8     8     9    10

作为参考,这里是动物园包中的here

## x = series with gaps
## fill = same series with filled gaps
.fill_short_gaps <- function(x, fill, maxgap) {
    if (maxgap <= 0)
        return(x)
    if (maxgap >= length(x))
        return(fill)
    naruns <- rle(is.na(x))

    # This is the part I want to modify.  Currently sets all runs > maxgap
    # to FALSE (meaning don't fill)
    naruns$values[naruns$lengths > maxgap] <- FALSE

    naok <- inverse.rle(naruns)
    ifelse(naok, fill, x)
}

x为例,naruns如下所示:

naruns

Run Length Encoding
    lengths: int [1:6] 2 4 4 2 2 2
    values : logi [1:6] TRUE FALSE TRUE FALSE TRUE FALSE

解决我的问题的一种方法是将值插入naruns向量中的适当位置,以便可以正确创建naok。这看起来像是:

Run Length Encoding
    lengths: int [1:7] 2 4 2 2 2 2 2
    values : logi [1:7] TRUE FALSE TRUE FALSE FALSE TRUE FALSE 

也就是说,位置3中的4(TRUE)将被分为2(TRUE)和2(FALSE),也就是说,在which(naruns$values & naruns$lengths > maxgap)标识的位置,但我不确定a将值插入正确位置的好方法。

我已经考虑了几种笨拙的做法,但他们已经走到了死胡同。通过查看其他(无关)问题的答案,我知道许多人可以提出比我在合理的时间内可能发出的任何东西更强大和可扩展的东西。谢谢你的帮助。

1 个答案:

答案 0 :(得分:0)

事实证明,一旦我更加狭隘地发现问题,我就能回答我自己的问题。它们在我上面描述的方法中的关键问题是将值插入到指定位置的向量中。我在How to insert elements into a vector?找到了一个很好的方法。使用那里接受的答案,我创建了这个函数,填充NAs的运行达到指定的最大数量,并将其余的作为NA。

它基于zoo :: na.locf,除了我将.fill_short_gaps替换为下面显示的版本:

fillNa <- function (object, na.rm = TRUE, fromLast, rev, maxfill = Inf, 
                    rule = 2, ...) {

    ## x = series with gaps
    ## fill = same series with filled gaps
    .fill_short_gaps <- function(x, fill, maxfill) {
        if (maxfill <= 0)
            return(x)
        if (maxfill >= length(x))
            return(fill)

        naruns <- rle(is.na(x))
        len <- length(naruns$lengths)

        # Identify which runs are greater than maxfill
        fill.idx <- which(naruns$values & naruns$lengths > maxfill)

        # Create a new naruns
        naruns2 <- vector('list', 2)
        attr(naruns2, 'class') <- 'rle'

        idx <- c(seq_along(naruns$values), fill.idx + 0.5)
        naruns2$values <- c(naruns$values, rep(FALSE, length(fill.idx)))
        naruns2$values[fill.idx] <- TRUE
        naruns2$values <- naruns2$values[order(idx)]

        fill.lngth <- naruns$lengths[fill.idx]
        naruns2$lengths <- c(naruns$lengths, (fill.lngth - maxfill))
        naruns2$lengths[fill.idx] <- maxfill
        naruns2$lengths <- naruns2$lengths[order(idx)]

        naok <- rep(NA, length(x))

        naok <- inverse.rle(naruns2)
        ifelse(naok, fill, x)
    }

    L <- list(...)
    if ("x" %in% names(L) || "xout" %in% names(L)) {
        if (!missing(fromLast)) {
            stop("fromLast not supported if x or xout is specified")
        }
        return(na.approx(object, na.rm = na.rm, maxfill = maxfill, 
                         method = "constant", rule = rule, ...))
    }
    na.locf.0 <- function(x) {
        L <- !is.na(x)
        idx <- if (fromLast) 
                   rev(c(NA, rev(which(L)))[cumsum(rev(L)) + 1])
               else c(NA, which(L))[cumsum(L) + 1]
        na.index <- function(x, i) {
            L <- !is.na(i)
            x[!L] <- NA
            x[L] <- coredata(x)[i[L]]
            x
        }
        xf <- na.index(x, idx)
        .fill_short_gaps(x, xf, maxfill = maxfill)
    }

    if (!missing(rev)) {
        warning("na.locf.default: rev= deprecated. Use fromLast= instead.")
        if (missing(fromLast)) 
            fromLast <- rev
    }
    else if (missing(fromLast)) 
        fromLast <- FALSE
    rev <- base::rev
    object[] <- if (length(dim(object)) == 0) 
                    na.locf.0(object)
                else apply(object, length(dim(object)), na.locf.0)
    if (na.rm) 
        na.trim(object, is.na = "all")
    else object
}