我想修改R zoo库中的.fill_short_gaps函数的maxgaps参数的效果(在na.locf和na.approx中使用),如uday&comment; {{{{{{ 3}}
以下示例说明了na.locf上下文中的现有行为。
x <- c(rep(NA, 2), 1:4, rep(NA, 4), 7:8, rep(NA, 2), 9:10)
y <- na.locf(x, na.rm=FALSE, maxgap=2)
rbind(x, y)
结果是
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
x NA NA 1 2 3 4 NA NA NA NA 7 8 NA NA 9 10
y NA NA 1 2 3 4 NA NA NA NA 7 8 8 8 9 10
但是,我希望7:10中的四个内部NAs组合填充maxgap
值,其余为NA。 E.g:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
x NA NA 1 2 3 4 NA NA NA NA 7 8 NA NA 9 10
z NA NA 1 2 3 4 4 4 NA NA 7 8 8 8 9 10
作为参考,这里是动物园包中的here:
## x = series with gaps
## fill = same series with filled gaps
.fill_short_gaps <- function(x, fill, maxgap) {
if (maxgap <= 0)
return(x)
if (maxgap >= length(x))
return(fill)
naruns <- rle(is.na(x))
# This is the part I want to modify. Currently sets all runs > maxgap
# to FALSE (meaning don't fill)
naruns$values[naruns$lengths > maxgap] <- FALSE
naok <- inverse.rle(naruns)
ifelse(naok, fill, x)
}
以x
为例,naruns
如下所示:
naruns
Run Length Encoding
lengths: int [1:6] 2 4 4 2 2 2
values : logi [1:6] TRUE FALSE TRUE FALSE TRUE FALSE
解决我的问题的一种方法是将值插入naruns
向量中的适当位置,以便可以正确创建naok
。这看起来像是:
Run Length Encoding
lengths: int [1:7] 2 4 2 2 2 2 2
values : logi [1:7] TRUE FALSE TRUE FALSE FALSE TRUE FALSE
也就是说,位置3中的4(TRUE)将被分为2(TRUE)和2(FALSE),也就是说,在which(naruns$values & naruns$lengths > maxgap)
标识的位置,但我不确定a将值插入正确位置的好方法。
我已经考虑了几种笨拙的做法,但他们已经走到了死胡同。通过查看其他(无关)问题的答案,我知道许多人可以提出比我在合理的时间内可能发出的任何东西更强大和可扩展的东西。谢谢你的帮助。
答案 0 :(得分:0)
事实证明,一旦我更加狭隘地发现问题,我就能回答我自己的问题。它们在我上面描述的方法中的关键问题是将值插入到指定位置的向量中。我在How to insert elements into a vector?找到了一个很好的方法。使用那里接受的答案,我创建了这个函数,填充NAs的运行达到指定的最大数量,并将其余的作为NA。
它基于zoo :: na.locf,除了我将.fill_short_gaps替换为下面显示的版本:
fillNa <- function (object, na.rm = TRUE, fromLast, rev, maxfill = Inf,
rule = 2, ...) {
## x = series with gaps
## fill = same series with filled gaps
.fill_short_gaps <- function(x, fill, maxfill) {
if (maxfill <= 0)
return(x)
if (maxfill >= length(x))
return(fill)
naruns <- rle(is.na(x))
len <- length(naruns$lengths)
# Identify which runs are greater than maxfill
fill.idx <- which(naruns$values & naruns$lengths > maxfill)
# Create a new naruns
naruns2 <- vector('list', 2)
attr(naruns2, 'class') <- 'rle'
idx <- c(seq_along(naruns$values), fill.idx + 0.5)
naruns2$values <- c(naruns$values, rep(FALSE, length(fill.idx)))
naruns2$values[fill.idx] <- TRUE
naruns2$values <- naruns2$values[order(idx)]
fill.lngth <- naruns$lengths[fill.idx]
naruns2$lengths <- c(naruns$lengths, (fill.lngth - maxfill))
naruns2$lengths[fill.idx] <- maxfill
naruns2$lengths <- naruns2$lengths[order(idx)]
naok <- rep(NA, length(x))
naok <- inverse.rle(naruns2)
ifelse(naok, fill, x)
}
L <- list(...)
if ("x" %in% names(L) || "xout" %in% names(L)) {
if (!missing(fromLast)) {
stop("fromLast not supported if x or xout is specified")
}
return(na.approx(object, na.rm = na.rm, maxfill = maxfill,
method = "constant", rule = rule, ...))
}
na.locf.0 <- function(x) {
L <- !is.na(x)
idx <- if (fromLast)
rev(c(NA, rev(which(L)))[cumsum(rev(L)) + 1])
else c(NA, which(L))[cumsum(L) + 1]
na.index <- function(x, i) {
L <- !is.na(i)
x[!L] <- NA
x[L] <- coredata(x)[i[L]]
x
}
xf <- na.index(x, idx)
.fill_short_gaps(x, xf, maxfill = maxfill)
}
if (!missing(rev)) {
warning("na.locf.default: rev= deprecated. Use fromLast= instead.")
if (missing(fromLast))
fromLast <- rev
}
else if (missing(fromLast))
fromLast <- FALSE
rev <- base::rev
object[] <- if (length(dim(object)) == 0)
na.locf.0(object)
else apply(object, length(dim(object)), na.locf.0)
if (na.rm)
na.trim(object, is.na = "all")
else object
}