How to get rid of multiple outliers in a time series in R?

Time: 2016-02-27 18:14:57

Tags: r

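I am using the "outliers" package to remove some bad values, but it seems that the rm.outlier() function does not replace all outliers at once. Presumably rm.outlier() cannot despike recursively, so I basically have to call the function many times to replace every outlier. Here is a reproducible example of the problem: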

library(outliers)
# creating a time series:
set.seed(12345)
y <- rnorm(10000)
# inserting some outliers:
y[4000:4500] <- -11
y[4501:5000] <- -10
y[5001:5100] <- -9
y[5101:5200] <- -8
y[5201:5300] <- -7
y[5301:5400] <- -6
y[5401:5500] <- -5
# plotting the time series + outliers:
plot(y, type="l", col="black", lwd=6, xlab="Time", ylab="w'")
# trying to get rid of some outliers by replacing them with the series mean value:
new.y <- outliers::rm.outlier(y, fill=TRUE, median=FALSE)
new.y <- outliers::rm.outlier(new.y, fill=TRUE, median=FALSE)
# plotting the new time series "after removing the outliers":
lines(new.y, col="red")
# inserting a legend:
legend("bottomleft", c("raw", "new series"), col=c("black","red"), lty=c(1,1), horiz=FALSE, bty="n")

Does anyone know how to improve the code above so that all outliers are replaced with the mean value?

1 Answer:

Answer 0: (score: 1)

I think the best approach is a for loop that keeps track of the outliers as they are found.

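library(outliers)

# y is the series built in the question
plot(y, type="l", col="black", lwd=6, xlab="Time", ylab="w'")

maxIter <- 100
# outlierQ records every position flagged as an outlier in any pass
outlierQ <- rep(FALSE, length(y))

for (i in seq_len(maxIter)) {
  # flag the value(s) farthest from the current mean
  bad <- outlier(y, logical = TRUE)
  if (!any(bad)) break
  outlierQ[bad] <- TRUE
  # fill them temporarily so the next pass finds the next extreme value
  y[bad] <- mean(y[!bad])
}

# replace every flagged position with the mean of the unflagged values
y[outlierQ] <- mean(y[!outlierQ])

lines(y, col="blue")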
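
One thing to watch with the loop above: outlier() appears to always return the value farthest from the mean, even in clean data, so the loop has no data-driven stopping point and will normally run for all maxIter rounds. A variant with an explicit stopping rule might look like the sketch below. It uses base R only. The helper name replace_outliers and its parameters k and max_iter are made up for illustration and are not part of the outliers package. The cutoff of 4 scaled MADs from the median is an arbitrary assumption that should be tuned to the data.

```r
# Hypothetical helper (not from the outliers package): flag values lying more
# than k scaled MADs from the median of the not-yet-flagged values, and repeat
# until a pass flags nothing new or max_iter passes have run.
replace_outliers <- function(x, k = 4, max_iter = 100) {
  flagged <- rep(FALSE, length(x))
  for (i in seq_len(max_iter)) {
    centre <- median(x[!flagged])
    spread <- mad(x[!flagged])   # scaled MAD from the stats package
    bad <- !flagged & abs(x - centre) > k * spread
    if (!any(bad)) break         # nothing new was flagged: stop
    flagged <- flagged | bad
  }
  # replace every flagged position with the mean of the unflagged values
  x[flagged] <- mean(x[!flagged])
  list(x = x, flagged = flagged)
}

# Same test series as in the question
set.seed(12345)
y <- rnorm(10000)
y[4000:4500] <- -11
y[4501:5000] <- -10
y[5001:5100] <- -9
y[5101:5200] <- -8
y[5201:5300] <- -7
y[5301:5400] <- -6
y[5401:5500] <- -5

res <- replace_outliers(y)
sum(res$flagged)   # number of positions that were replaced
```

The cleaned series is in res$x and can be drawn over the original with lines(res$x, col="blue"). Iterating still matters here: while the large plateaus are in the series they inflate the spread estimate, so the mildest ones may only cross the cutoff in a later pass. Note that mad() returns 0 when more than half of the values are identical, in which case this rule would need a different spread estimate.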