I have a time series, and I want to randomly replace the outliers with other (non-outlier) values. The time series looks like this:
date Category Value1
2018-09-10 A .4
2018-09-10 B .6
2018-09-10 A 4
2018-09-10 C .2
2018-09-10 D 7
I then tried to identify the outliers as follows:
qn = quantile(df1$Value1, c(0.05, 0.85), na.rm = TRUE)
df6 = within(df1, {
  # clamp values below the 5% quantile, then values above the 85% quantile
  value = ifelse(Value1 < qn[1], qn[1], Value1)
  value = ifelse(value > qn[2], qn[2], value)
})
Next, I want to replace the outliers with some of the non-outlier values from the Value1 column.
Answer 0: (score: 3)
If you want to replace the outliers randomly, one way is
#Find out indices which are outliers
inds <- df1$Value1 > qn[2] | df1$Value1 < qn[1]
#Replace those outliers by randomly selecting non-outliers
df1$Value1[inds] <- sample(df1$Value1[!inds], sum(inds))
df1
# date Category Value1
#1 2018-09-10 A 0.4
#2 2018-09-10 B 0.6
#3 2018-09-10 A 4.0
#4 2018-09-10 C 4.0
#5 2018-09-10 D 0.6
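Note that `sample()` draws without replacement by default, so the replacement line above errors if there are more outliers than non-outliers, and the result differs on every run. A minimal sketch of a more defensive variant follows. The helper name `replace_outliers` is hypothetical (it is not part of the answer above), and the 5%/85% cut-offs are simply carried over from the question:

```r
# Hypothetical helper (an assumption, not from the original answer):
# replaces values outside the given quantile range with randomly drawn
# in-range values from the same vector.
replace_outliers <- function(x, probs = c(0.05, 0.85)) {
  qn <- quantile(x, probs, na.rm = TRUE)
  # NA values are never flagged as outliers
  inds <- !is.na(x) & (x > qn[2] | x < qn[1])
  pool <- x[!inds & !is.na(x)]
  if (any(inds) && length(pool) > 0) {
    # Index into the pool to avoid sample()'s special case for a single
    # number; replace = TRUE allows more outliers than non-outliers.
    x[inds] <- pool[sample.int(length(pool), sum(inds), replace = TRUE)]
  }
  x
}

set.seed(42)  # only to make the random draw reproducible
df1 <- data.frame(Value1 = c(.4, .6, 4, .2, 7))
df1$Value1 <- replace_outliers(df1$Value1)
```

After this call, every element of `df1$Value1` is one of the original in-range values (0.4, 0.6 or 4), and the first three rows are unchanged.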
Data
df1 <- read.table(text = "date Category Value1
2018-09-10 A .4
2018-09-10 B .6
2018-09-10 A 4
2018-09-10 C .2
2018-09-10 D 7", header = TRUE)
qn <- quantile(df1$Value1, c(0.05, 0.85), na.rm = TRUE)
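To see which rows count as outliers for this sample data, it can help to print the cut-offs. The sketch below assumes R's default quantile algorithm (`type = 7`) and rebuilds only the `Value1` column:

```r
# Same values as the Value1 column in the sample data
df1 <- data.frame(Value1 = c(.4, .6, 4, .2, 7))
qn <- quantile(df1$Value1, c(0.05, 0.85), na.rm = TRUE)
qn  # roughly 0.24 for the 5% cut-off and 5.2 for the 85% cut-off

# Values outside the two cut-offs
outliers <- df1$Value1[df1$Value1 > qn[2] | df1$Value1 < qn[1]]
outliers  # 0.2 and 7
```

Only 0.2 and 7 fall outside the range, which is why just rows 4 and 5 change when the outliers are replaced.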