Replace outliers with the second-smallest value of each group in R

Time: 2017-06-08 10:27:13

Tags: r data.table outliers

I am new to R, and I have a data.table dt:

> library(data.table)
> dt <- data.table(A = c(1,2,3,4,74,6, 7, 8, 9, 75, 11, 12), 
+                  B=c("P","P","P","P", "P", "P" ,"Q","Q","Q", "Q", "Q", "Q"), 
+                  C=c("a","b","c","d","e","f", "g", "h", "i", "j", "k", "l"))
> dt
     A B C
 1:  1 P a
 2:  2 P b
 3:  3 P c
 4:  4 P d
 5: 74 P e
 6:  6 P f
 7:  7 Q g
 8:  8 Q h
 9:  9 Q i
10: 75 Q j
11: 11 Q k
12: 12 Q l

I flagged the outliers in A, defined as values more than two standard deviations from the mean, using the following:
> #Outlier Identification by customer_count
> dt[,out := ifelse((A > (mean(A)+2*sd(A))|A < (mean(A)-2*sd(A))),1,0) ]
> dt
     A B C out
 1:  1 P a   0
 2:  2 P b   0
 3:  3 P c   0
 4:  4 P d   0
 5: 74 P e   1
 6:  6 P f   0
 7:  7 Q g   0
 8:  8 Q h   0
 9:  9 Q i   0
10: 75 Q j   1
11: 11 Q k   0
12: 12 Q l   0
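
For reference, the same two-sided rule can be written more compactly with abs(). This is only a sketch of an equivalent test on the data above; it stores the flag as an integer rather than a double, which makes no difference to a later comparison such as out == 1.

```r
library(data.table)

# Same data as in the question
dt <- data.table(
  A = c(1, 2, 3, 4, 74, 6, 7, 8, 9, 75, 11, 12),
  B = rep(c("P", "Q"), each = 6),
  C = letters[1:12]
)

# Flag values lying more than 2 standard deviations from the overall mean of A.
# abs(A - mean(A)) > 2 * sd(A) is equivalent to the two-sided ifelse() test.
dt[, out := as.integer(abs(A - mean(A)) > 2 * sd(A))]

# Rows flagged as outliers
dt[out == 1]
```

Note that the mean and standard deviation here are computed over the whole column, not per group of B, exactly as in the original call.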

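The column "out" flags my outliers. Now I want to replace each outlier in A with the second-smallest value of A within its group of "B". How can I do this?

My final result should look like this: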

> dt
     A B C out
 1:  1 P a   0
 2:  2 P b   0
 3:  3 P c   0
 4:  4 P d   0
 5:  2 P e   1
 6:  6 P f   0
 7:  7 Q g   0
 8:  8 Q h   0
 9:  9 Q i   0
10:  8 Q j   1
11: 11 Q k   0
12: 12 Q l   0 

1 answer:

Answer 0: (score: 0)

We can use replace, grouped by B:

dt[, A := replace(A, out ==1, sort(A)[2]) , by = B]
dt
#     A B C out
# 1:  1 P a   0
# 2:  2 P b   0
# 3:  3 P c   0
# 4:  4 P d   0
# 5:  2 P e   1
# 6:  6 P f   0
# 7:  7 Q g   0
# 8:  8 Q h   0
# 9:  9 Q i   0
#10:  8 Q j   1
#11: 11 Q k   0
#12: 12 Q l   0
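
One thing to be aware of: sort(A)[2] ranks all values in the group, flagged rows included. That is harmless here because both outliers are the largest values, but an unusually low outlier could itself end up as the first or second element. A hedged variant, assuming you want the second-smallest of the non-flagged values only, could look like this:

```r
library(data.table)

# Same data and outlier flag as in the question
dt <- data.table(
  A = c(1, 2, 3, 4, 74, 6, 7, 8, 9, 75, 11, 12),
  B = rep(c("P", "Q"), each = 6),
  C = letters[1:12]
)
dt[, out := ifelse(A > mean(A) + 2 * sd(A) | A < mean(A) - 2 * sd(A), 1, 0)]

# Within each group of B, take the second-smallest A among the rows that are
# NOT flagged, and write it into the flagged rows.
dt[, A := replace(A, out == 1, sort(A[out == 0])[2]), by = B]
dt
```

If a group has fewer than two non-flagged rows, sort(...)[2] is NA, so the flagged rows in that group would become NA.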

Or another option is:

dt[, A := pmax((out==1)*sort(A)[2], (out==0)*A), B]
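
The pmax version works by multiplying each side by a logical, so one side is always 0 and pmax picks the other. That relies on the values in A being non-negative; with negative values the 0 would win. A sketch that avoids this assumption, using the same ifelse function as the question, might be:

```r
library(data.table)

# Same data and outlier flag as in the question
dt <- data.table(
  A = c(1, 2, 3, 4, 74, 6, 7, 8, 9, 75, 11, 12),
  B = rep(c("P", "Q"), each = 6),
  C = letters[1:12]
)
dt[, out := ifelse(A > mean(A) + 2 * sd(A) | A < mean(A) - 2 * sd(A), 1, 0)]

# Per group of B: where out == 1 use the group's second-smallest A,
# otherwise keep A unchanged. No multiplication by 0 is involved,
# so the sign of A does not matter.
dt[, A := ifelse(out == 1, sort(A)[2], A), by = B]
dt
```

On the data above this gives the same result as the replace and pmax versions.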