I am new to R, and I have a data.table dt:
> library(data.table)
> dt <- data.table(A = c(1,2,3,4,74,6, 7, 8, 9, 75, 11, 12),
+ B=c("P","P","P","P", "P", "P" ,"Q","Q","Q", "Q", "Q", "Q"),
+ C=c("a","b","c","d","e","f", "g", "h", "i", "j", "k", "l"))
> dt
A B C
1: 1 P a
2: 2 P b
3: 3 P c
4: 4 P d
5: 74 P e
6: 6 P f
7: 7 Q g
8: 8 Q h
9: 9 Q i
10: 75 Q j
11: 11 Q k
12: 12 Q l
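I flagged the outliers in A using the standard deviation (any value more than two standard deviations from the mean), as follows:
> #Outlier Identification by customer_count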
> dt[,out := ifelse((A > (mean(A)+2*sd(A))|A < (mean(A)-2*sd(A))),1,0) ]
> dt
A B C out
1: 1 P a 0
2: 2 P b 0
3: 3 P c 0
4: 4 P d 0
5: 74 P e 1
6: 6 P f 0
7: 7 Q g 0
8: 8 Q h 0
9: 9 Q i 0
10: 75 Q j 1
11: 11 Q k 0
12: 12 Q l 0
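For reference, the flag above is computed over the whole column, because the call has no `by`. The sketch below rebuilds the same data and spells out the two cutoffs; the names `lower` and `upper` are illustrative and do not appear in the original post, and `as.integer()` is used here in place of `ifelse()` only as a shorter way to get 1/0.

```r
library(data.table)

dt <- data.table(A = c(1, 2, 3, 4, 74, 6, 7, 8, 9, 75, 11, 12),
                 B = rep(c("P", "Q"), each = 6),
                 C = letters[1:12])

# Cutoffs over all 12 rows (not per group), as in the question
lower <- mean(dt$A) - 2 * sd(dt$A)
upper <- mean(dt$A) + 2 * sd(dt$A)

# as.integer() converts TRUE/FALSE to 1/0
dt[, out := as.integer(A > upper | A < lower)]
```

With this data only 74 and 75 fall above `upper`, which matches the `out` column shown above.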
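The column "out" flags my outliers. Now I want to replace each outlier in A with the second-smallest value of A within its group of "B". How can I do that?
My final result should look like this: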
> dt
A B C out
1: 1 P a 0
2: 2 P b 0
3: 3 P c 0
4: 4 P d 0
5: 2 P e 1
6: 6 P f 0
7: 7 Q g 0
8: 8 Q h 0
9: 9 Q i 0
10: 8 Q j 1
11: 11 Q k 0
12: 12 Q l 0
Answer 0 (score: 0)
We can use replace, grouped by B, so that sort(A)[2] is the second-smallest value within each group:
dt[, A := replace(A, out ==1, sort(A)[2]) , by = B]
dt
# A B C out
# 1: 1 P a 0
# 2: 2 P b 0
# 3: 3 P c 0
# 4: 4 P d 0
# 5: 2 P e 1
# 6: 6 P f 0
# 7: 7 Q g 0
# 8: 8 Q h 0
# 9: 9 Q i 0
#10: 8 Q j 1
#11: 11 Q k 0
#12: 12 Q l 0
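A self-contained sketch of the same approach, which may help when checking the per-group value before overwriting A. The helper table `second_min` is an illustrative name and is not part of the original answer. Note that `sort(A)[2]` is taken from the group as it stands, outlier included, and it is `NA` for a group with fewer than two rows.

```r
library(data.table)

dt <- data.table(A = c(1, 2, 3, 4, 74, 6, 7, 8, 9, 75, 11, 12),
                 B = rep(c("P", "Q"), each = 6))
dt[, out := as.integer(A > mean(A) + 2 * sd(A) | A < mean(A) - 2 * sd(A))]

# Inspect the second-smallest value of A in each group of B
second_min <- dt[, .(second = sort(A)[2]), by = B]

# Overwrite only the flagged rows, group by group
dt[, A := replace(A, out == 1, sort(A)[2]), by = B]
```

Here `second_min` holds 2 for group P and 8 for group Q, so rows 5 and 10 become 2 and 8 respectively.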
Another option is:
dt[, A := pmax((out==1)*sort(A)[2], (out==0)*A), B]
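One caveat with the pmax() form: multiplying by the logical masks puts a 0 on the unused side, so it only returns the intended value when A is non-negative. The sketch below uses a small hypothetical table `neg` (not from the question) to show the difference, and compares it with `fifelse()`, which is available in reasonably recent data.table versions; base `ifelse()` would behave the same way here.

```r
library(data.table)

# Hypothetical data containing a negative, non-outlier value
neg <- data.table(A = c(-5, 2, 3, 90), B = "P", out = c(0, 0, 0, 1))

# pmax() form: the -5 is compared against 0 and lost
via_pmax <- neg[, pmax((out == 1) * sort(A)[2], (out == 0) * A), by = B]$V1

# fifelse() form: picks the second-smallest value or A directly
via_fifelse <- neg[, fifelse(out == 1, sort(A)[2], A), by = B]$V1
```

In this example `via_pmax` is 0, 2, 3, 2 while `via_fifelse` is -5, 2, 3, 2, so the conditional form is the safer choice if A can be negative.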