In a SQL Server table, I need to remove outliers separately for each group. Here are my columns:
select
customer,
sku,
stuff,
action,
acnumber,
year
from
mytable
Sample data:
row  customer  sku  year  stuff  action
---------------------------------------
1    1         2    2017  10     0
2    1         2    2017  20     1
3    1         3    2017  30     0
4    1         3    2017  40     1
5    2         4    2017  50     0
6    2         4    2017  60     1
7    2         5    2017  70     0
8    2         5    2017  80     1
9    1         2    2018  10     0
10   1         2    2018  20     1
11   1         3    2018  30     0
12   1         3    2018  40     1
13   2         4    2018  50     0
14   2         4    2018  60     1
15   2         5    2018  70     0
16   2         5    2018  80     1
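I need to remove outliers from the stuff column, but separately for each customer + sku + year group.
Every value below the 25th percentile or above the 75th percentile should be treated as an outlier, and this rule must be applied within each group.
How can I get a cleaned dataset for further work?
Note that the dataset also has an action column, which takes the values 0 and 1. It is not a grouping variable, but outliers must be removed only from rows where action is zero (0).
In R, this is done as follows: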
remove_outliers <- function(x, na.rm = TRUE, ...) {
qnt <- quantile(x, probs=c(.25, .75), na.rm = na.rm, ...)
H <- 1.5 * IQR(x, na.rm = na.rm)
y <- x
y[x < (qnt[1] - H)] <- NA
y[x > (qnt[2] + H)] <- NA
y
}
new <- remove_outliers(vpg$stuff)
vpg=cbind(new,vpg)
Answer 0 (score: 1)
Something like this, perhaps:
WITH ranked AS (  -- window functions are not allowed in WHERE, so rank in a CTE first
    SELECT PERCENT_RANK() OVER (PARTITION BY customer, sku, year ORDER BY stuff) AS pr
    FROM mytable WHERE action = 0  -- the question limits removal to action = 0 rows
)
DELETE FROM ranked
WHERE pr < 0.25 OR pr > 0.75;
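
To cross-check the result outside the database, here is a minimal Python sketch of the per-group rule described in the question. It is an illustration under assumptions, not the asker's or answerer's method: the rows are hypothetical dictionaries rather than the real table, the function name and the `k` parameter are made up for this example, and quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which uses the same linear interpolation as R's default `quantile`. With `k=0` the cut-offs are the quartiles themselves (the rule stated in the question); with `k=1.5` they are the 1.5 * IQR fences used by the R function.

```python
from collections import defaultdict
from statistics import quantiles


def remove_outliers(rows, k=0.0):
    """Return rows with per-group outliers in `stuff` dropped.

    Groups are (customer, sku, year). Only rows with action == 0 are
    candidates for removal; rows with action == 1 are always kept.
    """
    # Collect the stuff values of action == 0 rows for each group.
    groups = defaultdict(list)
    for r in rows:
        if r["action"] == 0:
            groups[(r["customer"], r["sku"], r["year"])].append(r["stuff"])

    # Compute lower and upper cut-offs per group.
    bounds = {}
    for key, values in groups.items():
        if len(values) < 2:
            continue  # assumption: a single value cannot be an outlier, keep it
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        h = k * (q3 - q1)
        bounds[key] = (q1 - h, q3 + h)

    # Keep every row that is not an action == 0 value outside its group's bounds.
    kept = []
    for r in rows:
        key = (r["customer"], r["sku"], r["year"])
        if r["action"] == 0 and key in bounds:
            lo, hi = bounds[key]
            if not lo <= r["stuff"] <= hi:
                continue
        kept.append(r)
    return kept


# Hypothetical data: one group with an obvious outlier (100) among action == 0
# rows, plus one action == 1 row that must never be removed.
rows = [
    {"customer": 1, "sku": 2, "year": 2017, "stuff": s, "action": 0}
    for s in (10, 11, 12, 13, 100)
] + [{"customer": 1, "sku": 2, "year": 2017, "stuff": 500, "action": 1}]

print([r["stuff"] for r in remove_outliers(rows, k=1.5)])  # [10, 11, 12, 13, 500]
print([r["stuff"] for r in remove_outliers(rows, k=0)])    # [11, 12, 13, 500]
```

Note that in the sample data from the question each customer + sku + year group contains only one action = 0 row, so neither rule would remove anything there; the hypothetical rows above use a larger group so that the effect is visible.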