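I want to identify outliers by group across multiple columns and replace them with the 5th and 95th percentile values. I wrote the following function to handle the outliers.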
fun_name <- function(data,x){
qnt <- quantile(data$x, probs=c(.25, .75), na.rm = T)
caps <- quantile(data$x, probs=c(.05, .95), na.rm = T)
H <- 1.5 * IQR(data$x, na.rm = T)
data[which(data$x < (qnt[1] - H)),"x"] <- caps[1]
data[which(data$x > (qnt[2] + H)),"x"] <- caps[2]
return(data)
}
I then tried to cap the outliers within each group like this:
total_data <- data%>%
group_by(col1,col2,col3,col4)%>%
mutate(fun_name(data,col5)) ## col5 is of numeric type.
I get this error:
Column `fun_name(data,col5)` is of unsupported class data.frame
What is going wrong? Any suggestions would be appreciated.
Answer 0 (score: 0)
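Inside mutate(), each expression has to return a vector, but your function returns a whole data frame. In addition, data$x looks for a column literally named "x" rather than the column you passed in. You should change the function so that it takes a vector and returns a vector: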
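fun_name <- function(x) {
  # Quartiles set the IQR fences; the 5th/95th percentiles are the replacement values
  qnt <- quantile(x, probs = c(.25, .75), na.rm = TRUE)
  caps <- quantile(x, probs = c(.05, .95), na.rm = TRUE)
  H <- 1.5 * IQR(x, na.rm = TRUE)
  # Replace values beyond the fences with the corresponding percentile
  x[which(x < (qnt[1] - H))] <- caps[1]
  x[which(x > (qnt[2] + H))] <- caps[2]
  return(x)
}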
and then use it like this:
library(dplyr)
data%>% group_by(col1,col2,col3,col4)%>% mutate(col = fun_name(col5))
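Since the question mentions several columns, here is a minimal sketch of how the vector-based function could be applied to more than one column per group. It assumes dplyr 1.0 or later (for across()); the data frame toy, its columns grp, val1 and val2, and the name cap_outliers are made up for illustration and are not from the original post.

```r
library(dplyr)

# Vector-in, vector-out capping function (same logic as the answer's function;
# the name cap_outliers is hypothetical).
cap_outliers <- function(x) {
  qnt  <- quantile(x, probs = c(.25, .75), na.rm = TRUE)
  caps <- quantile(x, probs = c(.05, .95), na.rm = TRUE)
  H    <- 1.5 * IQR(x, na.rm = TRUE)
  x[which(x < qnt[1] - H)] <- caps[1]
  x[which(x > qnt[2] + H)] <- caps[2]
  x
}

# Hypothetical toy data: two groups, each column has one obvious outlier per group.
toy <- data.frame(
  grp  = rep(c("a", "b"), each = 10),
  val1 = c(1:9, 100, 11:19, 500),
  val2 = c(-50, 2:10, -200, 22:30)
)

# Quantiles and fences are computed separately within each group,
# and across() applies the function to every listed column.
capped <- toy %>%
  group_by(grp) %>%
  mutate(across(c(val1, val2), cap_outliers)) %>%
  ungroup()

print(capped)
```

Note that the 5th and 95th percentiles are computed from the group's data including the outliers themselves, so in very small groups the replacement value can still be fairly extreme.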