Identify outliers and treat them within groups using group_by on multiple columns in R?

Time: 2020-04-19 11:31:23

Tags: r group-by outliers

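I want to identify outliers within groups defined by multiple columns, and treat them by capping at the 5th and 95th percentile values. I wrote the following function to treat outliers.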

fun_name <- function(data,x){
  qnt <- quantile(data$x, probs=c(.25, .75), na.rm = T)
  caps <- quantile(data$x, probs=c(.05, .95), na.rm = T)
  H <- 1.5 * IQR(data$x, na.rm = T)
  data[which(data$x < (qnt[1] - H)),"x"] <- caps[1]
  data[which(data$x > (qnt[2] + H)),"x"] <- caps[2]
  return(data)
}

I tried to apply it to the outliers per group like this:

total_data <- data%>%
  group_by(col1,col2,col3,col4)%>%
  mutate(fun_name(data,col5)) ## col5 is of numeric type.

I get the error:

Column `fun_name(data,col5)` is of unsupported class data.frame

What is going wrong? Any suggestions would be appreciated.

1 answer:

Answer 0 (score: 0)

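You should change the function so that it takes a vector and returns a vector. In your version, data$x looks for a column literally named x rather than the column you passed in, and the function returns a whole data frame, which is what triggers the "unsupported class data.frame" error inside mutate. Change it to: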

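fun_name <- function(x){
   # Quartiles, used to build the 1.5 * IQR fences
   qnt <- quantile(x, probs=c(.25, .75), na.rm = TRUE)
   # 5th and 95th percentiles, used as replacement values
   caps <- quantile(x, probs=c(.05, .95), na.rm = TRUE)
   H <- 1.5 * IQR(x, na.rm = TRUE)
   # Cap values below the lower fence and above the upper fence
   x[which(x < (qnt[1] - H))] <- caps[1]
   x[which(x > (qnt[2] + H))] <- caps[2]
   return(x)
}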

Then use it as:

library(dplyr)
data %>%
  group_by(col1, col2, col3, col4) %>%
  mutate(col = fun_name(col5))
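To see the per-group behaviour on something small, here is a minimal sketch on made-up data. The data frame toy, the helper name cap_outliers and the output column col5_capped are hypothetical and only for illustration; the helper repeats the same logic as fun_name above, and a single grouping column stands in for col1 to col4.

```r
library(dplyr)

# Hypothetical helper: same logic as fun_name in the answer above.
cap_outliers <- function(x) {
  qnt  <- quantile(x, probs = c(.25, .75), na.rm = TRUE)
  caps <- quantile(x, probs = c(.05, .95), na.rm = TRUE)
  H    <- 1.5 * IQR(x, na.rm = TRUE)
  x[which(x < qnt[1] - H)] <- caps[1]  # below lower fence -> 5th percentile
  x[which(x > qnt[2] + H)] <- caps[2]  # above upper fence -> 95th percentile
  x
}

# Made-up toy data: group "a" contains one extreme value (100),
# group "b" contains none.
toy <- data.frame(
  col1 = rep(c("a", "b"), each = 6),
  col5 = c(1, 2, 3, 4, 5, 100, 10, 11, 12, 13, 14, 15)
)

# Quantiles and fences are computed separately within each group.
result <- toy %>%
  group_by(col1) %>%
  mutate(col5_capped = cap_outliers(col5)) %>%
  ungroup()

print(result)
```

Because the fences are computed inside each group, only the 100 in group "a" should be replaced (by that group's own 95th percentile), while group "b" is left unchanged. Writing the result to a new column keeps the original col5 available for comparison; assigning back to col5 instead would overwrite it in place.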