I am using code borrowed from https://rpubs.com/hauselin/outliersDetect to compute the median absolute deviation (MAD), detect outliers, and replace them with NA.
See below:
outliersMAD <- function(data, MADCutOff = 2.5, replace = NA, values = FALSE,
                        bConstant = 1.4826, digits = 2) {
  # compute how many absolute MADs away from the median each value is
  # formula: abs(x - median(x)) / mad(x)
  absMADAway <- abs((data - median(data, na.rm = TRUE)) /
                      mad(data, constant = bConstant, na.rm = TRUE))
  # replace values whose absMADAway exceeds MADCutOff with `replace`
  # (replace can be set to something other than NA)
  data[absMADAway > MADCutOff] <- replace
  if (values == TRUE) {
    return(round(absMADAway, digits))  # if values == TRUE, return the number of MADs for each value
  } else {
    return(round(data, digits))  # otherwise, return the values with outliers replaced
  }
}
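To make the formula in the comments concrete, here is a small worked sketch on a hypothetical toy vector (not from the original post). It computes the same robust score the function uses, abs(x - median(x)) / mad(x), and checks which values exceed the default cutoff of 2.5.

```r
# Hypothetical toy vector with one obvious outlier
x <- c(1, 2, 3, 4, 100)

# Robust score: distance from the median in units of the scaled MAD.
# mad() computes median(abs(x - median(x))) * constant; 1.4826 is also its default.
score <- abs(x - median(x)) / mad(x, constant = 1.4826)
round(score, 2)

# Positions whose score exceeds the default MADCutOff of 2.5
which(score > 2.5)  # → 5
```

Here median(x) is 3 and the median absolute deviation is 1, so only the value 100 lies more than 2.5 scaled MADs from the median.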
I want to apply this to columns 4 to 52 of my data, like this:
data[,4:52] <- outliersMAD(data[4:52])
But I get the error 'need numeric data'.
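A minimal sketch of where that message likely comes from, assuming `data[4:52]` is a data frame: `median()` expects a single vector, and when handed a data frame it stops with "need numeric data". The two-column `df` below is a hypothetical stand-in for the real data.

```r
# Hypothetical stand-in for data[4:52]
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# median() on a whole data frame fails; capture the error message
msg <- tryCatch(median(df, na.rm = TRUE),
                error = function(e) conditionMessage(e))
print(msg)

# A single column is a plain numeric vector, so median() works on it
median(df$a)
```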
However, the code works fine if I use:
outliersMAD(data$columnofinterest)
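Since the function works on a single column, one possible approach (a sketch, not the only way) is to call it once per column with `lapply()` and assign the resulting list back into the data frame. The data frame `df` is hypothetical, and the function is restated in compact form so the block runs on its own.

```r
# outliersMAD as posted above, restated compactly so this block is self-contained
outliersMAD <- function(data, MADCutOff = 2.5, replace = NA, values = FALSE,
                        bConstant = 1.4826, digits = 2) {
  absMADAway <- abs((data - median(data, na.rm = TRUE)) /
                      mad(data, constant = bConstant, na.rm = TRUE))
  data[absMADAway > MADCutOff] <- replace
  if (values) round(absMADAway, digits) else round(data, digits)
}

# Hypothetical data: an ID column plus two numeric columns,
# standing in for columns 4:52 of the real data
df <- data.frame(id = 1:6,
                 a  = c(1, 2, 3, 4, 5, 100),
                 b  = c(10, 11, 12, 13, 14, 15))

# lapply() passes each column to outliersMAD() as a plain numeric vector;
# assigning the returned list back keeps df a data frame
df[2:3] <- lapply(df[2:3], outliersMAD)
df
```

For the real data the same pattern would be `data[4:52] <- lapply(data[4:52], outliersMAD)`, assuming every one of those columns is numeric; a non-numeric column would still cause an error.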
I'm not sure how to move forward. I'm new to R and would appreciate any help!