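I have a dataset and want to add a column that flags outliers (values more than 1.5 times the IQR beyond the quartiles). I am currently using this code: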
#Add additional column for flagging outliers that are beyond 1.5*interquartile range
plotdata$OUTLIERFLAG <- 0
#Cycle through variables
for (i in 1:length(unique(plotdata$variable))) {
  pms <- unique(plotdata$variable)[i]
  dats <- subset(plotdata, plotdata$variable == pms)
  #Cycle through Sampling locations
  for (bore in unique(plotdata$Sample.Point)) {
    subdats <- dats[dats$Sample.Point == bore, ]
    x1 <- match(boxplot.stats(subdats$value2)$out, subdats$value2)
    ifelse(x1 == 0, NULL, plotdata[rownames(subdats[x1, ]), ]$OUTLIERFLAG <- 1)
  }
}
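However, the code sometimes does not work correctly. For two identical values, one gets flagged as an outlier and the other does not. Please help.
Answer 0: (score: 1)
Since you did not provide any data, I will use the mtcars dataset. You probably want to define outliers as data points above Q3 + IQR * 1.5. Also, for basic R operations, for loops are generally best avoided.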
df <- mtcars[, c(2, 4)]
df$outliers <- ifelse(test = df$hp > quantile(df$hp, probs = 0.75) + IQR(df$hp) * 1.5, yes = "FLAG", no = NA)
df
> df
                    cyl  hp outliers
Mazda RX4             6 110     <NA>
Mazda RX4 Wag         6 110     <NA>
Datsun 710            4  93     <NA>
Hornet 4 Drive        6 110     <NA>
Hornet Sportabout     8 175     <NA>
Valiant               6 105     <NA>
Duster 360            8 245     <NA>
Merc 240D             4  62     <NA>
Merc 230              4  95     <NA>
Merc 280              6 123     <NA>
Merc 280C             6 123     <NA>
Merc 450SE            8 180     <NA>
Merc 450SL            8 180     <NA>
Merc 450SLC           8 180     <NA>
Cadillac Fleetwood    8 205     <NA>
Lincoln Continental   8 215     <NA>
Chrysler Imperial     8 230     <NA>
Fiat 128              4  66     <NA>
Honda Civic           4  52     <NA>
Toyota Corolla        4  65     <NA>
Toyota Corona         4  97     <NA>
Dodge Challenger      8 150     <NA>
AMC Javelin           8 150     <NA>
Camaro Z28            8 245     <NA>
Pontiac Firebird      8 175     <NA>
Fiat X1-9             4  66     <NA>
Porsche 914-2         4  91     <NA>
Lotus Europa          4 113     <NA>
Ford Pantera L        8 264     <NA>
Ferrari Dino          6 175     <NA>
Maserati Bora         8 335     FLAG
Volvo 142E            4 109     <NA>
Only the Maserati Bora, with 8 cylinders and 335 hp, is flagged. A box-and-whisker plot showing the outlying data point:
boxplot(df$hp, horizontal = TRUE)
# Vertical line indicating the outlier limit
abline(v = quantile(df$hp, probs = 0.75) + IQR(df$hp) * 1.5, col = "red") # 305.25
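The same idea can be carried over to the grouped data in the question. One likely reason identical values end up flagged differently there is that match() returns only the first position of each value, so a repeated outlier always maps back to the same row. The sketch below first shows that behaviour and then flags outliers per variable and Sample.Point without loops. The column names come from the question; the data values and the helper is_iqr_outlier() are made up for illustration. Note that it uses quantile() for the fences, which can differ slightly from the hinges used by boxplot.stats().

```r
# match() gives the first position only, so tied outliers collapse to one row
x <- c(7.0, 7.0, 7.1, 7.1, 7.2, 7.2, 7.3, 7.3, 9.9, 9.9)
out <- boxplot.stats(x)$out   # 9.9 9.9
match(out, x)                 # 9 9 (position 10 is never returned)

# Hypothetical long-format data using the question's column names
plotdata <- data.frame(
  variable     = rep(c("pH", "pH", "EC"), each = 10),
  Sample.Point = rep(c("BH1", "BH2", "BH1"), each = 10),
  value2       = c(7.0, 7.0, 7.1, 7.1, 7.2, 7.2, 7.3, 7.3, 9.9, 9.9,
                   6.8, 6.9, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7,
                   100, 101, 102, 103, 104, 105, 106, 107, 108, 10)
)

# Hypothetical helper: TRUE for values outside [Q1 - k*IQR, Q3 + k*IQR].
# The comparison is made on the values themselves, so ties are treated alike.
is_iqr_outlier <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}

# One group per variable / Sample.Point combination that actually occurs
grp <- interaction(plotdata$variable, plotdata$Sample.Point, drop = TRUE)

# ave() applies the function within each group and keeps the original row order
plotdata$OUTLIERFLAG <- ave(
  plotdata$value2, grp,
  FUN = function(v) as.numeric(is_iqr_outlier(v))
)

plotdata[plotdata$OUTLIERFLAG == 1, ]
```

Because the flag is computed by comparing each value against the fences rather than by looking up positions, both 9.9 readings in the first group are flagged, and the low reading of 10 in the EC group is caught by the lower fence.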