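Based on the summary of my dataset, I found an outlier in the "PROD_QTY" field. I tried to use boxplot.stats to find that outlier, but it did not work. I don't know why (maybe the problem is in mtext()?). Please help me figure it out! Thanks! Here are my code and results.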
library(dplyr)
library(data.table)
library(stats)
t_data <- read.csv(file.choose())
str(t_data)
'data.frame': 264836 obs. of 8 variables:
$ DATE : int 43390 43599 43605 43329 43330 43604 43601 43601 43332 43330 ...
$ STORE_NBR : int 1 1 1 2 2 4 4 4 5 7 ...
$ LYLTY_CARD_NBR: int 1000 1307 1343 2373 2426 4074 4149 4196 5026 7150 ...
$ TXN_ID : int 1 348 383 974 1038 2982 3333 3539 4525 6900 ...
$ PROD_NBR : int 5 66 61 69 108 57 16 24 42 52 ...
$ PROD_NAME : Factor w/ 114 levels "Burger Rings 220g",..: 44 2 80 76 43 51 78 23 14 24 ...
$ PROD_QTY : int 2 3 2 5 3 1 1 1 1 2 ...
$ TOT_SALES : num 6 6.3 2.9 15 13.8 5.1 5.7 3.6 3.9 7.2 ...
summary(t_data)
      DATE         STORE_NBR     LYLTY_CARD_NBR        TXN_ID           PROD_NBR
 Min.   :43282   Min.   :  1.0   Min.   :   1000   Min.   :      1   Min.   :  1.00
 1st Qu.:43373   1st Qu.: 70.0   1st Qu.:  70021   1st Qu.:  67602   1st Qu.: 28.00
 Median :43464   Median :130.0   Median : 130358   Median : 135138   Median : 56.00
 Mean   :43464   Mean   :135.1   Mean   : 135550   Mean   : 135158   Mean   : 56.58
 3rd Qu.:43555   3rd Qu.:203.0   3rd Qu.: 203094   3rd Qu.: 202701   3rd Qu.: 85.00
 Max.   :43646   Max.   :272.0   Max.   :2373711   Max.   :2415841   Max.   :114.00
                                    PROD_NAME         PROD_QTY          TOT_SALES
 Kettle Mozzarella Basil & Pesto 175g    :  3304   Min.   :  1.000   Min.   :  1.500
 Kettle Tortilla ChpsHny&Jlpno Chili 150g:  3296   1st Qu.:  2.000   1st Qu.:  5.400
 Cobs Popd Swt/Chlli &Sr/Cream Chips 110g:  3269   Median :  2.000   Median :  7.400
 Tyrrells Crisps Ched & Chives 165g      :  3268   Mean   :  1.907   Mean   :  7.304
 Cobs Popd Sea Salt Chips 110g           :  3265   3rd Qu.:  2.000   3rd Qu.:  9.200
 Kettle 135g Swt Pot Sea Salt            :  3257   Max.   :200.000   Max.   :650.000
 (Other)                                 :245177
outlier_value<- boxplot.stats(t_data$PROD_QTY)$out # outlier values.
boxplot(t_data$PROD_QTY, main="Product Quantity", boxwex=0.1)
mtext(paste("Outliers: ", paste(outlier_value, collapse=", ")), cex=0.6)
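
One thing that may be worth checking, going by the summary above: the 1st quartile, median, and 3rd quartile of PROD_QTY are all 2, so the IQR is probably 0. In that case boxplot.stats() flags every value that is not exactly 2, and pasting all of them into mtext() would give one enormous label. The sketch below reproduces this on synthetic data. The vector prod_qty and its values are made up for illustration and are not the real dataset.

```r
# Synthetic stand-in for t_data$PROD_QTY (assumed values, not the real data):
# mostly 2s, some 1s, a few larger values, and two extreme values of 200.
prod_qty <- c(rep(2, 800), rep(1, 150), rep(3, 30), rep(5, 18), rep(200, 2))

bs <- boxplot.stats(prod_qty)
bs$stats          # whisker ends, hinges, and median are all 2, so the IQR is 0
length(bs$out)    # every value other than 2 is returned in $out

# Count how often each flagged value occurs instead of listing every one
out_counts <- table(bs$out)
out_counts

# Label the plot with the distinct flagged values only
out_label <- paste("Outlier values:",
                   paste(sort(unique(bs$out)), collapse = ", "))
boxplot(prod_qty, main = "Product Quantity", boxwex = 0.1)
mtext(out_label, cex = 0.6)
```

If the real data behaves the same way, table(outlier_value) or sort(unique(outlier_value)) should show the 200 alongside the other non-2 quantities, and the mtext() label stays short enough to read.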