R: boxplot.stats does not identify the outlier

Time: 2020-10-09 04:56:32

Tags: r

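From the summary of my dataset, I can see an outlier in the "PROD_QTY" field (the maximum is 200, while the quartiles and the median are all 2). I want to find that outlier with boxplot.stats, but it does not work. I don't know why (maybe the problem is in mtext()?). Please help me figure it out. Thanks! Here are my code and the results.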

library(dplyr)
library(data.table)
library(stats)

t_data <- read.csv(file.choose())
str(t_data)

'data.frame':   264836 obs. of  8 variables:
 $ DATE          : int  43390 43599 43605 43329 43330 43604 43601 43601 43332 43330 ...
 $ STORE_NBR     : int  1 1 1 2 2 4 4 4 5 7 ...
 $ LYLTY_CARD_NBR: int  1000 1307 1343 2373 2426 4074 4149 4196 5026 7150 ...
 $ TXN_ID        : int  1 348 383 974 1038 2982 3333 3539 4525 6900 ...
 $ PROD_NBR      : int  5 66 61 69 108 57 16 24 42 52 ...
 $ PROD_NAME     : Factor w/ 114 levels "Burger Rings 220g",..: 44 2 80 76 43 51 78 23 14 24 ...
 $ PROD_QTY      : int  2 3 2 5 3 1 1 1 1 2 ...
 $ TOT_SALES     : num  6 6.3 2.9 15 13.8 5.1 5.7 3.6 3.9 7.2 ...

summary(t_data)

    DATE         STORE_NBR     LYLTY_CARD_NBR        TXN_ID           PROD_NBR     
 Min.   :43282   Min.   :  1.0   Min.   :   1000   Min.   :      1   Min.   :  1.00  
 1st Qu.:43373   1st Qu.: 70.0   1st Qu.:  70021   1st Qu.:  67602   1st Qu.: 28.00  
 Median :43464   Median :130.0   Median : 130358   Median : 135138   Median : 56.00  
 Mean   :43464   Mean   :135.1   Mean   : 135550   Mean   : 135158   Mean   : 56.58  
 3rd Qu.:43555   3rd Qu.:203.0   3rd Qu.: 203094   3rd Qu.: 202701   3rd Qu.: 85.00  
 Max.   :43646   Max.   :272.0   Max.   :2373711   Max.   :2415841   Max.   :114.00  
                                                                                     
                                    PROD_NAME         PROD_QTY         TOT_SALES      
 Kettle Mozzarella   Basil & Pesto 175g  :  3304   Min.   :  1.000   Min.   :  1.500  
 Kettle Tortilla ChpsHny&Jlpno Chili 150g:  3296   1st Qu.:  2.000   1st Qu.:  5.400  
 Cobs Popd Swt/Chlli &Sr/Cream Chips 110g:  3269   Median :  2.000   Median :  7.400  
 Tyrrells Crisps     Ched & Chives 165g  :  3268   Mean   :  1.907   Mean   :  7.304  
 Cobs Popd Sea Salt  Chips 110g          :  3265   3rd Qu.:  2.000   3rd Qu.:  9.200  
 Kettle 135g Swt Pot Sea Salt            :  3257   Max.   :200.000   Max.   :650.000  
 (Other)                                 :245177                           

outlier_value<- boxplot.stats(t_data$PROD_QTY)$out  # outlier values.
boxplot(t_data$PROD_QTY, main="Product Quantity", boxwex=0.1)
mtext(paste("Outliers: ", paste(outlier_value, collapse=", ")), cex=0.6)
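One possible explanation, inferred only from the summary() output above and not confirmed in the thread: the first quartile, median, and third quartile of PROD_QTY are all 2, so the interquartile range is 0. boxplot.stats then flags every value that differs from 2, and pasting all of those values into mtext() would produce one enormous label. The sketch below reproduces that behaviour on a small made-up vector. The vector `qty`, the names `bs`, `out_counts`, and `extreme`, and the cutoff of 10 are assumptions for illustration, not part of the original post.

```r
# Hypothetical stand-in for t_data$PROD_QTY (not the asker's data):
# mostly 2s, some 1s, and two extreme values of 200.
qty <- c(rep(2L, 78), rep(1L, 20), 200L, 200L)

bs <- boxplot.stats(qty)
bs$stats          # both hinges equal 2, so the IQR is 0
length(bs$out)    # every value other than 2 is flagged

# Summarise the flagged values instead of pasting all of them into mtext().
out_counts <- table(bs$out)
out_counts

# Assumed cutoff, for illustration only: treat quantities above 10 as suspicious.
extreme <- qty[qty > 10]
extreme

boxplot(qty, main = "Product Quantity", boxwex = 0.1)
mtext(paste("Distinct flagged values:",
            paste(names(out_counts), collapse = ", ")), cex = 0.6)
```

If the real data behaves the same way, tabulating `boxplot.stats(t_data$PROD_QTY)$out` should show a short list of distinct values, with 200 standing out, and the label passed to mtext() stays readable.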


0 answers:

No answers yet.