Labeling outliers in ggplot

Posted: 2019-11-19 13:27:16

Tags: r ggplot2

Data: https://drive.google.com/file/d/1YuhqzBbQfdJx9MWYmc2nrlgOO-IyARoK/view?usp=sharing

How can I label the outliers in the given data? I want to know which sites are outliers. Here is my code so far. Thanks.

library(ggplot2)

# without jitter
ggplot(data=df, aes(x=variable, y=value, fill=variable)) + geom_boxplot() + theme_bw() + labs(x="Environmental Parameters", y="Standardized Range")+theme(legend.position = "none") +  theme(text=element_text(family="Times New Roman", face="bold", size=12))
# with jitter
ggplot(data=df, aes(x=variable, y=value, fill=variable)) + geom_boxplot() + theme_bw() + labs(x="Environmental Parameters", y="Standardized Range")+theme(legend.position = "none") +  theme(text=element_text(family="Times New Roman", face="bold", size=12)) + geom_jitter(position=position_jitter(0.1))

2 answers:

Answer 0 (score: 0)

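As @jtr13 suggested in this answer [1], you can make the outliers explicit in the boxplot as follows. Use ggplot_build() to extract the list of outliers, then use purrr::map_df() to convert that list into a tibble, which geom_text() uses to label the outliers.
Below is the resulting boxplot, with the outliers labeled in red.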

[Image: boxplots with the outliers labeled in red]


# load packages
library(tidyverse)
library(reshape)   # for melt()

# read data
# path: folder that contains StanEnvCCA.csv (adjust to your setup)
path <- "."
file_path <- file.path(path, "StanEnvCCA.csv")

StanEnvCCA <- 
  read.csv(file_path, 
           header = TRUE,
           sep = ';',
           dec = '.') 

# reshape from wide to long (columns `variable` and `value`)
df<- melt(StanEnvCCA) 


# calculate boxplot object
g <- ggplot(data=df, aes(x=variable, y=value, fill=variable)) + 
  geom_boxplot() + 
  theme_bw() + 
  labs(x="Environmental Parameters", y="Standardized Range")+
  theme(legend.position = "none") +  
  theme(text=element_text(family="Times New Roman", face="bold", size=12)) + 
  geom_jitter(position=position_jitter(0.1))

# get list of outliers 
out <- ggplot_build(g)[["data"]][[1]][["outliers"]]

# label list elements with factor levels
names(out) <- levels(factor(df$variable))

# convert to tidy data: one row per outlier, `variable` holds the box name
tidyout <- purrr::map_df(out, ~ tibble::tibble(value = .x), .id = "variable")

# plot boxplots with labels
g + geom_text(data = tidyout, aes(variable, value, label = variable), 
              hjust = -.3, colour='red')
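The code above labels each outlier with its variable name. That shows which box an outlier belongs to, but not which site it came from. If the long data keep a site identifier, a different technique from ggplot_build() can be used. You flag outliers directly with the 1.5 × IQR rule that geom_boxplot() applies by default: a value is an outlier if it lies below Q1 - 1.5·IQR or above Q3 + 1.5·IQR, with quartiles from quantile()'s default type 7. Then you pass only the flagged rows to geom_text(). This is a minimal sketch. The `df_sites` data frame, its `site` column and its values are hypothetical. Whether StanEnvCCA.csv contains such a column is an assumption.

```r
library(dplyr)
library(ggplot2)

# Hypothetical long-format data; `site` identifies the sampling location
df_sites <- data.frame(
  site     = rep(paste0("S", 1:6), times = 2),
  variable = rep(c("pH", "Temp"), each = 6),
  value    = c(1, 2, 3, 4, 5, 100, 10, 11, 12, 13, 14, 15)
)

# Flag outliers per variable with geom_boxplot()'s default rule:
# below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR (quantile() type 7)
flagged <- df_sites %>%
  group_by(variable) %>%
  mutate(
    q1  = unname(quantile(value, 0.25)),
    q3  = unname(quantile(value, 0.75)),
    iqr = q3 - q1,
    is_outlier = value < q1 - 1.5 * iqr | value > q3 + 1.5 * iqr
  ) %>%
  ungroup()

# Keep only the flagged rows for labelling
outlier_sites <- filter(flagged, is_outlier)

# Boxplot of all values, with the site name printed next to each outlier
p <- ggplot(flagged, aes(x = variable, y = value, fill = variable)) +
  geom_boxplot() +
  geom_text(data = outlier_sites, aes(label = site),
            hjust = -0.3, colour = "red") +
  theme_bw() +
  theme(legend.position = "none")

outlier_sites[, c("site", "variable", "value")]
```

print(p) draws the plot. Because the flags come from the same rule, the labelled points should coincide with the outliers that geom_boxplot() draws. They will differ if a non-default `coef` is passed to geom_boxplot().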

Answer 1 (score: -1)

Save the file to your working directory and load it. I used file.choose() only to speed things up.

library(openxlsx)  # read.xlsx()
library(ggplot2)
filename <- file.choose()
bd <- read.xlsx(filename)

Reshape to long format so that each value is labeled with its variable (column) name

bd <- stack(bd[2:ncol(bd)])  # columns 2..n become `values` + `ind`; column 1 is not included
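The sketch below shows the reshaping that stack() performs. It uses a made-up `wide` data frame; the column names `site`, `pH` and `Temp` are hypothetical. The selected columns are concatenated one after another into `values`, and `ind` records which column each value came from. Rows come out column by column, so an ID column left out of the selection can be re-attached with rep(). That last line is an optional addition. It would let the labels show sites instead of values.

```r
# Hypothetical wide data: column 1 is an ID, the other columns are measurements
wide <- data.frame(
  site = c("S1", "S2", "S3"),
  pH   = c(7.1, 6.8, 7.4),
  Temp = c(12, 15, 11)
)

# Stack columns 2..n into two columns:
#   values - all measurements, one column after another
#   ind    - factor giving the source column name of each value
long <- stack(wide[2:ncol(wide)])

# Optional: stack() emits rows column by column, so the ID vector
# repeats once per stacked column
long$site <- rep(wide$site, times = ncol(wide) - 1)
long
```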

Create the plot

g <- ggplot(data = bd, aes(x = ind, y = values)) + geom_boxplot() + theme_bw()

Extract the outliers from the plot

out <- ggplot_build(g)[["data"]][[1]][["outliers"]]
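Here is a minimal sketch of what this extraction returns, on a small made-up data set. The `toy` data frame and its values are hypothetical, not the poster's data. Under geom_boxplot()'s default whisker rule, 100 lies beyond the upper whisker of box "a", and box "b" has no outliers. So the extracted list should hold 100 for "a" and an empty numeric vector for "b".

```r
library(ggplot2)

# Hypothetical toy data: two boxes, with one extreme value (100) in box "a"
toy <- data.frame(
  ind    = rep(c("a", "b"), each = 6),
  values = c(1, 2, 3, 4, 5, 100, 10, 11, 12, 13, 14, 15)
)

g_toy <- ggplot(toy, aes(x = ind, y = values)) + geom_boxplot()

# ggplot_build() returns the computed layer data; element [[1]] belongs to the
# first layer (geom_boxplot), which has one row per box and a list-column
# `outliers` holding the values drawn beyond the whiskers
box_data <- ggplot_build(g_toy)[["data"]][[1]]
out_toy  <- box_data[["outliers"]]

# Name the elements after the boxes (factor levels follow the x-axis order)
names(out_toy) <- levels(factor(toy$ind))
str(out_toy)
```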

Name the list elements

names(out) <- levels(factor(bd$ind))

Tidy the data

tidyout <- purrr::map_df(out, ~ tibble::tibble(value = .x), .id = "ind")
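The conversion in this step can be checked in isolation. In the sketch below, `out_demo` is a hypothetical hand-made list shaped like the named outlier list above. Each element becomes a one-column tibble named `value`, and map_df() row-binds them. `.id` writes each element's name into an `ind` column. Boxes without outliers contribute zero rows. Building the tibble explicitly with tibble(value = .x) avoids the "Calling `as_tibble()` on a vector is discouraged" warning that recent tibble versions emit.

```r
library(purrr)
library(tibble)

# Hypothetical named outlier list, one element per box
# (an empty numeric vector where a box has no outliers)
out_demo <- list(pH = c(-2.9, 3.1), Temp = numeric(0), DO = 2.7)

# One tibble per element, row-bound; .id stores the element name in `ind`
tidy_demo <- map_df(out_demo, ~ tibble(value = .x), .id = "ind")
tidy_demo
```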

Draw the boxplots with the outlier labels

g + geom_text(data = tidyout, aes(x = ind, y = value, label = value), 
              hjust = -.3)

This is an adaptation of jtr13's answer to the post "Labeling Outliers of Boxplots in R".

Hope this helps.