Data: https://drive.google.com/file/d/1YuhqzBbQfdJx9MWYmc2nrlgOO-IyARoK/view?usp=sharing
How can I label the outliers in the given data? I want to know which sites are outliers. This is my code so far. Thanks.
# without jitter
ggplot(data=df, aes(x=variable, y=value, fill=variable)) + geom_boxplot() + theme_bw() + labs(x="Environmental Parameters", y="Standardized Range")+theme(legend.position = "none") + theme(text=element_text(family="Times New Roman", face="bold", size=12))
# with jitter
ggplot(data=df, aes(x=variable, y=value, fill=variable)) + geom_boxplot() + theme_bw() + labs(x="Environmental Parameters", y="Standardized Range")+theme(legend.position = "none") + theme(text=element_text(family="Times New Roman", face="bold", size=12)) + geom_jitter(position=position_jitter(0.1))
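Answer 0 (score: 0)
As @jtr13 suggested in this answer [1], to show the outliers explicitly in the boxplot, use the ggplot_build() function to extract the list of outliers, then use the map_df() function to convert that list into a tibble, which is then passed to geom_text() to highlight the outliers.
Below is the code for a boxplot with the outliers highlighted in red.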
# load packages
require(tidyverse)
require(reshape)
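# read data
# `path` must point to the folder that contains StanEnvCCA.csv;
# "." (the current working directory) is only a placeholder
path <- "."
file_path <- file.path(path, "StanEnvCCA.csv")
StanEnvCCA <-
  read.csv(file_path,
           header = TRUE,
           sep = ';',
           dec = '.')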
# transform
df<- melt(StanEnvCCA)
# calculate boxplot object
g <- ggplot(data = df, aes(x = variable, y = value, fill = variable)) +
  geom_boxplot() +
  theme_bw() +
  labs(x = "Environmental Parameters", y = "Standardized Range") +
  theme(legend.position = "none") +
  theme(text = element_text(family = "Times New Roman", face = "bold", size = 12)) +
  geom_jitter(position = position_jitter(0.1))
# get list of outliers
out <- ggplot_build(g)[["data"]][[1]][["outliers"]]
# label list elements with factor levels
names(out) <- levels(factor(df$variable))
# convert to tidy data
tidyout <- purrr::map_df(out, tibble::as_tibble, .id = "variable")
# plot boxplots with labels
g + geom_text(data = tidyout, aes(variable, value, label = variable),
              hjust = -.3, colour = 'red')
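Note that the labels above show the variable name, not the site. If the goal is to see which sites are outliers, one option is to flag them yourself before plotting. Below is a minimal sketch, not the answer's original method. It uses made-up data, and the column names Site, pH and Temp are hypothetical; replace them with the columns in your own file. It applies the 1.5 * IQR rule per variable, which as far as I know is the same rule geom_boxplot() uses by default.

```r
library(ggplot2)

# Made-up data for illustration only; Site, pH and Temp are hypothetical names
wide <- data.frame(
  Site = paste0("S", 1:10),
  pH   = c(0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3, 0.1, 0.0, 4.0),
  Temp = c(-0.5, 0.4, 0.1, -0.2, 0.3, -3.5, 0.0, 0.2, -0.1, 0.1)
)

# Long format that keeps the site identifier next to every value
long <- data.frame(
  Site = rep(wide$Site, times = 2),
  stack(wide[c("pH", "Temp")])
)
names(long) <- c("Site", "value", "variable")

# TRUE where a value lies more than 1.5 * IQR outside the quartiles
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

# Apply the rule within each variable; ave() returns 0/1, so convert back to logical
long$outlier <- as.logical(ave(long$value, long$variable, FUN = is_outlier))

# Boxplot with the outlying points labelled by site name
p <- ggplot(long, aes(x = variable, y = value, fill = variable)) +
  geom_boxplot() +
  geom_text(data = long[long$outlier, ], aes(label = Site),
            hjust = -0.3, colour = "red") +
  theme_bw() +
  theme(legend.position = "none")

# Table of the flagged sites, one row per outlying value
long[long$outlier, c("Site", "variable", "value")]
```

Printing p draws the plot. The last line gives the flagged sites as a plain table, which may be easier to read than the plot when many points overlap.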
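Answer 1 (score: -1)
Save the file to your working directory and load it. I used file.choose() just to speed things up.
# read.xlsx() is assumed here to come from the openxlsx package
library(openxlsx)
library(ggplot2)
filename <- file.choose()
bd <- read.xlsx(filename)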
Attach the variable name as a label to each value
bd<-data.frame(bd[0:0], stack(bd[2:ncol(bd)]))
Draw the plot
g <- ggplot(data = bd, aes(x = ind, y = values)) + geom_boxplot() + theme_bw()
Extract the outliers from the plot
out <- ggplot_build(g)[["data"]][[1]][["outliers"]]
Add names to the list
names(out) <- levels(factor(bd$ind))
Tidy the data
tidyout <- purrr::map_df(out, tibble::as_tibble, .id = "ind")
Draw the boxplot with the outlier values as labels
g + geom_text(data = tidyout, aes(ind, value, label = value),
              hjust = -.3)
This is an adaptation of jtr13's answer to the post "Labeling Outliers of Boxplots in R".
Hope this helps.