My data frame looks like this:
etf_id<-c("a","b","c","d","e","a","b","c","d","e","a","b","c","d","e")
factor<-c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C")
normalized<-c(-0.048436801,2.850578601,2.551666490,0.928625186,-0.638111793,
-0.540615895,-0.501691539,-1.099239823,-0.040736139,-0.192048665,
0.198915407,-0.092525810,0.214317734,2.550478998,0.024613778)
df<-data.frame(etf_id,factor,normalized)
and I'm trying to remove the outliers in two ways. First, I tried outlier.color = NA, outlier.size = 0, outlier.shape = NA:
library(ggplot2)
library(plotly)
ggplotly(df %>%
ggplot(aes(factor, normalized, color = factor)) +
geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA) +
coord_cartesian(ylim = quantile(df$normalized, c(0.01, 0.99), na.rm = T)))
A second example, using the diamonds dataset:
p<-ggplotly(diamonds %>%
ggplot(aes(cut,price, color = cut)) +
geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA))
Then I tried:
ggplotly(df %>%
ggplot(aes(factor, normalized, color = factor)) +
geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA) +
coord_cartesian(ylim = quantile(boxplot.stats(df$normalized)$stats[c(1, 5)]*1.5, c(0.01, 0.99), na.rm = T)))
But this approach seems to shrink the y limits of my plot, and I need a general solution.
Answer 0 (score: 1)
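I'm not sure what you are trying to do with the second approach. For what it's worth, though, the problem you are facing stems from this part of the code: boxplot.stats(df$normalized)$stats[c(1, 5)]*1.5
Specifically, boxplot.stats(df$normalized)$stats returns this vector: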
[1] -1.09923982 -0.34687010 -0.04073614 0.57147146 0.92862519
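These are the box plot statistics for the entire data set (lower whisker, lower hinge, median, upper hinge, and upper whisker). However, the plot you are drawing further splits the data by the factor variable, so statistics that boxplot.stats computes over all the data pooled together will not give you good bounds: each group has its own whiskers, and the pooled upper whisker (about 0.93) sits well below the largest value in group A (about 2.85).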
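One way to get bounds that respect the grouping is to compute the box plot statistics per group and take the outermost whiskers. The following is a minimal sketch in base R, not necessarily what the asker had in mind; the names group_stats and y_limits are made up for illustration. Note that boxplot.stats uses Tukey's hinges, which may differ slightly from the quartiles ggplot2 draws for other group sizes.

```r
# Rebuild the example data frame from the question.
df <- data.frame(
  etf_id = rep(c("a", "b", "c", "d", "e"), times = 3),
  factor = rep(c("A", "B", "C"), each = 5),
  normalized = c(-0.048436801, 2.850578601, 2.551666490, 0.928625186, -0.638111793,
                 -0.540615895, -0.501691539, -1.099239823, -0.040736139, -0.192048665,
                 0.198915407, -0.092525810, 0.214317734, 2.550478998, 0.024613778)
)

# Five-number box plot statistics per group: a 5 x 3 matrix whose rows are
# lower whisker, lower hinge, median, upper hinge, upper whisker.
group_stats <- sapply(
  split(df$normalized, df$factor),
  function(v) boxplot.stats(v)$stats
)

# Hypothetical helper value: the lowest lower whisker and the highest
# upper whisker across all groups.
y_limits <- c(min(group_stats[1, ]), max(group_stats[5, ]))
y_limits
```

The resulting y_limits could then be passed as coord_cartesian(ylim = y_limits), so the axis covers every group's whiskers while points beyond them fall outside the view.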
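Coming back to the original question of hiding the outliers in the box plot: ggplotly does not honor the outlier.shape = NA argument passed to ggplot. Instead, you have to hide the outliers in the plotly object itself. One solution can be found on plotly's GitHub issue tracker here.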
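A sketch of what hiding the outliers in the plotly object can look like is shown below. It assumes that ggplotly() produces one trace of type "box" per group and that setting the marker opacity of those traces to 0 makes the outlier points invisible; it may or may not match the solution in the linked issue, and hover labels for the hidden points may still appear.

```r
library(ggplot2)
library(plotly)

# Rebuild the example data frame from the question.
df <- data.frame(
  etf_id = rep(c("a", "b", "c", "d", "e"), times = 3),
  factor = rep(c("A", "B", "C"), each = 5),
  normalized = c(-0.048436801, 2.850578601, 2.551666490, 0.928625186, -0.638111793,
                 -0.540615895, -0.501691539, -1.099239823, -0.040736139, -0.192048665,
                 0.198915407, -0.092525810, 0.214317734, 2.550478998, 0.024613778)
)

# Convert an ordinary ggplot box plot to a plotly object.
p <- ggplotly(
  ggplot(df, aes(factor, normalized, color = factor)) +
    geom_boxplot()
)

# Assumption: outliers are drawn as the markers of each "box" trace,
# so making those markers fully transparent hides them.
p$x$data <- lapply(p$x$data, function(trace) {
  if (identical(trace$type, "box")) {
    trace$marker$opacity <- 0
  }
  trace
})

p
```

Because only the markers are changed, the boxes and whiskers are drawn as before and the y axis still spans the full data range.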
Answer 1 (score: 1)