Remove outliers from a ggplotly() boxplot

Posted: 2019-12-18 02:20:48

Tags: r ggplot2 ggplotly

My data frame is as follows:

etf_id<-c("a","b","c","d","e","a","b","c","d","e","a","b","c","d","e")
factor<-c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C")
normalized<-c(-0.048436801,2.850578601,2.551666490,0.928625186,-0.638111793,
              -0.540615895,-0.501691539,-1.099239823,-0.040736139,-0.192048665,
              0.198915407,-0.092525810,0.214317734,2.550478998,0.024613778)
df<-data.frame(etf_id,factor,normalized)

I'm trying to remove the outliers in two ways. First, I tried setting outlier.color = NA, outlier.size = 0 and outlier.shape = NA:

library(ggplot2)
library(plotly)
ggplotly(df %>% 
  ggplot(aes(factor, normalized, color = factor)) +
  geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA) +
  coord_cartesian(ylim = quantile(df$normalized, c(0.01, 0.99), na.rm = TRUE)))

A second example, using the diamonds dataset:

p<-ggplotly(diamonds %>% 
  ggplot(aes(cut,price, color = cut)) +
  geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA))

Then I tried:

ggplotly(df %>% 
  ggplot(aes(factor, normalized, color = factor)) +
  geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA) +
  coord_cartesian(ylim = quantile(boxplot.stats(df$normalized)$stats[c(1, 5)]*1.5, c(0.01, 0.99), na.rm = TRUE)))

However, this approach seems to shrink my plot's y-limits, and I need a general solution.

2 answers:

Answer 0 (score: 1)

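I'm not sure what you are trying to do with the second approach. For what it's worth, though, the problem you are seeing comes from this part of the code: boxplot.stats(df$normalized)$stats[c(1, 5)]*1.5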

Specifically, boxplot.stats(df$normalized)$stats returns this vector:

[1] -1.09923982 -0.34687010 -0.04073614  0.57147146  0.92862519

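These are the box statistics for the data as a whole: lower whisker, lower hinge, median, upper hinge and upper whisker. Your plot, however, splits the data further by the factor variable. Values computed by boxplot.stats over all of the data therefore won't give you good bounds for the individual boxes.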
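Below is a minimal base-R sketch of computing the bounds per group instead. It assumes the goal is one y-range that contains every group's whiskers. The idea is to run boxplot.stats() separately for each level of factor and then take the overall range of the whisker ends. The helper name whisker_limits is hypothetical and does not come from any package.

```r
# Data from the question
etf_id <- c("a","b","c","d","e","a","b","c","d","e","a","b","c","d","e")
factor <- c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C")
normalized <- c(-0.048436801, 2.850578601, 2.551666490, 0.928625186, -0.638111793,
                -0.540615895, -0.501691539, -1.099239823, -0.040736139, -0.192048665,
                0.198915407, -0.092525810, 0.214317734, 2.550478998, 0.024613778)
df <- data.frame(etf_id, factor, normalized)

# Hypothetical helper: whisker ends for each group, then their overall range.
# boxplot.stats() uses fivenum() hinges and treats points more than
# coef = 1.5 IQRs beyond the hinges as outliers; stats[c(1, 5)] are the
# whisker ends. ggplot2 computes its hinges with quantile(), so for some
# group sizes the result can differ slightly from what geom_boxplot draws.
whisker_limits <- function(y, group) {
  per_group <- tapply(y, group, function(v) boxplot.stats(v)$stats[c(1, 5)])
  range(unlist(per_group))
}

ylim <- whisker_limits(df$normalized, df$factor)
ylim
```

The result can be passed on as coord_cartesian(ylim = ylim). With this data, group B's -1.099 is flagged as an outlier. Group A has no outliers, so its whiskers (-0.638 to 2.851) set the whole range. By contrast, the global limits from boxplot.stats(df$normalized) stop at 0.929, which cuts through group A's box.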

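Back to your original question of hiding the outliers in the box plot: ggplotly does not honour the outlier.shape = NA argument you pass to geom_boxplot(). You have to hide the outliers in the plotly object itself instead. One solution can be found on plotly's GitHub issue tracker.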
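Here is a sketch of doing that in the plotly object. It assumes that ggplotly() turns each geom_boxplot() group into a plotly.js trace of type "box" stored in p$x$data. plotly.js box traces have a boxpoints attribute, and setting it to FALSE draws only the boxes and whiskers. That leaves no hidden points to produce hover text. This is not necessarily the solution from the GitHub issue mentioned above.

```r
library(ggplot2)
library(plotly)

p <- ggplotly(
  ggplot(diamonds, aes(cut, price, color = cut)) +
    geom_boxplot(outlier.shape = NA)
)

# Assumption: box groups arrive as traces with type = "box".
# Only those traces are changed; boxpoints = FALSE tells plotly.js
# not to draw any sample or outlier points for the trace.
p$x$data <- lapply(p$x$data, function(tr) {
  if (identical(tr$type, "box")) tr$boxpoints <- FALSE
  tr
})

p  # display the widget
```

Because plotly.js computes its own box statistics from the raw values in each trace, the whiskers can differ slightly from those ggplot2 draws.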

Answer 1 (score: 1)

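We can reach into the internals of the ggplotly object and make the outliers invisible. Note, however, that hovering over the invisible outliers still shows hover info with their values.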

library(ggplot2)
library(plotly)

p <- ggplotly(diamonds %>%
                ggplot(aes(cut, price, color = cut)) +
                geom_boxplot(outlier.color = NA, outlier.size = 0, outlier.shape = NA))

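# Loop over the traces in p$x$data (length(p) counts the widget's top-level
# fields, not the traces). In a box trace, marker styles the outlier points,
# so opacity 0 makes them transparent.
for (i in seq_along(p$x$data)) {
  p$x$data[[i]]$marker$opacity <- 0
}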

p

[Figure: ggplotly boxplot with no outliers]