Specifying limits

Time: 2018-03-10 01:54:43

Tags: r ggplot2

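I want to create a ggplot2 histogram whose x-axis limits equal the minimum and maximum values in the data set, without excluding those values from the histogram itself.

I get the behavior I'm looking for with base graphics. Specifically, the second histogram below shows all the same values as the first (i.e., no bins are dropped in the second histogram), even though I pass the xlim argument in the second plot: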

min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)
xlim <- c(min_wt, max_wt)

hist(mtcars$wt, breaks = 30, main = "No limits added")

hist(mtcars$wt, breaks = 30, xlim = xlim, main = "Limits added")


ggplot2, however, does not give me this behavior:

library(ggplot2)

# Using green colour to make dropped bins easy to see:
p <- ggplot(mtcars, aes(x = wt)) + geom_histogram(colour = "green", bins = 30)
p + ggtitle("No limits added")

p + xlim(xlim) + ggtitle("Limits added") 


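See how, in the second plot, I lose one of the points below 2 and two of the points above 5? I'd like to know how to fix this. Some miscellaneous notes:

First, specifying boundary lets me keep the minimum value (i.e., the value below 2) in the histogram, but it does not solve the problem of the 2 values greater than 5: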

ggplot(mtcars, aes(x = wt)) + 
  geom_histogram(bins = 30, colour = "green", boundary = min_wt) + 
  xlim(xlim) +
  ggtitle("Limits added with boundary too")


Second, whether the problem appears depends on the value chosen for bins. For example, when I increase bins to 50, no values are dropped:

ggplot(mtcars, aes(x = wt)) + 
  geom_histogram(bins = 50, colour = "green", boundary = min_wt) + 
  xlim(xlim) +
  ggtitle("Limits added with boundary too, but with bins = 50")


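Finally, I think this problem is related to this question asked on SO: geom_histogram: wrong bins? and to the discussion here: https://github.com/tidyverse/ggplot2/issues/1651. In other words, I think the problem is related to a "rounding error". I describe this error in more depth in the second post (the one showing the plots) of this issue: https://github.com/daattali/ggExtra/issues/81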
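One way to check which bars are affected is to look at the bin edges ggplot2 actually computes. The following is a minimal sketch, not part of the original post: it uses `layer_data()` from ggplot2 to pull the computed `xmin`, `xmax`, and `count` columns, then lists the non-empty bins whose edges extend past the data range. The names `p_check`, `bin_edges`, `data_range`, and `outside` are illustrative only, and the exact edges may differ between ggplot2 versions.

```r
library(ggplot2)

# Same histogram as in the question, without any limits applied
p_check <- ggplot(mtcars, aes(x = wt)) + geom_histogram(bins = 30)

# Computed bin edges and counts for the first (and only) layer
bin_edges <- layer_data(p_check)[, c("xmin", "xmax", "count")]

# Limits that would be passed to xlim(): the data minimum and maximum
data_range <- range(mtcars$wt)

# Bins whose left or right edge lies outside those limits
outside <- bin_edges$xmin < data_range[1] | bin_edges$xmax > data_range[2]

# Non-empty bins that stick out past the limits; these are the bars
# at risk of being removed once the limits are applied
print(bin_edges[outside & bin_edges$count > 0, ])
```

If a bar's `xmin` or `xmax` falls even slightly outside the limits, whether because of bin centering or floating-point error, it is a candidate for removal once the limits are set.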

Here is my session info:

R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] ggplot2_2.2.1

loaded via a namespace (and not attached):
 [1] labeling_0.3      colorspace_1.3-2  scales_0.5.0.9000
 [4] compiler_3.4.2    lazyeval_0.2.1    plyr_1.8.4       
 [7] tools_3.4.2       pillar_1.2.1      gtable_0.2.0     
[10] tibble_1.4.2      yaml_2.1.16       Rcpp_0.12.15     
[13] grid_3.4.2        rlang_0.2.0.9000  munsell_0.4.3 

1 answer:

Answer 0 (score: 1)

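Another option, mentioned by @eipi10 in the comments, is to change the oob (out of bounds) argument of scale_x_continuous. From the documentation:

"Function that handles limits outside of the scale limits (out of bounds). The default replaces out of bounds values with NA."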

The default is scales::censor(). You can change it to oob = scales::squish, which squishes out-of-bounds values into the range.
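To see the difference between the two functions outside of a plot, here is a small sketch that applies each of them to a plain numeric vector. The vector `x` and the limits `lims` are made up for illustration and do not come from mtcars.

```r
library(scales)

x <- c(1.5, 2.0, 3.2, 5.0, 5.4)  # illustrative values only
lims <- c(2, 5)                  # illustrative limits

# censor(): values outside the range are replaced with NA
censored <- censor(x, range = lims)
print(censored)  # 1.5 and 5.4 become NA

# squish(): values outside the range are moved to the nearest limit
squished <- squish(x, range = lims)
print(squished)  # 1.5 becomes 2, 5.4 becomes 5
```

In a histogram, the same treatment is applied to the bar edges, so with squish a bar that sticks out past a limit is clipped to the limit instead of being removed.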

Compare the following two plots.

p + scale_x_continuous(limits = xlim) + ggtitle("default: scales::censor")
  

Warning: Removed 1 rows containing missing values (geom_bar).


p + scale_x_continuous(limits = xlim, oob = scales::squish) + ggtitle("using scales::squish")


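And here is your third ggplot, the one where you specified the boundary but still had two values greater than 5 dropped, this time with oob = scales::squish: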

ggplot(mtcars, aes(x = wt)) + 
 geom_histogram(bins = 30, colour = "green", boundary = min_wt) + 
 scale_x_continuous(limits = xlim, oob = scales::squish) +
 ggtitle("Limits added with boundary too") +
 labs(subtitle = "scales::squish")


Hope this helps.