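I want to create a ggplot2 histogram whose x-axis limits equal the minimum and maximum values in the dataset, without dropping those values from the histogram itself.
I get the behavior I'm looking for with base graphics. Specifically, the second histogram below shows all of the same values as the first (i.e., no bins are excluded in the second histogram), even though I pass the xlim argument in the second plot: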
min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)
xlim <- c(min_wt, max_wt)
hist(mtcars$wt, breaks = 30, main = "No limits added")
hist(mtcars$wt, breaks = 30, xlim = xlim, main = "Limits added")
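As a rough check of the base-graphics behavior, the sketch below computes the same histogram without drawing it and compares the total bin count with the number of rows. It only illustrates that hist() bins the full data before any axis limits are applied at drawing time; the variable names total_binned and n_obs are mine.

```r
# Compute the histogram without drawing it; plot = FALSE returns the bin data.
h <- hist(mtcars$wt, breaks = 30, plot = FALSE)

# Counts are computed from the full data, independent of any xlim used
# later when the histogram is drawn.
total_binned <- sum(h$counts)
n_obs <- nrow(mtcars)

print(c(binned = total_binned, observations = n_obs))
```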
ggplot2 doesn't give me this behavior, though:
library(ggplot2)
# Using green colour to make dropped bins easy to see:
p <- ggplot(mtcars, aes(x = wt)) + geom_histogram(colour = "green", bins = 30)
p + ggtitle("No limits added")
p + xlim(xlim) + ggtitle("Limits added")
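See how in the second plot I lose one of the points below 2 and two of the points above 5? I'd like to know how to fix this. A few miscellaneous notes:
First, specifying boundary lets me keep the minimum value (i.e., the value below 2) in the histogram, but it still doesn't fix the problem with the 2 values greater than 5: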
ggplot(mtcars, aes(x = wt)) +
geom_histogram(bins = 30, colour = "green", boundary = min_wt) +
xlim(xlim) +
ggtitle("Limits added with boundary too")
Second, whether the problem appears depends on the value chosen for bins. For example, when I increase bins to 50, no values are dropped:
ggplot(mtcars, aes(x = wt)) +
geom_histogram(bins = 50, colour = "green", boundary = min_wt) +
xlim(xlim) +
ggtitle("Limits added with boundary too, but with bins = 50")
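Finally, I think this problem is related to this question on SO: "geom_histogram: wrong bins?", which is also discussed here: https://github.com/tidyverse/ggplot2/issues/1651. In other words, I think the problem comes down to a "rounding error". I describe the error in more depth in the second post of this issue (the one showing the plots): https://github.com/daattali/ggExtra/issues/81.
Here is my session info: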
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] ggplot2_2.2.1
loaded via a namespace (and not attached):
[1] labeling_0.3 colorspace_1.3-2 scales_0.5.0.9000
[4] compiler_3.4.2 lazyeval_0.2.1 plyr_1.8.4
[7] tools_3.4.2 pillar_1.2.1 gtable_0.2.0
[10] tibble_1.4.2 yaml_2.1.16 Rcpp_0.12.15
[13] grid_3.4.2 rlang_0.2.0.9000 munsell_0.4.3
Answer 0: (score: 1)
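Another option, which @eipi10 mentioned in the comments, is to change the oob (out of bounds) argument of scale_x_continuous.
The documentation describes oob as: "Function that handles limits outside of the scale limits (out of bounds). The default replaces out of bounds values with NA."
The default is scales::censor(); you can change it to oob = scales::squish, which moves out-of-bounds values into the range instead.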
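To make the difference concrete, here is a minimal sketch that calls the two functions directly on a small vector. The vector and the range c(2, 5) are made-up illustration values, not taken from mtcars.

```r
library(scales)

# Made-up values: one below, one inside, and one above the range [2, 5].
x <- c(1, 3, 6)

# censor() replaces out-of-range values with NA.
censored <- censor(x, range = c(2, 5))

# squish() moves out-of-range values to the nearest limit.
squished <- squish(x, range = c(2, 5))

print(censored)
print(squished)
```

In a histogram, the values handed to oob include the bin edges, which is why an edge bin can be dropped under the default even when every observation lies within the limits.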
Compare the following two plots.
p + scale_x_continuous(limits = xlim) + ggtitle("default: scales::censor")
Warning: Removed 1 rows containing missing values (geom_bar).
p + scale_x_continuous(limits = xlim, oob = scales::squish) + ggtitle("using scales::squish")
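And here is your third ggplot, the one where you specified boundary but still had the two values greater than 5 dropped, this time with scales::squish: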
ggplot(mtcars, aes(x = wt)) +
geom_histogram(bins = 30, colour = "green", boundary = min_wt) +
scale_x_continuous(limits = xlim, oob = scales::squish) +
ggtitle("Limits added with boundary too") +
labs(subtitle = "scales::squish")
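If you want to check the result without relying on the picture, one possible sketch is to pull the computed bins out of the plot with layer_data() and confirm that the counts add up to the number of rows and that no bin edge ended up as NA. The names p_squish and bins_df are mine, and the exact columns returned may vary between ggplot2 versions.

```r
library(ggplot2)

min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)
xlim <- c(min_wt, max_wt)

p_squish <- ggplot(mtcars, aes(x = wt)) +
  geom_histogram(bins = 30, colour = "green", boundary = min_wt) +
  scale_x_continuous(limits = xlim, oob = scales::squish)

# layer_data() returns the data frame the first layer will draw,
# one row per bin, with count, xmin and xmax columns.
bins_df <- layer_data(p_squish)

print(sum(bins_df$count))                           # observations in all bins
print(anyNA(bins_df$xmin) || anyNA(bins_df$xmax))   # any bin edge turned into NA?
```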
Hope this helps.