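What is the purpose of the include.lowest parameter in hist?

Asked: 2018-04-26 12:06:25

Tags: r histogram

The hist function, used to draw histograms, has a parameter include.lowest whose default value is TRUE.

As I understand it, when breaks is given as a vector, this parameter should control whether the lower bound of the lowest bin is included.

However, if I try, as a purely artificial example, a command like the following: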

 hist(c(1:100), breaks=c(1,2,10,50,100), include.lowest=FALSE)

I just get an error:

Error in hist.default(c(1:100), breaks = c(1, 2, 10, 50, 100), include.lowest = FALSE) : 
  some 'x' not counted; maybe 'breaks' do not span range of 'x'

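What happens here is that hist refuses to draw a plot that does not account for all of the data (x). With include.lowest set to FALSE, the value 1 from my data would not appear anywhere in the histogram. But in that case, what is include.lowest for? I cannot see any situation where setting it to FALSE makes a difference without triggering the error.

Note: in my explanation I assume the default right=TRUE is kept. With right=FALSE, I should get the same behaviour for the highest break instead of the lowest, right? So I don't think it changes anything.

More context: we are developing a graphical interface for drawing plots with R (it will be part of R++, and of course it will be awesome). We got stuck while providing tools for all the histogram parameters. If this one is useless and merely a legacy of some older version of hist, we might as well leave it out, but if it is actually useful we don't want to forget it.

Thanks for your attention.

1 Answer:

Answer 0: (score: 0)

I'm not sure what you are asking. I assume you are asking about the behaviour of hist with include.lowest = FALSE, and why it produces an error in your example.

It has to do with how the data are binned. Let's look at cut, since this function is closely related to what hist does.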

cut(1:100, breaks = c(1, 2, 10, 50, 100))
#  [1] <NA>     (1,2]    (2,10]   (2,10]   (2,10]   (2,10]   (2,10]   (2,10]
#  [9] (2,10]   (2,10]   (10,50]  (10,50]  (10,50]  (10,50]  (10,50]  (10,50]
# [17] (10,50]  (10,50]  (10,50]  (10,50]  (10,50]  (10,50]  (10,50]  (10,50]
# [25] (10,50]  (10,50]  (10,50]  (10,50]  (10,50]  (10,50]  (10,50]  (10,50]
# [33] (10,50]  (10,50]  (10,50]  (10,50]  (10,50]  (10,50]  (10,50]  (10,50]
# [41] (10,50]  (10,50]  (10,50]  (10,50]  (10,50]  (10,50]  (10,50]  (10,50]
# [49] (10,50]  (10,50]  (50,100] (50,100] (50,100] (50,100] (50,100] (50,100]
# [57] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100]
# [65] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100]
# [73] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100]
# [81] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100]
# [89] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100] (50,100]
# [97] (50,100] (50,100] (50,100] (50,100]
#Levels: (1,2] (2,10] (10,50] (50,100]

Note how 1 ends up as NA. That is because the bins are open-closed intervals: for example, (1, 2] means that 1 is excluded and 2 is included.
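As a small sketch of the same point, the comparison below runs cut on the question's data with and without include.lowest = TRUE, and once with right = FALSE. The variable names are only illustrative.

```r
x <- 1:100
breaks <- c(1, 2, 10, 50, 100)

# Default: right = TRUE, include.lowest = FALSE, so bins are (a, b]
default_bins <- cut(x, breaks = breaks)
x[is.na(default_bins)]        # the value equal to the lowest break is dropped

# include.lowest = TRUE closes the first bin on the left: [1, 2]
closed_bins <- cut(x, breaks = breaks, include.lowest = TRUE)
levels(closed_bins)
table(closed_bins)

# With right = FALSE the bins are [a, b), and the highest break is dropped instead
left_closed_bins <- cut(x, breaks = breaks, right = FALSE)
x[is.na(left_closed_bins)]
```

With right = FALSE, include.lowest = TRUE closes the last bin on the right instead, which matches the symmetry the question guesses at.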

Going back to hist, the following does not give an error when using include.lowest = FALSE:

hist(1:100, breaks = c(0, 2, 10, 50, 100), include.lowest = FALSE)
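To inspect the bin counts without drawing anything, the same call can be made with plot = FALSE. This is only a quick check; the object name h is arbitrary.

```r
x <- 1:100

# Lowest break (0) lies below min(x), so no value sits on the open left edge
h <- hist(x, breaks = c(0, 2, 10, 50, 100),
          include.lowest = FALSE, plot = FALSE)

h$counts        # observations per bin: (0,2], (2,10], (10,50], (50,100]
sum(h$counts)   # equals length(x), so every value was counted
```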

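Clarification (based on @MikkoMarttila's comment): binning with include.lowest = FALSE in hist corresponds to the default behaviour of standard binning in R, e.g. with cut. The option to set include.lowest = FALSE is therefore there for consistency with cut and its default open-closed intervals. Most of the time, when drawing a histogram, you want the minimum value to be part of an interval (which is not the case with purely open-closed intervals), hence the default include.lowest = TRUE.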
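A minimal sketch of that difference on the question's original breaks, where the lowest break equals min(x). The error is caught with tryCatch so that the snippet runs to the end, and the names with_lowest and without_lowest are only illustrative.

```r
x <- 1:100
breaks <- c(1, 2, 10, 50, 100)

# Default behaviour: the first bin is [1, 2], so the value 1 is counted
with_lowest <- hist(x, breaks = breaks, include.lowest = TRUE, plot = FALSE)
with_lowest$counts

# include.lowest = FALSE: the first bin is (1, 2], the value 1 falls outside
# every bin, and hist stops with an error; capture its message instead
without_lowest <- tryCatch(
  hist(x, breaks = breaks, include.lowest = FALSE, plot = FALSE),
  error = function(e) conditionMessage(e)
)
without_lowest
```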