R hist vs geom_hist breaks

Date: 2015-08-19 18:07:14

Tags: r graph ggplot2

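I am using geom_histogram and hist in R with the same breaks, but I get different plots. I did a quick search without success. Does anyone know what defines the breaks and why the results differ?

The code below produces two different plots.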

library(ggplot2)

set.seed(25)
data <- data.frame(Mos = rnorm(500, mean = 25, sd = 8))
data$Mos <- round(data$Mos)

pAge <- ggplot(data, aes(x = Mos))
pAge + geom_histogram(breaks = seq(0, 50, by = 2))


hist(data$Mos,breaks=seq(0, 50, by = 2))

Thanks.

1 answer:

Answer 0 (score: 2)

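To get the same histogram in ggplot2, specify breaks inside scale_x_continuous and binwidth inside geom_histogram.

In addition, hist and the ggplot2 histogram use different defaults to build the intervals: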

  

hist: right-closed (left-open) intervals. Default: right = TRUE

stat_bin (ggplot2): left-closed (right-open) intervals. Default: right = FALSE

         hist             ggplot2
         freq1 Freq   freq2 Freq
    1    (0,2]    0   [0,2)    0
    2    (2,4]    2   [2,4)    2
    3    (4,6]    2   [4,6)    1
    4    (6,8]    1   [6,8)    2
    5   (8,10]    6  [8,10)    2
    6  (10,12]    9 [10,12)    7
    7  (12,14]   24 [12,14)   17
    8  (14,16]   27 [14,16)   26
    9  (16,18]   39 [16,18)   31
    10 (18,20]   48 [18,20)   46
    11 (20,22]   52 [20,22)   43
    12 (22,24]   38 [22,24)   57
    13 (24,26]   44 [24,26)   36
    14 (26,28]   46 [26,28)   52
    15 (28,30]   39 [28,30)   39
    16 (30,32]   31 [30,32)   33
    17 (32,34]   30 [32,34)   26
    18 (34,36]   24 [34,36)   29
    19 (36,38]   18 [36,38)   27
    20 (38,40]    9 [38,40)   12
    21 (40,42]    5 [40,42)    6
    22 (42,44]    4 [42,44)    0
    23 (44,46]    1 [44,46)    5
    24 (46,48]    1 [46,48)    0
    25 (48,50]    0 [48,50)    1
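
Because the data are rounded to integers and the breaks are even integers, many observations sit exactly on a break, so the choice of closed side moves them from one bin to the next. A minimal base R sketch of that effect, using a made-up toy vector (x and breaks here are illustrative only, not the question's data):

```r
# Toy data: every value sits exactly on a break.
x <- c(2, 2, 4, 6)
breaks <- c(0, 2, 4, 6)

# Right-closed bins, the hist() default: [0,2], (2,4], (4,6]
right_closed <- hist(x, breaks = breaks, right = TRUE, plot = FALSE)$counts

# Left-closed bins: [0,2), [2,4), [4,6]
left_closed <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)$counts

print(right_closed)  # 2 1 1
print(left_closed)   # 0 2 2
```

The same four values give different counts per bin, which is the kind of shift visible between the two columns of the table above.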

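I included the argument right = FALSE so that the hist intervals are left-closed (right-open), as they are in ggplot2. I also added labels to both plots so it is easier to check that the intervals are the same.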

ggplot(data, aes(x = Mos))+
  geom_histogram(binwidth = 2, colour = "black", fill = "white")+
  scale_x_continuous(breaks = seq(0, 50, by = 2))+
  stat_bin(binwidth = 2, aes(label=..count..), vjust=-0.5, geom = "text")


hist(data$Mos,breaks=seq(0, 50, by = 2), labels =TRUE, right =FALSE)
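
The ggplot2 call above reflects the ggplot2 version available when this answer was written. In later releases the right argument of stat_bin was replaced by closed, whose default is "right", and the ..count.. notation was superseded by after_stat(count). Assuming a reasonably recent ggplot2, a sketch of the same comparison might pass the breaks and the closed side explicitly instead of relying on binwidth and defaults (brks, gg_counts and base_counts are names introduced here for illustration):

```r
library(ggplot2)

# Same data as in the question.
set.seed(25)
data <- data.frame(Mos = round(rnorm(500, mean = 25, sd = 8)))
brks <- seq(0, 50, by = 2)

# Assumption: a ggplot2 version that provides `closed` and after_stat().
p <- ggplot(data, aes(x = Mos)) +
  geom_histogram(breaks = brks, closed = "left",
                 colour = "black", fill = "white") +
  stat_bin(aes(label = after_stat(count)), geom = "text",
           breaks = brks, closed = "left", vjust = -0.5) +
  scale_x_continuous(breaks = brks)

# Counts computed by ggplot2 for the first layer, and by hist()
# with the same left-closed convention.
gg_counts <- layer_data(p, 1)$count
base_counts <- hist(data$Mos, breaks = brks, right = FALSE, plot = FALSE)$counts
print(all(gg_counts == base_counts))
```

Giving both functions the same breaks vector and the same closed side removes any dependence on version-specific defaults.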


Check the frequency in each bin:

freq <- cut(data$Mos, breaks = seq(0, 50, by = 2), dig.lab = 4, right = FALSE)
as.data.frame(table(freq))
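
To see both conventions side by side, the same check can be run twice with cut. The following is a sketch in base R only; the names Mos, brks and comparison are introduced here for illustration, and include.lowest = TRUE is added on the assumption that the goal is to mirror what hist does by default:

```r
# Same data as in the question.
set.seed(25)
Mos <- round(rnorm(500, mean = 25, sd = 8))
brks <- seq(0, 50, by = 2)

# Bin with both conventions; include.lowest = TRUE mirrors hist()'s default.
right_closed <- table(cut(Mos, brks, right = TRUE, include.lowest = TRUE))
left_closed  <- table(cut(Mos, brks, right = FALSE, include.lowest = TRUE))

# Side-by-side counts, one row per bin.
comparison <- data.frame(
  right_closed = names(right_closed), n_right = as.vector(right_closed),
  left_closed  = names(left_closed),  n_left  = as.vector(left_closed)
)
print(comparison)

# Number of observations lying exactly on a break; these are the only
# ones that can change bins when the closed side changes.
print(sum(Mos %in% brks))
```

Both columns sum to the same total; only the observations that fall exactly on a break are assigned to different bins.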