I noticed some strange behavior from geom_histogram in ggplot2. It seems to omit a bar, and I can't figure out why. Here is an example:
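Oddly, it is hard to reproduce. If I create the variable "" by hand, the correct bars are displayed, but I suspect it has something to do with significant digits: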
> # show the data
> head(df)
other_variable variable
1 0 3.663562
2 0 3.663562
3 0 3.663562
4 0 3.663562
5 0 -3.663562
6 1 -3.663562
>
> # select 25 random rows
> set.seed(1)
> var1 <- df[runif(25,0,nrow(df)),]$variable
>
> # display the data
> var1
[1] -3.6635616 3.6635616 3.6635616 3.6635616 -3.6635616 -0.8001193
[7] 3.6635616 3.6635616 3.6635616 3.6635616 -3.6635616 3.6635616
[13] 3.6635616 3.6635616 3.6635616 3.6635616 3.6635616 3.6635616
[19] 3.6635616 3.6635616 3.6635616 3.6635616 3.6635616 -1.2950457
[25] -3.6635616
>
> # histogram of var1 doesn't include values = 3.6635616
> ggplot(data=NULL, aes(x=var1)) + geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
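Rather than judging from the plot, one way to check how many values stat_bin() actually placed in a bin is ggplot2's layer_data(), which returns the computed data for a layer (for a histogram this includes a count column per bin). A minimal sketch; binned_count() is a hypothetical helper, not part of ggplot2:

```r
library(ggplot2)

# Hypothetical helper: build the histogram, then sum the per-bin counts that
# stat_bin() computed. A result smaller than length(x) means some values
# were not placed in any bin.
binned_count <- function(x, bins = 30) {
  p <- ggplot(data.frame(x = x), aes(x = x)) + geom_histogram(bins = bins)
  sum(layer_data(p)$count)
}

# Usage with the vectors from the question (results not shown here):
# binned_count(var1, bins = 30); binned_count(var2, bins = 30); length(var1)
```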
It also seems to be related to the number of bins. If I fiddle with them, I can get the bar to show up:
> # make a new vector with the same data
> var2 <- c(
+ -3.6635616, 3.6635616, 3.6635616, 3.6635616, -3.6635616, -0.8001193,
+ 3.6635616, 3.6635616, 3.6635616, 3.6635616, -3.6635616, 3.6635616,
+ 3.6635616, 3.6635616, 3.6635616, 3.6635616, 3.6635616, 3.6635616,
+ 3.6635616, 3.6635616, 3.6635616, 3.6635616, 3.6635616, -1.2950457,
+ -3.6635616
+ )
>
> # confirm that they're equal
> all.equal(var1, var2)
[1] TRUE
>
> # something suspicious
> var1[1]==var2[1]
[1] FALSE
>
> # histogram of var2 does include values = 3.6635616
> ggplot(data=NULL, aes(x=var2)) + geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
What is going on?
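On the all.equal() versus == discrepancy: print() shows only 7 significant digits, so typing the printed values back in produces doubles that differ from the originals in their trailing digits. all.equal() compares numbers with a default relative tolerance of about 1.5e-8, while == and identical() compare exactly. A small sketch using two representations of one value from the question (x_dput as shown by dput(), x_print as shown by print()); the variable names are illustrative:

```r
# One value from the question, as shown by dput() (15 digits) and print() (7 digits)
x_dput  <- 3.66356164612965
x_print <- 3.6635616

x_dput == x_print                   # exact comparison
identical(x_dput, x_print)          # also exact
isTRUE(all.equal(x_dput, x_print))  # relative difference ~1.3e-8 is within the default tolerance
sprintf("%a", c(x_dput, x_print))   # hexadecimal view of the stored bits
```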
Edit
Adding more information to try to make this reproducible.
> # if I mess with the bin number I can get it to show up
> ggplot(data=NULL, aes(x=var1)) + geom_histogram(bins=40) # no
> ggplot(data=NULL, aes(x=var1)) + geom_histogram(bins=41) # yes
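One hedged reading of why the bin count matters: bin edges are computed arithmetically from the data range and the bin width, so for some bin counts a rounding error could leave the top edge a hair below the largest value, which then falls outside every bin. The sketch below shows that general effect with base R's cut() and made-up numbers; it is not ggplot2's binning code, and whether this is what happens inside stat_bin() here is an assumption:

```r
# Made-up example: the top break is computed, and rounding leaves it just below 0.1
top_edge <- 1 - 0.9               # prints as 0.1, but is 0.09999999999999998
x <- c(0.02, 0.07, 0.1, 0.1)      # the two 0.1 values are meant to sit on the top edge

bins <- cut(x, breaks = c(0, 0.05, top_edge), include.lowest = TRUE)

top_edge < 0.1                    # the computed edge is slightly too small
table(bins, useNA = "ifany")      # the 0.1 values are NA, i.e. not in any bin
```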
Interestingly, even dput does not reproduce the problem:
> dput(var1)
c(-3.66356164612965, 3.66356164612965, 3.66356164612965, 3.66356164612965,
-3.66356164612965, -0.800119300112113, 3.66356164612965, 3.66356164612965,
3.66356164612965, 3.66356164612965, -3.66356164612965, 3.66356164612965,
3.66356164612965, 3.66356164612965, 3.66356164612965, 3.66356164612965,
3.66356164612965, 3.66356164612965, 3.66356164612965, 3.66356164612965,
3.66356164612965, 3.66356164612965, 3.66356164612965, -1.29504568965475,
-3.66356164612965)
> sprintf("%a",var1)
[1] "-0x1.d4ef968880dd4p+1" "0x1.d4ef968880dd4p+1" "0x1.d4ef968880dd4p+1"
[4] "0x1.d4ef968880dd4p+1" "-0x1.d4ef968880dd4p+1" "-0x1.99a93ca5c286dp-1"
[7] "0x1.d4ef968880dd4p+1" "0x1.d4ef968880dd4p+1" "0x1.d4ef968880dd4p+1"
[10] "0x1.d4ef968880dd4p+1" "-0x1.d4ef968880dd4p+1" "0x1.d4ef968880dd4p+1"
[13] "0x1.d4ef968880dd4p+1" "0x1.d4ef968880dd4p+1" "0x1.d4ef968880dd4p+1"
[16] "0x1.d4ef968880dd4p+1" "0x1.d4ef968880dd4p+1" "0x1.d4ef968880dd4p+1"
[19] "0x1.d4ef968880dd4p+1" "0x1.d4ef968880dd4p+1" "0x1.d4ef968880dd4p+1"
[22] "0x1.d4ef968880dd4p+1" "0x1.d4ef968880dd4p+1" "-0x1.4b881d43e494fp+0"
[25] "-0x1.d4ef968880dd4p+1"
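Because dput() writes doubles with 15 significant digits by default, pasting its output back in need not give the same bits. The sprintf("%a") output, however, is exact, and R accepts C99-style hexadecimal floating-point literals (see ?NumericConstants), so var1 can be rebuilt bit-for-bit for a reproducible example. A minimal sketch; the names big, m1, m2 and var1_exact are hypothetical:

```r
# The three distinct magnitudes in var1, copied from the sprintf("%a", var1) output
big <-  0x1.d4ef968880dd4p+1   # 3.6635616...
m1  <- -0x1.99a93ca5c286dp-1   # -0.8001193...
m2  <- -0x1.4b881d43e494fp+0   # -1.2950457...

# Same order and signs as var1 in the question
var1_exact <- c(-big, big, big, big, -big, m1, big, big, big, big, -big, big,
                big, big, big, big, big, big, big, big, big, big, big, m2, -big)

# Does the 15-digit dput() text round-trip to the same double?
identical(big, 3.66356164612965)
# Compare against the hex output in the question
sprintf("%a", var1_exact)
```

var1_exact can then be passed to the same ggplot() call in place of var1. In recent R versions, dput(var1, control = "hexNumeric") is documented to emit this hexadecimal form directly (see ?.deparseOpts).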