geom_jitter height / width arguments interact with log scales

Time: 2019-05-06 22:27:03

Tags: r ggplot2

I got confused by this while exploring some data. It feels like unexpected behavior, so I thought I'd post it.

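geom_jitter takes height / width arguments that set the amount of jitter, with a default of 40%. When you add a log scale, this default 40% appears to be applied to the original values. If you set the argument yourself, however, the value is applied after the log transformation.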

This illustrates it:

library(ggplot2)
library(patchwork)

set.seed(1)
dat <- data.frame(x=round(rlnorm(2000)), y=round(rlnorm(2000)))


# THESE TWO PLOTS ARE THE SAME
# with jitter
p1 <- ggplot(dat, aes(x,y)) + geom_jitter(alpha=.1) +
  labs(title='regular scale, jitter with default height/width',
       subtitle = '')
# with jitter, and explicit (but same as default) jitter size
p2 <- ggplot(dat, aes(x,y)) + geom_jitter(alpha=.1, height=.4, width=.4) +
  labs(title='regular scale, jitter with 40% height/width',
       subtitle = '<== same as that')


# THESE TWO PLOTS ARE NOT THE SAME
# with jitter and log/log scale
p3 <- ggplot(dat, aes(x,y)) + geom_jitter(alpha=.1) +
  scale_x_log10() + scale_y_log10() +
  labs(title='log scale, jitter with default height/width',
       subtitle = '')

# with jitter and log/log scale, and explicit (but same as default) jitter size
p4 <- ggplot(dat, aes(x,y)) + geom_jitter(alpha=.1, height=.4, width=.4) +
  scale_x_log10() + scale_y_log10()  +
  labs(title='log scale, jitter with 40% height/width',
       subtitle = '<== NOT the same as that')

(p1 + p2) / (p3 + p4)


Is this the intended behavior?

What if I want to adjust the jitter width on the underlying values rather than on the log-transformed values?

1 answer:

Answer 0: (score: 6)

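Nice catch! I think this is a documentation issue: it isn't clear enough. The jitter is not 40% of the value, it is 40% of the resolution of the data.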

In ggplot2:::PositionJitter$setup_params you can see that the ggplot2:::resolution function is applied and its result is multiplied by 0.4:

list(width = self$width %||% (resolution(data$x, zero = FALSE) * 
    0.4), height = self$height %||% (resolution(data$y, zero = FALSE) * 
    0.4), seed = self$seed)
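To make the effect of the resolution concrete, here is a minimal sketch on a toy vector (the values in x are an assumption for illustration, not the question's data). It uses the exported resolution() function to compare the default jitter amount on the raw scale with the one computed after a log10 transform, which is the scale the position adjustment works on once scale_x_log10() is added.

```r
library(ggplot2)

# Toy values (an assumption for illustration, not the question's data)
x <- c(1, 2, 3, 10, 100)

# Smallest gap between distinct values on the raw scale
res_raw <- resolution(x, zero = FALSE)

# Smallest gap between distinct values after the log10 transform
res_log <- resolution(log10(x), zero = FALSE)

# Default jitter amounts: 40% of each resolution
default_width_raw <- 0.4 * res_raw
default_width_log <- 0.4 * res_log

print(c(raw = default_width_raw, log10 = default_width_log))
```

On the log scale the smallest gap is the one between 2 and 3, so the default jitter there is much smaller than an explicit width = 0.4, which is taken as 0.4 log10 units.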

So what you need to do is apply ggplot2:::resolution to the log-transformed data and pass the result to width / height:

geom_jitter(
  width = ggplot2:::resolution(log10(dat$x), FALSE) * 0.4,
  height = ggplot2:::resolution(log10(dat$y), FALSE) * 0.4
)


All the code:

ggplot(dat, aes(x, y)) + 
  geom_jitter(
    width = ggplot2:::resolution(log10(dat$x), FALSE) * 0.4,
    height = ggplot2:::resolution(log10(dat$y), FALSE) * 0.4
  ) +
  scale_x_log10() +
  scale_y_log10() +
  labs(title = 'Scale when resolution is applied')
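
For the other part of the question (jittering on the underlying values rather than on the log-transformed ones), one possible workaround, which is not part of the original answer, is to add the noise to the data yourself and then draw plain points on the log scale. The sketch below assumes uniform noise, and the names amount, pos, xj and yj are hypothetical. Rows containing zeros are dropped because they cannot be placed on a log axis.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x = round(rlnorm(2000)), y = round(rlnorm(2000)))

# Zeros cannot be shown on a log axis, so keep strictly positive rows
pos <- dat[dat$x > 0 & dat$y > 0, ]

# Hypothetical jitter half-width in raw data units; it must stay below the
# smallest remaining value (1 here) so jittered points remain positive
amount <- 0.4

# Add uniform noise on the raw scale, before any log transformation
pos$xj <- pos$x + runif(nrow(pos), -amount, amount)
pos$yj <- pos$y + runif(nrow(pos), -amount, amount)

p <- ggplot(pos, aes(xj, yj)) +
  geom_point(alpha = .1) +
  scale_x_log10() + scale_y_log10() +
  labs(title = 'log scale, jitter applied to the raw values', x = 'x', y = 'y')
```

Because the noise has a fixed size in raw units, the clouds look wide around small values and narrow around large ones once the axes are logged.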