Question

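I have a data frame with 2 variables (x, y). The y variable "typically" varies within a "small" range, and the data frame contains only a few outliers. Here is an example: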

# uniform sample data frame
# y variable "typically" varying in a "small" range between 0 and 1
df = data.frame(
  x = 1:100,
  y = runif(100)
  )

# add 2 outliers to the data frame,
# yielding a data frame
# with 98 typical values and 2 outliers
df[3, 2] = 50
df[4, 2] = -50

So the data frame has 98 typical values and 2 outliers in the y variable, as you can see from the first 10 rows, head(df, 10):

    x           y
1   1   0.9785541
2   2   0.2321611
3   3  50.0000000
4   4 -50.0000000
5   5   0.8316717
6   6   0.1135077
7   7   0.9633120
8   8   0.1473229
9   9   0.1436269
10 10   0.9252299

When plotting the data frame as a bar chart (y ~ x), ggplot2 automatically (and correctly) scales the y axis to the full range of the observed y values:

library(ggplot2)
ggplot(df, aes(x, y)) + geom_bar(stat="identity")

[Figure: unwanted plot. The 2 outliers stretch the y scale, so the 98 typical values of the y variable look almost the same.]

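To focus on the "typical" values, I would like ggplot2 to keep the y axis on the "small" scale and let the extreme values be drawn beyond the axis limits.

Here is my first attempt: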

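lower.cut = quantile(df$y, 0.02)
# = 0.01096518
upper.cut = quantile(df$y, 0.98)
# = 0.9872347

# coord_cartesian() zooms the view without dropping data,
# so the outlier bars are still drawn and run off the panel
ggplot(df, aes(x, y)) + geom_bar(stat="identity") +
  coord_cartesian( ylim = c(-lower.cut*1.1, upper.cut*1.1) )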

[Figure: wanted plot appearance, but with semi-automatic .cut settings.]
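A side note on why the attempt uses coord_cartesian() rather than scale limits: the minimal sketch below contrasts the two on a small made-up data frame (the five values are an assumption for illustration, not the data above). With scale limits, ggplot2's default out-of-bounds handling turns out-of-range values into NA, so the outlier bars vanish; with coordinate limits only the visible window changes.

```r
library(ggplot2)

# Made-up illustration data: 3 typical values and 2 outliers
toy <- data.frame(x = 1:5, y = c(0.2, 0.8, 50, -50, 0.5))

p <- ggplot(toy, aes(x, y)) + geom_bar(stat = "identity")

# Scale limits: out-of-range y values are censored to NA before drawing,
# so the two outlier bars disappear (ggplot2 warns about removed rows).
p_scale <- p + scale_y_continuous(limits = c(0, 1))

# Coordinate limits: the data are untouched and only the view is zoomed,
# so the outlier bars are still drawn and simply run off the panel.
p_coord <- p + coord_cartesian(ylim = c(0, 1))
```

Printing p_scale and p_coord side by side shows the difference: only p_coord keeps the bars at x = 3 and x = 4.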

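Question:

The drawback of my first attempt is that the 0.02 and 0.98 quantile settings are arbitrary.

Is there a smarter (less arbitrary, more statistically justified) way to have ggplot2 automatically limit its axis to the typical values, while still allowing the outliers to be drawn beyond the axis limits?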
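One possible direction, offered as a sketch rather than a definitive answer: if Tukey's boxplot rule (points further than 1.5 × IQR from the hinges count as outliers) is an acceptable definition of "typical", base R's boxplot.stats() can supply the limits. The helper name typical_limits and its include_zero argument are hypothetical, and the fixed seed is an assumption added only for reproducibility.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(x = 1:100, y = runif(100))
df[3, 2] <- 50
df[4, 2] <- -50

# Hypothetical helper (not part of ggplot2): y limits from Tukey's rule.
# boxplot.stats()$stats[c(1, 5)] are the most extreme observations that
# still lie within coef * IQR of the hinges, i.e. the whisker ends.
typical_limits <- function(y, coef = 1.5, include_zero = TRUE) {
  whiskers <- boxplot.stats(y, coef = coef)$stats[c(1, 5)]
  # bars are drawn from 0, so optionally keep 0 inside the window
  if (include_zero) whiskers <- range(c(0, whiskers))
  whiskers
}

lims <- typical_limits(df$y)

# Zoom to the typical range; the outlier bars run off the panel
ggplot(df, aes(x, y)) +
  geom_bar(stat = "identity") +
  coord_cartesian(ylim = lims)
```

The coef argument still has to be chosen, so this swaps an arbitrary quantile for a conventional constant; boxplot.stats(df$y)$out lists the values that fall outside the resulting window.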

Answers I have looked into:

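Ignore outliers in ggplot2 boxplot: focuses on ggplot2's geom_boxplot rather than geom_bar.
ggplot2 barplot dealing with 1 outlier pushing the axis up [duplicate]: the answers suggest excluding the outliers, which I do not want to do. That question is also marked as a duplicate, but the linked similar question "What are alternatives to broken axes?" only offers answers on the general challenge of axes stretched by outliers, not a concrete solution to my specific problem.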

Automatic axis limits in ggplot2 identifying outliers
