Question

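I would like to use the stat_density2D function with a categorical variable, but restrict my plot to the high-density regions in order to reduce overlap and improve legibility.

Let's take an example with the following data: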

library(ggplot2)

plot_data <-
  data.frame(X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
             Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
             Label = c(rep('A', 300), rep('B', 150)))

ggplot(plot_data, aes(X, Y, colour = Label)) + geom_point()

With a 2D density plot we get overlapping densities:

ggplot(plot_data, aes(X, Y)) + 
  stat_density_2d(geom = "polygon", aes(alpha = ..level.., fill = Label))

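Is it possible to plot only the high-density regions (for example, level > 0.03)? The only solution I have found is to "cheat" and manually modify the ..level.. variable using a step function or a power transformation, as in this simple example: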

ggplot(plot_data, aes(X, Y)) + 
  stat_density_2d(geom = "polygon", aes(alpha = (..level..) ^ 2, fill = Label)) + 
  scale_alpha_continuous(range = c(0, 1))
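
The step-function variant of the same trick can be sketched as follows. This is only an illustration: the cutoff of 0.01 is an arbitrary assumption, and `after_stat()` assumes ggplot2 3.3.0 or later (older versions would use `..level..` instead).

```r
library(ggplot2)

set.seed(123)
plot_data <- data.frame(
  X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
  Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
  Label = c(rep("A", 300), rep("B", 150))
)

# Illustrative cutoff (an assumption, not a recommended value)
cutoff <- 0.01

# Step function: contours at or below the cutoff get alpha 0 (invisible),
# contours above it get alpha 1
p_step <- ggplot(plot_data, aes(X, Y)) +
  stat_density_2d(geom = "polygon",
                  aes(alpha = as.numeric(after_stat(level) > cutoff),
                      fill = Label)) +
  scale_alpha_continuous(range = c(0, 1), guide = "none")
print(p_step)
```

The low-density polygons are still computed and drawn, just fully transparent, which is why this feels like a workaround rather than a real filter.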

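Is it possible to make ggplot2 / stat_density2D consider only a certain range of density levels, rather than modifying the ..level.. variable? I tried the range and limits arguments of scale_alpha_continuous, without any useful result...

Thanks!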

Answer 1

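You have to generate the 2D kernel density manually and plot the result. That way you can choose the value at each point, for example to avoid overlap. Here is the code: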

plot_data <-
  data.frame(X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
             Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
             Label = c(rep('A', 300), rep('B', 150)))


library(ggplot2)
library(MASS)      # kde2d
library(dplyr)     # %>%, group_by, do, mutate, if_else
library(magrittr)  # %<>%
library(tidyr)     # spread
#Calculate the range
xlim <- range(plot_data$X)
ylim <-range(plot_data$Y)


#Generate the kernel density for each group
newplot_data <- plot_data %>% group_by(Label) %>% do(Dens=kde2d(.$X, .$Y, n=100, lims=c(xlim,ylim)))

#Transform the density into a data.frame
newplot_data  %<>%  do(Label=.$Label, V=expand.grid(.$Dens$x,.$Dens$y), Value=c(.$Dens$z)) %>% do(data.frame(Label=.$Label,x=.$V$Var1, y=.$V$Var2, Value=.$Value))

#Spread the data to one column per label and choose the value for each point.
#In this case, choose the value of the label with the highest density
newplot_data %<>%
  spread(Label, value = Value) %>%
  mutate(Level = if_else(A > B, A, B), Label = if_else(A > B, "A", "B"))

Contour plot:

# Contour plot
ggplot(newplot_data, aes(x,y, z=Level, fill=Label, alpha=..level..))  + stat_contour(geom="polygon")

Due to rounding errors, the contour plot seems to have some overlap. We can try a raster plot instead:

#Raster plot
ggplot(newplot_data, aes(x,y, fill=Label, alpha=Level))  + geom_raster()
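
Once the density is available on a grid, restricting the plot to high-density regions is just a row filter. The following is a minimal sketch using only base R, MASS and ggplot2; `density_grid` and `threshold` are hypothetical names introduced for illustration, and the cutoff of 0.01 is an arbitrary assumption rather than a recommended value.

```r
library(MASS)     # kde2d
library(ggplot2)

set.seed(1)
plot_data <- data.frame(
  X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
  Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
  Label = c(rep("A", 300), rep("B", 150))
)

# Shared limits so both densities are evaluated on the same grid cells
lims <- c(range(plot_data$X), range(plot_data$Y))

# Hypothetical helper: long data frame with one density value per cell and label
density_grid <- function(df, n = 100) {
  pieces <- lapply(split(df, df$Label), function(d) {
    dens <- kde2d(d$X, d$Y, n = n, lims = lims)
    grid <- expand.grid(x = dens$x, y = dens$y)  # x varies fastest, like c(dens$z)
    grid$Value <- c(dens$z)
    grid$Label <- d$Label[1]
    grid
  })
  do.call(rbind, pieces)
}

dens_long <- density_grid(plot_data)

# Keep only the cells whose estimated density exceeds the cutoff
threshold <- 0.01
high_density <- dens_long[dens_long$Value > threshold, ]

p_high <- ggplot(high_density, aes(x, y, fill = Label, alpha = Value)) +
  geom_tile()
print(p_high)
```

Unlike the max-of-both-labels approach above, this sketch keeps both labels wherever each one is above the cutoff, so some overlap can remain in cells where both groups are dense.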

Answer 2

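Option 1
By adding the bins argument to stat_density_2d, you can avoid overplotting and control how many density regions are drawn, which is a very economical way to draw attention to them.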

set.seed(123)
plot_data <-
  data.frame(
    X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
    Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
    Label = c(rep('A', 300), rep('B', 150))
  )
ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon",
                  aes(alpha = ..level.., fill = Label),
                  bins = 4)
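
If explicit density levels are preferred over a number of bins, contour breaks can be given directly. This sketch assumes ggplot2 3.3.0 or later, where `stat_density_2d()` passes `breaks` on to the contour computation and `after_stat()` is available; the break values are illustrative assumptions chosen so that nothing below 0.01 is drawn.

```r
library(ggplot2)

set.seed(123)
plot_data <- data.frame(
  X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
  Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
  Label = c(rep("A", 300), rep("B", 150))
)

# Illustrative contour levels: the lowest contour drawn is at density 0.01
density_breaks <- seq(0.01, 0.05, by = 0.005)

p_breaks <- ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon",
                  aes(alpha = after_stat(level), fill = Label),
                  breaks = density_breaks)
print(p_breaks)
```

Breaks above the maximum estimated density simply produce no contour, so the upper end of the sequence does not need to match the data exactly.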

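Option 2
Assign the colours manually, with NA for the levels we do not want to plot. The main drawback is that we need to know the number of levels and colours in advance (or compute them). In my example with set.seed(123) we need 7.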

ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon", aes(fill = as.factor(..level..))) +
  scale_fill_manual(values = c(NA, NA, NA,"#BDD7E7", "#6BAED6", "#3182BD", "#08519C"),
                    na.value = NA)  # keep NA levels unfilled instead of the default grey
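
The number of levels does not have to be hard-coded. As a rough sketch, it can be read from the built layer with `layer_data()` and the palette generated to match. The names `n_hidden` and `fill_values` are illustrative, hiding the three lowest levels is an arbitrary choice, and `after_stat()` assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(123)
plot_data <- data.frame(
  X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
  Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
  Label = c(rep("A", 300), rep("B", 150))
)

p_levels <- ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon",
                  aes(fill = as.factor(after_stat(level))))

# Count the distinct contour levels actually produced by the stat
n_levels <- length(unique(layer_data(p_levels)$level))

# Hide the lowest levels (at most 3, always leaving one visible)
n_hidden <- min(3, n_levels - 1)

# NA for hidden levels, then a light-to-dark blue ramp for the rest
fill_values <- c(rep(NA, n_hidden),
                 colorRampPalette(c("#BDD7E7", "#08519C"))(n_levels - n_hidden))

# na.value = NA is set on the assumption that current ggplot2 would otherwise
# draw NA palette entries in the default grey
p_manual <- p_levels +
  scale_fill_manual(values = fill_values, na.value = NA, name = "level")
print(p_manual)
```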

Showing only high-density regions with ggplot2's stat_density_2d
