Question

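I have a set of 2D points with class labels, and I want to visualize which class dominates each cell of a grid overlaid on the 2D plane.

I thought I could use stat_summary_2d with a function that picks the most frequent value, as shown below, but I get different plots for three variables that should be identical apart from the legend labels.

Am I misusing stat_summary_2d? Is there a better way to produce this plot?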

library(ggplot2)
set.seed(12345)
x = runif(1000)
y = runif(1000)
lab = rep(c("red", "blue", "green", "yellow"), 250)

df = data.frame(x=x, y=y, lab=factor(lab, labels=c("red", "blue", "green", "yellow")))
df$val = as.numeric(df$lab)

#Attempt 1
ggplot(df, aes(x=x, y=y)) + 
  stat_summary_2d(aes(z=lab), 
                  fun=function(z) names(which.max(table(z))), 
                  binwidth=.1)

#Attempt 2
ggplot(df, aes(x=x, y=y)) + 
  stat_summary_2d(aes(z=val), 
                  fun=function(z) names(which.max(table(z))), 
                  binwidth=.1)

#Attempt 3
ggplot(df, aes(x=x, y=y)) + 
  stat_summary_2d(aes(z=as.numeric(lab)), 
                  fun=function(z) names(which.max(table(z))),
                  binwidth=.1)

Answer 1

Add group = 1 to Attempt 1 and you will see the same tile layout as in the other two attempts.

Specify the fill palette appropriately, and all three look the same:

library(ggplot2)

#Attempt 1
p1 <- ggplot(df, aes(x=x, y=y, group = 1)) + 
  stat_summary_2d(aes(z=lab), 
                  fun=function(z) names(which.max(table(z))), 
                  binwidth=.1) +
  scale_fill_manual(values = c("red" = "red",
                               "blue" = "blue",
                               "green" = "green",
                               "yellow" = "yellow"),
                    breaks = c("red", "blue", "green", "yellow")) +
  ggtitle("Attempt 1") + theme(legend.position = "bottom")

#Attempt 2
p2 <- ggplot(df, aes(x=x, y=y)) + 
  stat_summary_2d(aes(z=val), 
                  fun=function(z) names(which.max(table(z))), 
                  binwidth=.1) +
  scale_fill_manual(values = c("red", "blue", "green", "yellow")) +
  ggtitle("Attempt 2") + theme(legend.position = "bottom")

#Attempt 3
p3 <- ggplot(df, aes(x=x, y=y)) + 
  stat_summary_2d(aes(z=as.numeric(lab)), 
                  fun=function(z) names(which.max(table(z))),
                  binwidth=.1) +
  scale_fill_manual(values = c("red", "blue", "green", "yellow")) +
  ggtitle("Attempt 3") + theme(legend.position = "bottom")

gridExtra::grid.arrange(p1, p2, p3, nrow = 1)

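Explanation: if you inspect the underlying data of the first plot, you will notice it has 379 rows, each corresponding to one tile in the heatmap. If we add up the number of distinct colours in each bin, we also get 379, so there are in fact multiple tiles stacked at each bin position. (By contrast, the underlying data for the second and third plots have 100 rows each.)

From this we know that ggplot has interpreted "lab" as a grouping variable and run stat_summary_2d() separately for each level. Adding group = 1 to the aesthetic mapping forces all levels to be considered together.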
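As a rough cross-check of those counts, the sketch below bins the question's points in base R and counts (a) the occupied grid cells and (b) the distinct cell/label combinations. It assumes bins start at 0 with width 0.1, which may not match ggplot2's bin edges exactly. The names `cell`, `n_cells`, and `n_cell_label` are illustrative and not part of the original code.

```r
# Recreate the question's data (same seed, same order of runif() calls).
set.seed(12345)
x <- runif(1000)
y <- runif(1000)
lab <- factor(rep(c("red", "blue", "green", "yellow"), 250))

# Assumption: a 10 x 10 grid with bin edges at 0, 0.1, ..., 1.
cell <- paste(floor(x / 0.1), floor(y / 0.1))

# One summary per occupied cell: what a single group would produce.
n_cells <- length(unique(cell))

# One summary per (cell, label) pair: what per-level grouping would produce.
# This equals the sum over cells of the number of distinct labels in each cell.
n_cell_label <- nrow(unique(data.frame(cell = cell, lab = lab)))

print(c(cells = n_cells, cell_label_pairs = n_cell_label))
```

If the binning assumption holds, the two numbers should line up with the 100 and 379 rows reported for the plots.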

p1.original <- ggplot(df, aes(x=x, y=y)) + 
  stat_summary_2d(aes(z=lab), 
                  fun=function(z) names(which.max(table(z))), 
                  binwidth=.1)

View(layer_data(p1.original))
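View() needs an interactive session, so a non-interactive way to compare the two cases is to count the rows returned by layer_data(). The following is a minimal sketch under the assumption that the installed ggplot2 version accepts the `fun` argument as used above. The helper `modal` and the plot names `p_implicit` and `p_single` are made up for this example, and the factor uses default level labels, which does not affect the row counts.

```r
library(ggplot2)

# Recreate the question's data (default factor labels; counts are unaffected).
set.seed(12345)
df <- data.frame(x = runif(1000), y = runif(1000),
                 lab = factor(rep(c("red", "blue", "green", "yellow"), 250)))

# Hypothetical helper: most frequent value in a bin.
modal <- function(z) names(which.max(table(z)))

# Implicit grouping: the discrete z aesthetic splits the data by level.
p_implicit <- ggplot(df, aes(x = x, y = y)) +
  stat_summary_2d(aes(z = lab), fun = modal, binwidth = 0.1)

# Single group: all levels are summarised together in each bin.
p_single <- ggplot(df, aes(x = x, y = y, group = 1)) +
  stat_summary_2d(aes(z = lab), fun = modal, binwidth = 0.1)

n_implicit <- nrow(layer_data(p_implicit))
n_single <- nrow(layer_data(p_single))
print(c(implicit = n_implicit, single = n_single))
```

The implicitly grouped plot should report more rows than the single-group plot, since it draws one tile per level present in each bin.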
