ggplot2 plots more points than are in the data frame, geom_point + facet_grid

Asked: 2013-02-03 21:10:39

Tags: r ggplot2

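I have some data, and I'm trying to make boxplots with jittered points overlaid. My problem is with the points, so I'll stick to those here.

Here is the data: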

> dput(test)
structure(list(var1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), .Label = c("A", "B", "C", "D", 
"E", "F", "G", "H", "I"), class = "factor"), var2 = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 
6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("V1", 
"V2", "V3", "V4", "V5", "V6", "V7"), class = "factor"), response1 = c(5L, 
6L, 5L, 5L, 5L, 5L, 4L, 6L, 6L, 5L, 5L, 6L, 6L, 4L, 1L, 1L, NA, 
1L, NA, NA, 1L, 1L, 1L, NA, 1L, NA, NA, 1L, 5L, 5L, 4L, 5L, 3L, 
2L, 3L, 1L, 1L, NA, 1L, NA, NA, 1L, NA, NA, 2L, NA, 3L, 1L, NA, 
NA, NA, 4L, NA, 4L, 5L, NA, NA, NA, 1L, NA, 1L, 1L, NA), response2 = c(2L, 
2L, 2L, 2L, 2L, 2L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 5L, 5L, NA, 
5L, NA, NA, 5L, 5L, 5L, NA, 5L, NA, NA, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, NA, 5L, NA, NA, 5L, NA, NA, 5L, NA, 5L, 5L, NA, 
NA, NA, 5L, NA, 5L, 5L, NA, NA, NA, 5L, NA, 5L, 5L, NA), response3 = c(4L, 
5L, 1L, 1L, 4L, 1L, 1L, 4L, 5L, 1L, 1L, 5L, NA, 1L, 4L, NA, NA, 
NA, 3L, 2L, NA, 4L, NA, NA, NA, 3L, NA, NA, 4L, NA, 1L, NA, 3L, 
NA, 2L, 4L, NA, NA, NA, NA, NA, NA, NA, 2L, 1L, 1L, NA, NA, 1L, 
NA, 3L, 1L, NA, NA, NA, 1L, NA, 3L, 1L, NA, NA, NA, 1L)), .Names = c("var1", 
"var2", "response1", "response2", "response3"), class = "data.frame", row.names = c(NA, 
-63L))

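I use reshape2 to melt my data so that I can facet and map aesthetics in the plotting command:

library(reshape2)
# id.vars is the full argument name; TRUE is safer than the reassignable T
test_melted <- melt(test, id.vars = c("var1", "var2"), na.rm = TRUE)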
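As a minimal sketch of what the melt step does, the example below uses a small hypothetical data frame (`toy`, not the data set above) with the same column layout. It illustrates that `na.rm = TRUE` drops the rows whose value is `NA`, so the melted row count is the number of non-missing responses.

```r
library(reshape2)

# Hypothetical toy data with the same shape as `test` (not the real data)
toy <- data.frame(
  var1      = c("A", "A", "B"),
  var2      = c("V1", "V2", "V1"),
  response1 = c(5L, NA, 1L),
  response2 = c(2L, 2L, 5L)
)

# One row per (var1, var2, variable) combination, with NA values removed
toy_melted <- melt(toy, id.vars = c("var1", "var2"), na.rm = TRUE)

nrow(toy_melted)            # 5: six responses minus the one NA
table(toy_melted$variable)  # rows available to each facet
```

Running `table()` on the `variable` column of the melted data gives the number of points each facet should contain.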

Here is the plot I created:

library(ggplot2)
p <- ggplot(test_melted, aes(x = var1, y = value)) + geom_point()
p <- p + facet_grid(~variable) + coord_flip()
p <- p + geom_jitter(position = position_jitter(width=0.2, height = 0.2))
p

This produces:

[Image: plot produced by the code above]

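It looks normal, but then I noticed that each facet/factor level seems to have more points than it should. I narrowed it down to a single level of var1: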
test_subset <- test_melted[test_melted$var1 == "E", ]

nrow(test_subset)
[1] 18

summary(test_subset)
      var1    var2        variable     value  
 E      :18   V1:3   response1:7   Min.   :1  
 A      : 0   V2:2   response2:7   1st Qu.:3  
 B      : 0   V3:3   response3:4   Median :5  
 C      : 0   V4:2                 Mean   :4  
 D      : 0   V5:3                 3rd Qu.:5  
 F      : 0   V6:2                 Max.   :5  
 (Other): 0   V7:3 

So we should be plotting 18 points in total (7 for response1, 7 for response2, and 4 for response3). Let's try it:

p <- ggplot(test_subset, aes(x = var1, y = value)) + geom_point()
p <- p + facet_grid(~variable) + coord_flip()
p <- p + geom_jitter(position = position_jitter(width=0.2, height = 0.2))
p

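[Image: plot produced by the code above]

I count 11 points in the response1 facet, 8 in response2, and 8 in response3.

This must be something silly that I'm missing. I've done a lot of faceting with point plots and this has never happened before (or I never noticed!).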

Things I've tried

  • Removing coord_flip()
  • test_subset <- droplevels(test_subset), in case the empty factor levels were messing things up
  • Comparing facet_grid(~variable) / facet_grid(.~variable) with facet_grid(variable~) / facet_grid(variable~.)

As a final note, I get a different number of points depending on whether I facet. With faceting I get 11 + 8 + 8 = 27; if I remove facet_grid(~variable), I get 23.

Thanks for any suggestions!

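1 Answer:

Answer 0 (score: 2)

The problem isn't the faceting; it's that you are using two geoms in your plot. geom_point draws your points at their exact positions, and then geom_jitter draws them again at random positions. That's why you see extra points in each plot. The unjittered points overlap wherever values coincide, so each panel shows one extra point per distinct value rather than twice as many points.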
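The counts reported in the question can be reproduced with a little arithmetic. The sketch below rebuilds the non-missing values for `var1 == "E"` by hand from the `dput()` output. It assumes that no jittered point happens to land exactly on top of another point, so the visible count per panel is the number of rows (jittered) plus the number of distinct values (unjittered, overlapping).

```r
# Non-missing values for var1 == "E", copied from the question's dput() output
test_subset <- data.frame(
  variable = rep(c("response1", "response2", "response3"), times = c(7, 7, 4)),
  value    = c(5, 5, 4, 5, 3, 2, 3,   # response1
               5, 5, 5, 5, 5, 5, 5,   # response2
               4, 1, 3, 2)            # response3
)

# geom_jitter: one visible point per row
jittered <- tapply(test_subset$value, test_subset$variable, length)

# geom_point: identical values overlap, so one visible point per distinct value
unjittered <- tapply(test_subset$value, test_subset$variable,
                     function(v) length(unique(v)))

visible_faceted <- jittered + unjittered
visible_faceted        # response1 = 11, response2 = 8, response3 = 8
sum(visible_faceted)   # 27

# Without faceting, all rows share one panel, so overlaps span the variables
visible_unfaceted <- nrow(test_subset) + length(unique(test_subset$value))
visible_unfaceted      # 23
```

These match the 11 + 8 + 8 = 27 and 23 that the question reports.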

If you remove the call to geom_point, everything is back to normal:

p <- ggplot(test_subset, aes(x = var1, y = value))
p <- p + facet_grid(~variable) + coord_flip()
p <- p + geom_jitter(position = position_jitter(width=0.2, height = 0.2))
p

[Image: plot produced by the code above]
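One way to check this without counting dots by eye is to ask ggplot2 how many rows each layer draws. The sketch below rebuilds the `var1 == "E"` subset by hand from the question's data and uses `layer_data()` to inspect each layer. The object names (`base`, `p_both`, `p_jitter`) are illustrative, not part of the original code.

```r
library(ggplot2)

# Non-missing values for var1 == "E", copied from the question's dput() output
test_subset <- data.frame(
  var1     = "E",
  variable = rep(c("response1", "response2", "response3"), times = c(7, 7, 4)),
  value    = c(5, 5, 4, 5, 3, 2, 3,
               5, 5, 5, 5, 5, 5, 5,
               4, 1, 3, 2)
)

base <- ggplot(test_subset, aes(x = var1, y = value)) +
  facet_grid(~variable) +
  coord_flip()

# Two point layers (as in the question) versus one (as in the answer)
p_both   <- base + geom_point() + geom_jitter(width = 0.2, height = 0.2)
p_jitter <- base + geom_jitter(width = 0.2, height = 0.2)

# layer_data() returns the data frame that a given layer actually draws
n_both   <- nrow(layer_data(p_both, 1)) + nrow(layer_data(p_both, 2))
n_jitter <- nrow(layer_data(p_jitter, 1))

n_both    # 36: every row is drawn twice
n_jitter  # 18: every row is drawn once
```

Since geom_jitter is documented as shorthand for geom_point(position = "jitter"), a plot needs only one of the two layers when each observation should appear once.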