Question

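My data contains multiple samples, combinations of two conditions, and the outcomes for those conditions, in this case the number of true positives and the number of false positives.

The best way I have come up with to display this is a stacked dot plot. Here is the result for a single sample, which looks essentially how I want it: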

Single sample stacked dot plot

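Now, what I want to do is sum the trues and the falses across all samples and plot them in exactly the same way. When I try, all of the points for each sample are stacked on top of each other, instead of being summed and plotted as one, as seen here:

All-samples dot plot with overplotted points

(Note the bullseye pattern; each point should have only 2 circles.)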

Here is some smaller sample data of the same form, along with the approaches I have tried, using stat_sum():

library(dplyr)    # filter()
library(tidyr)    # gather()
library(ggplot2)  # ggplot()

samples <- c(rep("Sample 1", 4), rep("Sample 2", 4), rep("Sample 3", 4))
cond1 <- c("A", "A", "B", "B")
cond2 <- rep(c("X", "Y"))

# cbind() recycles cond1 and cond2 to the length of samples
data <- as.data.frame(cbind(samples, cond1, cond2))
data$true <- sample(30, length(data$samples))
data$false <- sample(20, length(data$samples))

# Reshape to long format: one row per sample, condition pair, and hit type
data <- gather(data, type, hits, true, false)

# The good single-sample version
ggplot(filter(data, samples == "Sample 1"), aes(
    x = cond1, y = cond2, size = hits, color = type)) +
  geom_point(alpha = 0.2) +
  scale_size_area(max_size = 20)

# Trying stat_sum() across hits
ggplot(data, aes(x = cond1, y = cond2, size = hits, color = type)) +
  stat_sum(aes(group = hits), alpha = 0.2) +
  scale_size_area(max_size = 20)

# Trying stat_sum() weighting by hits
ggplot(data, aes(x = cond1, y = cond2, size = hits, color = type)) +
  stat_sum(aes(group = 1, weight = hits), alpha = 0.2) +
  scale_size_area(max_size = 20)

How can I get the sums of the true and false hits across samples and plot them by condition?

Answer 1

Transforming the dataset before plotting, using dplyr's group_by() and summarize(), works:

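library(dplyr)
library(ggplot2)

# Sum hits across samples for each condition pair and hit type
grouped_data <- group_by(data, cond1, cond2, type)
summed_data <- summarize(grouped_data, hits = sum(hits))

# Plot the summed values, so each point has exactly two circles
ggplot(summed_data, aes(x = cond1, y = cond2, size = hits, color = type)) +
  geom_point(alpha = 0.2) +
  scale_size_area(max_size = 20)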
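
As a sanity check on the group-then-sum approach, here is a minimal self-contained sketch that uses fixed, made-up counts instead of random ones, so the totals can be verified by hand. The data frame `toy` and its numbers are illustrative assumptions, not the asker's data, and `.groups = "drop"` assumes dplyr 1.0 or later. The second plot is an alternative that should give the same totals without pre-aggregating: as far as I understand it, stat_sum() counts within every mapped aesthetic, so mapping `size = hits` keeps each distinct value separate (hence the bullseye), whereas dropping that mapping and passing `weight = hits` lets it sum. `after_stat()` assumes ggplot2 3.3.0 or later.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: 2 samples x 2 cond1 x 2 cond2 x 2 types, fixed counts
toy <- expand.grid(
  samples = c("Sample 1", "Sample 2"),
  cond1 = c("A", "B"),
  cond2 = c("X", "Y"),
  type = c("true", "false"),
  stringsAsFactors = FALSE
)
toy$hits <- seq_len(nrow(toy))

# Sum hits across samples for each condition pair and hit type
summed <- toy %>%
  group_by(cond1, cond2, type) %>%
  summarize(hits = sum(hits), .groups = "drop")

# One row per (cond1, cond2, type): 2 * 2 * 2 = 8 rows, two circles per point
p <- ggplot(summed, aes(x = cond1, y = cond2, size = hits, color = type)) +
  geom_point(alpha = 0.2) +
  scale_size_area(max_size = 20)
print(p)

# Alternative sketch: let stat_sum() do the summing on the raw data.
# size is NOT mapped to hits here; it comes from the weighted count n.
p2 <- ggplot(toy, aes(x = cond1, y = cond2, color = type)) +
  stat_sum(aes(weight = hits, size = after_stat(n)), alpha = 0.2) +
  scale_size_area(max_size = 20)
print(p2)
```

Pre-aggregating keeps the summed table around for inspection, which makes it easier to confirm the totals before plotting; the stat_sum() variant is shorter but hides the numbers inside the plot.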
