Question

I have a data frame in R like this:

dat = data.frame(Sample = c(1,1,2,2,3), Start = c(100,300,150,200,160), Stop = c(180,320,190,220,170))

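I want to plot it so that the x-axis is position and the y-axis is the number of samples at that position, with each sample in a different colour. So in the example above there would be some positions with height 1, some with height 2, and one region with height 3. The aim is to find regions where a large number of samples are present, and which samples are in each such region.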

I.e., something like this:

      &
     ---
********-  --       **

where * = Sample 1, - = Sample 2 and & = Sample 3

Answer 1

My first try:

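library(ggplot2)

# Treat Sample as a factor so each sample gets its own discrete colour and row.
dat$Sample = factor(dat$Sample)
ggplot(aes(x = Start, y = Sample, xend = Stop, yend = Sample, color = Sample), data = dat) +
  geom_segment(size = 2) +
  # Semi-transparent black segments on the baseline: overlaps render darker.
  geom_segment(aes(x = Start, y = 0, xend = Stop, yend = 0), size = 2, alpha = 0.2, color = "black")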


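I combine two segment geoms here. The first draws the coloured bars, one row per sample, showing where each sample was measured. The second draws the grey bar underneath, which shows the density of the samples: because it is semi-transparent, positions covered by more samples appear darker. Any suggestions for improving this quick hack?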
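The grey bar only hints at density through shading. If actual counts are wanted, one option is to compute the depth directly from the segment ends. The following is a minimal sketch, not part of the answer above: it assumes integer positions with an inclusive Stop, and the names `events`, `steps`, `depth` and `end` are made up for illustration.

```r
dat <- data.frame(Sample = c(1, 1, 2, 2, 3),
                  Start  = c(100, 300, 150, 200, 160),
                  Stop   = c(180, 320, 190, 220, 170))

# Each segment adds +1 at its Start and -1 just past its Stop
# (assumption: integer positions, Stop is inclusive).
events <- data.frame(pos   = c(dat$Start, dat$Stop + 1),
                     delta = c(rep(1, nrow(dat)), rep(-1, nrow(dat))))

# Sum the deltas at each distinct position, then accumulate left to right.
steps <- aggregate(delta ~ pos, data = events, FUN = sum)
steps <- steps[order(steps$pos), ]
steps$depth <- cumsum(steps$delta)

# Each depth value holds from pos up to the position before the next change.
steps$end <- c(steps$pos[-1] - 1, NA)

# Regions covered by at least two samples.
deep <- steps[steps$depth >= 2, c("pos", "end", "depth")]

# Optional: draw the depth as a step function if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(steps, ggplot2::aes(x = pos, y = depth)) +
    ggplot2::geom_step()
}
```

The table in `deep` gives the start, end and depth of every region with two or more samples, which can then be matched back against `dat` to see which samples fall in it.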

Answer 2

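This hack may be what you're looking for, but to take advantage of the stacking done by geom_histogram I have greatly increased the size of the data frame.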

library(ggplot2)
dat = data.frame(Sample = c(1,1,2,2,3), 
                 Start = c(100,300,150,200,160), 
                 Stop = c(180,320,190,220,170))

# Reformat the data for plotting with geom_histogram.
dat2 = matrix(ncol=2, nrow=0, dimnames=list(NULL, c("Sample", "Position")))

for (i in seq(nrow(dat))) {
    Position = seq(dat[i, "Start"], dat[i, "Stop"])
    Sample = rep(dat[i, "Sample"], length(Position))
    dat2 = rbind(dat2, cbind(Sample, Position))
}

dat2 = as.data.frame(dat2)
dat2$Sample = factor(dat2$Sample)

# Hide the default grid; horizontal guide lines are drawn manually instead.
plot_1 = ggplot(dat2, aes(x=Position, fill=Sample)) +
         theme_bw() +
         theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank()) +
         geom_hline(yintercept=seq(0, 20), colour="grey80", size=0.15) +
         geom_hline(yintercept=3, linetype=2) +
         geom_histogram(binwidth=1) +
         ylim(c(0, 20)) +
         ylab("Count") +
         theme(axis.title.x=element_text(size=11, vjust=0.5)) +
         theme(axis.title.y=element_text(size=11, angle=90)) +
         ggtitle("Segment Plot")

png("plot_1.png", height=200, width=650)
print(plot_1)
dev.off()

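Note that the way I reformatted the data frame is a bit ugly and will not scale well (e.g. if you have millions of segments and/or large start and stop positions), because it creates one row per position per segment.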
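One way to sidestep the per-position expansion, offered as a rough sketch rather than a tested replacement: cut the axis only where a segment starts or ends, and stack one rectangle per covering sample in each piece. It assumes integer positions with an inclusive Stop; `breaks`, `pieces` and `rects` are illustrative names that do not appear in the code above.

```r
dat <- data.frame(Sample = c(1, 1, 2, 2, 3),
                  Start  = c(100, 300, 150, 200, 160),
                  Stop   = c(180, 320, 190, 220, 170))

# Cut the axis at every Start and just past every Stop
# (assumption: integer positions, Stop is inclusive).
breaks <- sort(unique(c(dat$Start, dat$Stop + 1)))
pieces <- data.frame(xmin = head(breaks, -1), xmax = tail(breaks, -1))

# For each piece, find the segments that cover it and stack them from 0 upward.
sample_levels <- sort(unique(dat$Sample))
rects <- do.call(rbind, lapply(seq_len(nrow(pieces)), function(i) {
  active <- dat$Start <= pieces$xmin[i] & dat$Stop + 1 >= pieces$xmax[i]
  n <- sum(active)
  if (n == 0) return(NULL)  # nothing covers this piece
  data.frame(xmin = pieces$xmin[i], xmax = pieces$xmax[i],
             ymin = seq_len(n) - 1, ymax = seq_len(n),
             Sample = factor(dat$Sample[active], levels = sample_levels))
}))

# Optional: draw the stacked rectangles if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(rects) +
    ggplot2::geom_rect(ggplot2::aes(xmin = xmin, xmax = xmax,
                                    ymin = ymin, ymax = ymax, fill = Sample))
}
```

The number of rows in `rects` depends on how many breakpoints and overlaps there are, not on how long the segments are or how large the coordinates get. The per-piece scan over all segments is still quadratic in the worst case, so very large inputs would need something smarter.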


Plotting overlapping positions in R
