Question

I have a list of genomic regions from a series of patients.

> head(dotoo)
GRanges with 6 ranges and 3 metadata columns:
    seqnames                 ranges strand |       Id       CN Histology
       <Rle>              <IRanges>  <Rle> | <factor> <factor>  <factor>
[1]        3 [167946693, 168005541]      * |        9        3        MD
[2]        3 [189907623, 189954633]      * |        9        3        MD
[3]        6 [132274121, 132384438]      * |        9        3        MD
[4]       11 [ 67685096,  70138399]      * |        9        4        MD
[5]       12 [ 53859037,  53927595]      * |        9        3        MD
[6]       15 [ 19830049,  20089383]      * |        9        1        MD

When I plot the genomic aberrations with

library(ggbio)  # provides autoplot() for GRanges and attaches ggplot2 for aes()
autoplot(dotoo, aes(fill=as.factor(Id), color=as.factor(Id)))

I see a lot of overlapping regions (see image).


How can I find which regions overlap across at least 3 patients and share the same CN?

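Basically, looking at the picture, how can I find the regions that are "stacked" and keep only the part they share? Is there a way to do this?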

Answer 1

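Get a list of "disjoint" regions. (Maybe this isn't what you want; other options are reduce, or simply using the original dotoo object and skipping this step.)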

d = disjoint(dotoo)
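To see how this step differs from the reduce alternative, here is a minimal sketch on two hypothetical overlapping ranges (the coordinates are made up for illustration): disjoint cuts the ranges at every boundary, while reduce merges them into a single range.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Two hypothetical overlapping ranges on chromosome 3
gr <- GRanges("3", IRanges(start = c(1, 5), end = c(10, 15)))

# disjoint(): cut at every start/end so that no output ranges overlap
d <- disjoint(gr)   # [1, 4], [5, 10], [11, 15]

# reduce(): merge overlapping ranges into one covering range
r <- reduce(gr)     # [1, 15]
```

Only disjoint isolates [5, 10], the piece covered by both input ranges, which is why it suits the "stacked" regions in the question; reduce loses that boundary.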

Find the overlaps between the original regions and each disjoint region:

olap = findOverlaps(query=dotoo, subject=d)

Split the indexes of the overlaps by subject (the disjoint region) and by CN:

splt = split(seq_along(olap), list(subjectHits(olap), dotoo$CN[queryHits(olap)]))
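Passing a list of two grouping vectors to split groups the indexes by every combination of subject and CN, with names like "1.3" (subject 1, CN 3). Combinations that never occur still show up as empty groups, which the filtering step then discards. A base-R sketch, using hypothetical stand-ins for the subject hits and CN values:

```r
# Hypothetical stand-ins for subjectHits(olap) and dotoo$CN[queryHits(olap)]
subj <- c(1, 1, 1, 2, 2)
cn   <- factor(c(3, 3, 3, 3, 4))

# One group per (subject, CN) combination, named "<subject>.<CN>";
# unused combinations such as "1.4" are kept as empty groups
splt <- split(seq_along(subj), list(subj, cn))
names(splt)   # "1.3" "2.3" "1.4" "2.4"

# Keep only groups with at least 3 hits, as in the answer
filt <- Filter(function(x) length(x) >= 3, splt)
```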

Filter these down to the groups that meet your criterion (here, at least 3 overlapping ranges):

filt = Filter(function(x) length(x) >= 3, splt)

filt is now a list of indexes into olap. You can create a GRangesList of the overlapping elements:

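idx = unlist(filt)                                  # indexes into olap, all groups concatenated
grp = rep(seq_along(filt), sapply(filt, length))    # group number for each index
splitAsList(dotoo[queryHits(olap)[idx]], grp)       # one GRanges of original regions per group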
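Putting the steps together, here is a minimal end-to-end sketch on hypothetical toy data shaped like dotoo (four patients on chromosome 3; the coordinates, Ids and CN values are made up). One assumption differs from the answer: the filter counts distinct patient Ids rather than overlap hits, so a patient with two overlapping segments would not be counted twice. Instead of the GRangesList of original regions, it returns the shared disjoint pieces themselves, labelled with their CN.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical toy data in the shape of `dotoo`: one segment per patient
dotoo <- GRanges(
  seqnames = "3",
  ranges = IRanges(start = c(100, 150, 180, 120),
                   end   = c(300, 250, 400, 260)),
  Id = factor(c(9, 10, 11, 12)),
  CN = factor(c(3, 3, 3, 4))
)

# Steps from the answer: disjoint pieces, overlaps, split by (piece, CN)
d <- disjoint(dotoo)
olap <- findOverlaps(query = dotoo, subject = d)
q <- queryHits(olap)
s <- subjectHits(olap)
splt <- split(seq_along(olap), list(s, dotoo$CN[q]))

# Variant filter (assumption): require at least 3 distinct patients
filt <- Filter(function(i) length(unique(dotoo$Id[q[i]])) >= 3, splt)

# All hits in a group share one disjoint piece and one CN,
# so the first hit of each group identifies both
shared <- d[unname(vapply(filt, function(i) s[i[1]], integer(1)))]
shared$CN <- unname(vapply(filt, function(i) as.character(dotoo$CN[q[i[1]]]),
                           character(1)))
```

On this toy data only 3:180-250 with CN 3 survives: it is the only piece covered by three CN-3 patients. Patient 12's CN-4 segment overlaps it as well, but it falls into a separate (piece, CN) group and is filtered out.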

Ask questions about Bioconductor packages on the Bioconductor mailing list (no subscription required).

Finding overlapping chromosome regions with GRanges
