Here is my data:
> pos
chrA x x_end chrB y y_end
chr11 0 3000000 chr19 0 20000000
chr11 60000000 63500000 chr19 0 20000000
chr11 63500000 67500000 chr19 0 20000000
chr11 67500000 76000000 chr19 0 20000000
chr11 3000000 20000000 chr19 27000000 29500000
chr11 20000000 44000000 chr19 27000000 29500000
chr11 44000000 49500000 chr19 27000000 29500000
chr11 49500000 51500000 chr19 27000000 29500000
chr11 54500000 60000000 chr19 27000000 29500000
chr11 76000000 134500000 chr19 27000000 29500000
chr11 60000000 63500000 chr19 32500000 34500000
chr11 0 3000000 chr19 34500000 47500000
chr11 60000000 63500000 chr19 34500000 47500000
chr11 63500000 67500000 chr19 34500000 47500000
chr11 67500000 76000000 chr19 34500000 47500000
chr11 0 3000000 chr19 47500000 51500000
chr11 60000000 63500000 chr19 47500000 51500000
chr11 63500000 67500000 chr19 47500000 51500000
chr11 67500000 76000000 chr19 47500000 51500000
chr11 63500000 67500000 chr19 54000000 57000000
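Each row describes a rectangular box in an x-by-y matrix.
(x, y), (x, y_end), (x_end, y), (x_end, y_end)
are the coordinates of the four vertices of a box.
I want to find the rows that satisfy the following rule: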
for two rows (boxes) i and j,
if (x_i == x_j, x_end_i == x_end_j, y_end_i == y_j) or (y_i == y_j, y_end_i == y_end_j, x_end_i == x_j),
take the two boxes and merge them into one box.
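To make the rule concrete, here is a minimal R sketch of the test for a single pair of boxes. The helper name `merge_pair` is hypothetical (it is not part of my code), and it assumes each box is a one-row data frame with the columns shown in `pos`.

```r
# Hypothetical helper: a and b are one-row data frames with columns
# x, x_end, y, y_end. Returns the merged box, or NULL if the rule does not apply.
merge_pair <- function(a, b) {
  same_x <- a$x == b$x && a$x_end == b$x_end
  same_y <- a$y == b$y && a$y_end == b$y_end
  if (same_x && a$y_end == b$y) {
    a$y_end <- b$y_end   # same x range, b starts where a ends in y
    return(a)
  }
  if (same_y && a$x_end == b$x) {
    a$x_end <- b$x_end   # same y range, b starts where a ends in x
    return(a)
  }
  NULL
}

# Two rows taken from pos that share the y range and touch along x.
a <- data.frame(chrA = "chr11", x = 60000000, x_end = 63500000,
                chrB = "chr19", y = 0, y_end = 20000000)
b <- data.frame(chrA = "chr11", x = 63500000, x_end = 67500000,
                chrB = "chr19", y = 0, y_end = 20000000)
merged <- merge_pair(a, b)   # x runs from 60000000 to 67500000
```

Note that the rule is directional: `merge_pair(b, a)` returns NULL because b does not end where a starts.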
My expected result is:
> res
chrA x x_end chrB y y_end
chr11 0 3000000 chr19 0 20000000
chr11 60000000 76000000 chr19 0 20000000
chr11 3000000 51500000 chr19 27000000 29500000
chr11 54500000 60000000 chr19 27000000 29500000
chr11 76000000 134500000 chr19 27000000 29500000
chr11 0 3000000 chr19 34500000 51500000
chr11 60000000 63500000 chr19 32500000 34500000
chr11 60000000 76000000 chr19 34500000 51500000
chr11 63500000 67500000 chr19 54000000 57000000
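Here is a figure illustrating the problem (each box represents one row of pos; I want to merge adjacent boxes into one larger rectangular box):
So the problem can be simplified to: how do I merge these adjacent boxes?
Here is my code (it is inefficient and misses some boxes that should be merged):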
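# Find boxes that share an edge and merge them.
for (i in 1:(nrow(pos) - 1)) {
  for (j in (i + 1):nrow(pos)) {
    if (pos[i, "x"] == pos[j, "x"] && pos[i, "x_end"] == pos[j, "x_end"] && pos[i, "y_end"] == pos[j, "y"]) {
      # Same x range, and box i ends where box j starts in y: extend both rows along y.
      pos[j, "y"] <- pos[i, "y"]
      pos[i, "y_end"] <- pos[j, "y_end"]
    } else if (pos[i, "x_end"] == pos[j, "x"] && pos[i, "y"] == pos[j, "y"] && pos[i, "y_end"] == pos[j, "y_end"]) {
      # Same y range, and box i ends where box j starts in x: extend both rows along x.
      pos[j, "x"] <- pos[i, "x"]
      pos[i, "x_end"] <- pos[j, "x_end"]
    }
  }
}
# Both rows of a merged pair are now identical, so drop the duplicates.
pos <- unique(pos)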
I hope there is a better way.
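One possible direction, as a minimal base-R sketch rather than a definitive answer: instead of comparing every pair of rows, sort the boxes and merge runs of touching intervals along x (among rows that share chrA, chrB, y and y_end), then along y, and repeat until the number of rows stops changing. The function names `merge_runs` and `merge_boxes` are hypothetical. The sketch assumes that boxes do not overlap, that coordinates can be compared exactly with `==`, and that merging along x before y is acceptable (the result can depend on that order).

```r
# Hypothetical helper: merge touching intervals [start, end] among rows
# that agree on every other column.
merge_runs <- function(df, start, end) {
  other <- setdiff(names(df), c(start, end))
  # Sort by the grouping columns, then by the interval start.
  df <- df[do.call(order, unname(as.list(df[c(other, start)]))), , drop = FALSE]
  n <- nrow(df)
  if (n < 2) return(df)
  key <- do.call(paste, c(unname(as.list(df[other])), sep = "|"))
  same_group <- key[-1] == key[-n]
  touches <- df[[start]][-1] == df[[end]][-n]
  # A new run starts whenever a row does not continue the previous one.
  run <- cumsum(c(TRUE, !(same_group & touches)))
  out <- df[!duplicated(run), , drop = FALSE]          # first row of each run
  out[[end]] <- df[[end]][!duplicated(run, fromLast = TRUE)]  # end of last row
  rownames(out) <- NULL
  out
}

# Hypothetical driver: alternate x and y passes until nothing changes.
merge_boxes <- function(pos) {
  repeat {
    n <- nrow(pos)
    pos <- merge_runs(pos, "x", "x_end")
    pos <- merge_runs(pos, "y", "y_end")
    if (nrow(pos) == n) break
  }
  pos
}

# Small usage example with four rows taken from pos.
demo <- read.table(header = TRUE, text = "
chrA x x_end chrB y y_end
chr11 0 3000000 chr19 0 20000000
chr11 60000000 63500000 chr19 0 20000000
chr11 63500000 67500000 chr19 0 20000000
chr11 67500000 76000000 chr19 0 20000000
")
demo_res <- merge_boxes(demo)
# Two rows remain: x from 0 to 3000000 and x from 60000000 to 76000000.
```

The rows come back sorted rather than in their original order, so compare against the expected result after sorting both.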