Merging chromosome boxes that share a common edge

Time: 2018-08-13 07:03:08

Tags: r ggplot2 bioinformatics genome

Here is my data:

    > pos

  chrA        x     x_end  chrB        y    y_end
 chr11        0   3000000 chr19        0 20000000
 chr11 60000000  63500000 chr19        0 20000000
 chr11 63500000  67500000 chr19        0 20000000
 chr11 67500000  76000000 chr19        0 20000000
 chr11  3000000  20000000 chr19 27000000 29500000
 chr11 20000000  44000000 chr19 27000000 29500000
 chr11 44000000  49500000 chr19 27000000 29500000
 chr11 49500000  51500000 chr19 27000000 29500000
 chr11 54500000  60000000 chr19 27000000 29500000
 chr11 76000000 134500000 chr19 27000000 29500000
 chr11 60000000  63500000 chr19 32500000 34500000
 chr11        0   3000000 chr19 34500000 47500000
 chr11 60000000  63500000 chr19 34500000 47500000
 chr11 63500000  67500000 chr19 34500000 47500000
 chr11 67500000  76000000 chr19 34500000 47500000
 chr11        0   3000000 chr19 47500000 51500000
 chr11 60000000  63500000 chr19 47500000 51500000
 chr11 63500000  67500000 chr19 47500000 51500000
 chr11 67500000  76000000 chr19 47500000 51500000
 chr11 63500000  67500000 chr19 54000000 57000000

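Each row describes a rectangular box in the x~y plane.

(x, y), (x, y_end), (x_end, y), (x_end, y_end) are the coordinates of the four vertices of a box.

I want to find pairs of rows like this: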

    for two rows (boxes) i and j,
    if (x_i == x_j, x_end_i == x_end_j, y_end_i == y_j) or (y_i == y_j, y_end_i == y_end_j, x_end_i == x_j)
    take the two boxes and merge them into one box
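As a minimal sketch, the rule above can be written as an R predicate. The names `shares_edge` and `box` are hypothetical helpers introduced only for illustration, and the check that both boxes lie on the same chromosome pair is an added assumption that the rule does not state. The test is directional (box `b` must start where box `a` ends), so in practice it would be tried in both orders.

```r
# Hypothetical helper: build one box as a named list.
box <- function(x, x_end, y, y_end) {
  list(chrA = "chr11", x = x, x_end = x_end,
       chrB = "chr19", y = y, y_end = y_end)
}

# TRUE when box b sits directly above, or directly to the right of, box a.
shares_edge <- function(a, b) {
  # Assumption: boxes on different chromosome pairs are never merged.
  same_chr     <- a$chrA == b$chrA && a$chrB == b$chrB
  # Same x-range, and b starts in y where a ends.
  stacked      <- a$x == b$x && a$x_end == b$x_end && a$y_end == b$y
  # Same y-range, and b starts in x where a ends.
  side_by_side <- a$y == b$y && a$y_end == b$y_end && a$x_end == b$x
  same_chr && (stacked || side_by_side)
}

a <- box(60e6,   63.5e6, 0, 20e6)   # second row of pos
b <- box(63.5e6, 67.5e6, 0, 20e6)   # third row of pos
shares_edge(a, b)  # TRUE: b continues a along x
shares_edge(b, a)  # FALSE: the check is directional
```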

My expected result is:

> res
 chrA        x     x_end  chrB        y    y_end
chr11        0   3000000 chr19        0 20000000
chr11 60000000  76000000 chr19        0 20000000
chr11  3000000  51500000 chr19 27000000 29500000
chr11 54500000  60000000 chr19 27000000 29500000
chr11 76000000 134500000 chr19 27000000 29500000
chr11        0   3000000 chr19 34500000 51500000
chr11 60000000  63500000 chr19 32500000 34500000
chr11 60000000  76000000 chr19 34500000 51500000
chr11 63500000  67500000 chr19 54000000 57000000

The figure illustrating the problem shows each row of pos as one box; I want to merge adjacent boxes into one larger rectangular box.

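So the problem can be simplified to: how can I merge these adjacent boxes?

Here is my code (it is inefficient and misses some boxes that should be merged):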

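# Find boxes that share a common edge and merge them
for (i in 1:(nrow(pos) - 1)) {
  for (j in (i + 1):nrow(pos)) {
    if (pos[i, "x"] == pos[j, "x"] && pos[i, "x_end"] == pos[j, "x_end"] && pos[i, "y_end"] == pos[j, "y"]) {
      # Box j sits directly above box i: give both rows the merged y-range
      pos[j, "y"] <- pos[i, "y"]
      pos[i, "y_end"] <- pos[j, "y_end"]
    } else if (pos[i, "x_end"] == pos[j, "x"] && pos[i, "y"] == pos[j, "y"] && pos[i, "y_end"] == pos[j, "y_end"]) {
      # Box j sits directly to the right of box i: give both rows the merged x-range
      pos[j, "x"] <- pos[i, "x"]
      pos[i, "x_end"] <- pos[j, "x_end"]
    }
  }
}

# Both rows of a merged pair are now identical, so drop the duplicates
pos <- unique(pos)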

I hope there is a better way.
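One possible direction, as a minimal sketch in base R rather than a definitive solution: sort the boxes so that candidates for merging land on consecutive rows, chain every run of boxes that touch end-to-start along x, do the same along y, and repeat until the number of boxes stops shrinking. The function names `merge_along` and `merge_boxes` are hypothetical. The sketch assumes that boxes never overlap and that coordinates compare exactly with `==`.

```r
# The data from the question, in units of 1e6 to keep the literal short.
pos <- data.frame(
  chrA  = "chr11",
  x     = c(0, 60, 63.5, 67.5, 3, 20, 44, 49.5, 54.5, 76, 60,
            0, 60, 63.5, 67.5, 0, 60, 63.5, 67.5, 63.5) * 1e6,
  x_end = c(3, 63.5, 67.5, 76, 20, 44, 49.5, 51.5, 60, 134.5, 63.5,
            3, 63.5, 67.5, 76, 3, 63.5, 67.5, 76, 67.5) * 1e6,
  chrB  = "chr19",
  y     = c(rep(0, 4), rep(27, 6), 32.5, rep(34.5, 4), rep(47.5, 4), 54) * 1e6,
  y_end = c(rep(20, 4), rep(29.5, 6), 34.5, rep(47.5, 4), rep(51.5, 4), 57) * 1e6,
  stringsAsFactors = FALSE
)

# Chain boxes that touch end-to-start along one axis.
# lo/hi: start and end columns of the axis being merged.
# fixed: columns that must be identical for two boxes to be merged.
merge_along <- function(boxes, lo, hi, fixed) {
  n <- nrow(boxes)
  if (n < 2) return(boxes)
  # Sort so that mergeable boxes end up on consecutive rows.
  ord <- do.call(order, unname(as.list(boxes[c(fixed, lo)])))
  boxes <- boxes[ord, , drop = FALSE]
  # One string key per row for the columns that must match.
  key <- do.call(paste, c(unname(as.list(boxes[fixed])), sep = "|"))
  # A row extends the previous one if the keys agree and it starts
  # exactly where the previous row ends.
  extends_prev <- key[-1] == key[-n] & boxes[[lo]][-1] == boxes[[hi]][-n]
  run <- cumsum(c(TRUE, !extends_prev))
  # Keep the first row of each run and give it the end of the last row.
  out <- boxes[!duplicated(run), , drop = FALSE]
  out[[hi]] <- boxes[[hi]][!duplicated(run, fromLast = TRUE)]
  rownames(out) <- NULL
  out
}

# Alternate x and y passes until no further merge is possible.
merge_boxes <- function(boxes) {
  repeat {
    n_before <- nrow(boxes)
    boxes <- merge_along(boxes, "x", "x_end", c("chrA", "chrB", "y", "y_end"))
    boxes <- merge_along(boxes, "y", "y_end", c("chrA", "chrB", "x", "x_end"))
    if (nrow(boxes) == n_before) break
  }
  boxes
}

res <- merge_boxes(pos)
print(res)
```

On the data above this gives the nine boxes of the expected result, in a different row order. The order of the passes matters: merging along y first would join the box at y = 32500000..34500000 with the box above it and produce a different decomposition. The loop in the question makes a single pass over all pairs, which can leave partially merged rows behind; repeating the passes until nothing changes avoids that.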

0 answers:

No answers yet