Given a file whose three columns are the ID, the left boundary of the region, and the right boundary of the region:
region1 1 100
region2 20 120
region3 101 200
region4 220 280
How can I extract the non-overlapping regions, to get:
region1 1 100
region3 101 200
region4 220 280
Answer 0 (score: 0)
Here is a solution that uses a loop to compare each row/range with the previous one, and a function to detect overlap.
# example dataset
dt = data.frame(region = 1:4,
min = c(1,20,101,220),
max = c(100,120,200,280))
# order data based on minimum value of range (in case you don't have an order already)
dt = dt[order(dt$min),]
dt
# region min max
# 1 1 1 100
# 2 2 20 120
# 3 3 101 200
# 4 4 220 280
# function that spots overlap: x and y are one-row (min, max) ranges, with x starting first
overlap = function(x, y) {
  x[[2]] >= y[[1]]
}
# set starting point (row)
i = 2
# a loop that compares each row with the previous kept row and deletes the row when it finds overlap
while (i <= nrow(dt)) {
  if (overlap(dt[i-1, 2:3], dt[i, 2:3])) {
    # row i overlaps the previous row: drop it and re-check the same index
    dt = dt[-i, ]
  } else {
    i = i + 1
  }
}
dt
# region min max
# 1 1 1 100
# 3 3 101 200
# 4 4 220 280
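Note that this procedure depends on the first (fixed) range, against which overlaps are computed. So if you have the ranges [1,100], [5,10], [15,30], [32,60], it will return only [1,100], because all the others overlap with it.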
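To make that caveat concrete, here is a minimal sketch that wraps the same greedy idea in a function and runs it on both datasets. The name `keep_non_overlapping` and the `last_max` variable are hypothetical, introduced only for illustration. Like the loop above, the sketch assumes the columns are called `min` and `max`, and that touching endpoints count as overlap.

```r
# Hypothetical helper (not part of the original answer): after sorting by min,
# keep a row only if it starts after the end of the most recently kept row.
keep_non_overlapping <- function(dt) {
  dt <- dt[order(dt$min), ]
  if (nrow(dt) == 0) return(dt)
  keep <- logical(nrow(dt))   # all FALSE to begin with
  keep[1] <- TRUE             # the first range is always kept
  last_max <- dt$max[1]       # right boundary of the last kept range
  for (i in seq_len(nrow(dt))[-1]) {
    if (dt$min[i] > last_max) {
      keep[i] <- TRUE
      last_max <- dt$max[i]
    }
  }
  dt[keep, ]
}

# Dataset from the question: region 2 is dropped
question <- data.frame(region = 1:4,
                       min = c(1, 20, 101, 220),
                       max = c(100, 120, 200, 280))
keep_non_overlapping(question)$region
# [1] 1 3 4

# Ranges from the note: everything after [1,100] overlaps it
nested <- data.frame(region = 1:4,
                     min = c(1, 5, 15, 32),
                     max = c(100, 10, 30, 60))
keep_non_overlapping(nested)$region
# [1] 1
```

Because each kept range is fixed once chosen, one wide early range can eliminate several narrow ranges that do not overlap each other. If that is not the behaviour you want, the selection rule itself has to change, not just the loop.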