Merging two data.frames with overlapping intervals

Time: 2013-10-22 01:51:13

Tags: r dataframe

How can I merge two data.frames whose intervals overlap?

Data frame 1

df1 <- read.table(textConnection(
"   from to Lith Form
1   0   1.2 GRN   BCM
2   1.2 5.0 GDI   BDI
"), header=TRUE)

Data frame 2

df2 <- read.table(textConnection(
"   from to Weath Str
1   0  1.1  HW ES
2   1.1 2.9 SW VS
3   2.9 5.0 HW ST
"), header=TRUE)

Desired result

  from  to Weath Str Lith Form
1  0.0 1.1    HW  ES  GRN  BCM
2  1.1 1.2    SW  VS  GRN  BCM
3  1.2 2.9    SW  VS  GDI  BDI
4  2.9 5.0    HW  ST  GDI  BDI

1 answer:

Answer 0 (score: 8)

Here is one approach. It is similar to eddi's answer to "R cutting two data.frames based on intervals and merging", but it lets you carry as many columns as you like in each data.frame.

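library(data.table)
# change your data to data.table
dt1 <- data.table(df1, key='from')
dt2 <- data.table(df2, key='from')
# skeleton for joined data.table: one row per piece between consecutive boundaries
dt <- data.table(from=sort(unique(c(dt1[,from], dt2[,from]))),
                 to=sort(unique(c(dt1[,to], dt2[,to]))),
                 key='from')
# function to join skeleton with data.table
j1 <- function(dt, dt1){
  # rolling join: each skeleton row picks up the dt1 row with the last 'from' at or before it
  dt3 <- dt1[dt, roll=TRUE]
  # the skeleton's 'to' comes back as 'to.1' in older data.table versions
  # and as 'i.to' in newer ones; keep it and drop the rolled-in 'to'
  skel_to <- intersect(c("to.1", "i.to"), names(dt3))
  dt3[, to := NULL]
  setnames(dt3, skel_to, "to")
  setkey(dt3, from, to)
  return(dt3)
}
# merge two data.tables
j1(dt, dt2)[j1(dt, dt1)]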
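
The skeleton step is the part that is easiest to get wrong, so here is a minimal base R sketch of what it produces for the question's data. The vector names (`from1`, `to1`, `from2`, `to2`) and the `skeleton` data frame are made up for illustration and are not part of the answer's code. The sketch assumes, as the question's data does, that each input is contiguous and that both cover the same overall range; under that assumption the sorted `from` and `to` values pair up row by row.

```r
# Interval boundaries copied from the question's two data frames
from1 <- c(0, 1.2);      to1 <- c(1.2, 5.0)
from2 <- c(0, 1.1, 2.9); to2 <- c(1.1, 2.9, 5.0)

# Same construction as the skeleton in the answer, without data.table
skeleton <- data.frame(
  from = sort(unique(c(from1, from2))),
  to   = sort(unique(c(to1, to2)))
)
print(skeleton)

# Sanity check: every piece must have positive length
stopifnot(all(skeleton$from < skeleton$to))
```

If the inputs had gaps or different end points, the two sorted vectors could differ in length or pair up incorrectly, so a check like the last line is worth keeping.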

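Overlap joins (also called interval joins) were recently implemented in data.table v1.9.3. With them, I think your task can be done as follows (assuming your data.frames are df1 and df2):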

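require(data.table) ## 1.9.3+
setDT(df1)  ## convert to data.table by reference, without copying
setDT(df2)

setkey(df2, from, to)  ## foverlaps() needs the second table keyed on its interval columns
ans = foverlaps(df1, df2, type="any")
## clip each matched pair to the part the two intervals share
ans = ans[, `:=`(from = pmax(from, i.from), to = pmin(to, i.to))]
## drop the helper columns; use from < to instead to also drop zero-length pieces
ans = ans[, `:=`(i.from=NULL, i.to=NULL)][from <= to]
#    from  to Weath Str Lith Form
# 1:  0.0 1.1    HW  ES  GRN  BCM
# 2:  1.1 1.2    SW  VS  GRN  BCM
# 3:  1.2 2.9    SW  VS  GDI  BDI
# 4:  2.9 5.0    HW  ST  GDI  BDI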
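
For comparison, the same result can be sketched in base R with `findInterval()`, for cases where data.table is not available. This is not the method from either answer above, just one possible alternative. It assumes both data frames are sorted by `from`, are contiguous, and cover the same overall range; the names `breaks`, `res`, `i1` and `i2` are illustrative.

```r
# The question's data, built directly so the example is self-contained
df1 <- data.frame(from = c(0, 1.2), to = c(1.2, 5.0),
                  Lith = c("GRN", "GDI"), Form = c("BCM", "BDI"))
df2 <- data.frame(from = c(0, 1.1, 2.9), to = c(1.1, 2.9, 5.0),
                  Weath = c("HW", "SW", "HW"), Str = c("ES", "VS", "ST"))

# All distinct boundaries from both inputs, sorted
breaks <- sort(unique(c(df1$from, df1$to, df2$from, df2$to)))

# One output row per piece between consecutive boundaries
res <- data.frame(from = head(breaks, -1), to = tail(breaks, -1))

# For each piece, the index of the input row whose interval contains its start
i1 <- findInterval(res$from, df1$from)
i2 <- findInterval(res$from, df2$from)

# Attach the attribute columns of the matching rows
res <- cbind(res, df2[i2, c("Weath", "Str")], df1[i1, c("Lith", "Form")])
rownames(res) <- NULL
print(res)
```

On the question's data this yields the four rows of the desired result. With inputs that have gaps, `findInterval()` would still return the preceding row, so the contiguity assumption matters here.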