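I have two data frames. I want to use each value in one of them (DF1$pos) to search two columns of DF2 (DF2$start, DF2$end), and if the value falls within that range, return the corresponding DF2$annot value in DF1$name.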
DF1
ID pos name
chr 12
chr 542
chr 674
DF2
ID start end annot
chr 1 200 a1
chr 201 432 a2
chr 540 1002 a3
chr 2000 2004 a4
So in this example I would like DF1 to become
ID pos name
chr 12 a1
chr 542 a3
chr 674 a3
I have tried using merge and intersect, but I can't figure out how to use an if statement with a logical expression.
The data frames can be created as follows:
DF1 <- data.frame(ID=c("chr","chr","chr"),
pos=c(12,542,674),
name=c(NA,NA,NA))
DF2 <- data.frame(ID=c("chr","chr","chr","chr"),
start=c(1,201,540,2000),
end=c(200,432,1002,2004),
annot=c("a1","a2","a3","a4"))
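Because the question mentions trying an if statement with a logical expression, here is a minimal base-R sketch of that idea with no extra packages. It uses the values from the tables in the question, matches on ID as well as position, and keeps only the first matching interval if several overlap. The helper name `first_hit` is hypothetical, made up for this illustration.

```r
# Example data, using the values from the tables in the question
DF1 <- data.frame(ID = c("chr", "chr", "chr"),
                  pos = c(12, 542, 674),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(ID = c("chr", "chr", "chr", "chr"),
                  start = c(1, 201, 540, 2000),
                  end = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: index of the first DF2 row with the same ID whose
# [start, end] range contains p, or NA if no interval contains it
first_hit <- function(id, p) {
  hit <- which(DF2$ID == id & DF2$start <= p & p <= DF2$end)
  if (length(hit) == 0) NA_integer_ else hit[1]
}

# Apply the helper row by row, then look up the annotations
idx <- mapply(first_hit, DF1$ID, DF1$pos, USE.NAMES = FALSE)
DF1$name <- DF2$annot[idx]
DF1
```

With this data `DF1$name` becomes "a1", "a3", "a3". The row-by-row loop is simple but scans all of DF2 for every position, so the join-based answers scale better on large inputs.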
Answer 0 (score: 5)
Perhaps you can use foverlaps from the "data.table" package.
library(data.table)
DT1 <- data.table(DF1)
DT2 <- data.table(DF2)
setkey(DT2, ID, start, end)
DT1[, c("start", "end") := pos] ## I don't know if there's a way around this step...
foverlaps(DT1, DT2)
# ID start end annot pos i.start i.end
# 1: chr 1 200 a1 12 12 12
# 2: chr 540 1002 a3 542 542 542
# 3: chr 540 1002 a3 674 674 674
foverlaps(DT1, DT2)[, c("ID", "pos", "annot"), with = FALSE]
# ID pos annot
# 1: chr 12 a1
# 2: chr 542 a3
# 3: chr 674 a3
As @Arun mentioned in the comments, you can also use which = TRUE in foverlaps to extract the relevant values:
foverlaps(DT1, DT2, which = TRUE)
# xid yid
# 1: 1 1
# 2: 2 3
# 3: 3 3
DT2$annot[foverlaps(DT1, DT2, which = TRUE)$yid]
# [1] "a1" "a3" "a3"
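Regarding the extra step flagged in the code comment above (copying pos into start and end), newer versions of data.table (1.9.8 and later) also support non-equi joins, which compare pos against the interval bounds directly. A minimal sketch, assuming such a version is installed; `mult = "first"` is an added choice so that each position yields exactly one value even if intervals overlap, and positions with no match get NA.

```r
library(data.table)

DT1 <- data.table(ID = c("chr", "chr", "chr"), pos = c(12, 542, 674))
DT2 <- data.table(ID = c("chr", "chr", "chr", "chr"),
                  start = c(1, 201, 540, 2000),
                  end = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"))

# Non-equi join: for each row of DT1 (the i table), find DT2 rows with the
# same ID and start <= pos <= end, and return their annot value.
# mult = "first" keeps one match per DT1 row; unmatched rows give NA.
hits <- DT2[DT1, on = .(ID, start <= pos, end >= pos), annot, mult = "first"]

# hits is in the same row order as DT1, so it can be added as a column
DT1[, name := hits]
DT1
```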
Answer 1 (score: 2)
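You can also use IRanges (from Bioconductor):
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("IRanges")  # replaces the older biocLite() installer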
library(IRanges)
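# Represent each position as a width-1 range [pos, pos]
DF1N <- with(DF1, IRanges(pos, pos))
DF2N <- with(DF2, IRanges(start, end))
# Assumes each pos overlaps exactly one interval; otherwise the hits will not line up with the rows of DF1
DF1$name <- DF2$annot[subjectHits(findOverlaps(DF1N, DF2N))]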
DF1
# ID pos name
#1 chr 12 a1
#2 chr 542 a3
#3 chr 674 a3
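If you want to avoid extra packages and the intervals in DF2 are sorted by start and do not overlap (as in this example), base R's findInterval() offers another route: a binary search that finds, for each position, the last interval starting at or before it. A minimal sketch under those assumptions; it also assumes a single ID ("chr"), so with several IDs you would need to apply it separately per ID.

```r
DF1 <- data.frame(ID = c("chr", "chr", "chr"), pos = c(12, 542, 674))
DF2 <- data.frame(ID = c("chr", "chr", "chr", "chr"),
                  start = c(1, 201, 540, 2000),
                  end = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"),
                  stringsAsFactors = FALSE)

# For each pos, index of the last start <= pos (0 if pos is before every start).
# findInterval() requires DF2$start to be sorted in ascending order.
idx <- findInterval(DF1$pos, DF2$start)
idx[idx == 0] <- NA

# A position can lie after an interval's start but beyond its end (a gap
# between intervals); treat those as no match.
idx[!is.na(idx) & DF1$pos > DF2$end[idx]] <- NA

DF1$name <- DF2$annot[idx]
DF1
```

Here `idx` is 1, 3, 3, so `DF1$name` becomes "a1", "a3", "a3"; a position such as 450 would fall in the gap after a2 and get NA.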