I have a data frame df1:
chr = c(1, 1, 1, 1, 2, 2, 2, 2)
point = c(257, 752, 135, 1650, 252, 756, 1230, 1710)
df1 = data.frame(chr, point)
  chr point
1   1   257
2   1   752
3   1   135
4   1  1650
5   2   252
6   2   756
7   2  1230
8   2  1710
I would like to add a new column, name, to it. The names to assign come from a reference data frame, df2:
chrB = c(1, 1, 1, 1, 2, 2, 2, 2)
txstart = c(0, 501, 1001, 1501, 0, 501, 1001, 1501)
txstop = c(500, 1000, 1500, 2000, 500, 1000, 1500, 2000)
name2 = c("F", "W", "Q", "G", "V", "S", "L", "Y")
df2 = data.frame(chrB, txstart, txstop, name2)
  chrB txstart txstop name2
1    1       0    500     F
2    1     501   1000     W
3    1    1001   1500     Q
4    1    1501   2000     G
5    2       0    500     V
6    2     501   1000     S
7    2    1001   1500     L
8    2    1501   2000     Y
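Where chr in df1 matches chrB in df2 and point in df1 lies between txstart and txstop in df2, the corresponding name2 should be added to df1. The result I am after is df1 with that name column filled in.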
Any help is much appreciated!
Answer 0 (score: 1)
With the updated dataset, only the foverlaps approach works:
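library(data.table)
# Treat each point as a zero-width interval [mp1, mp2] so it can take part in an overlap join
dt1 <- data.table(chr, mp1 = point, mp2 = point,
                  key = c("chr", "mp1", "mp2"))
dt2 <- data.table(chrB, txstart, txstop, name2,
                  key = c("chrB", "txstart", "txstop"))
# Match each point to the interval that contains it on the same chromosome
foverlaps(dt1, dt2, type = "within")[, .(chr, midpoint = mp1, name = name2)][]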
which gives:
   chr midpoint name
1:   1      135    F
2:   1      257    F
3:   1      752    W
4:   1     1650    G
5:   2      252    V
6:   2      756    S
7:   2     1230    L
8:   2     1710    Y
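Note that the foverlaps result comes back sorted by the key (135 now precedes 257) and is a new table rather than a column on df1. As a rough illustration of what the overlap join computes, here is a minimal base R sketch that pairs points with intervals on the same chromosome, filters on the interval, and writes the name back in the original row order. The helper column row_id is an assumption introduced for this sketch and is not part of the question; the bounds are treated as inclusive.

```r
# Data from the question.
df1 <- data.frame(chr   = c(1, 1, 1, 1, 2, 2, 2, 2),
                  point = c(257, 752, 135, 1650, 252, 756, 1230, 1710))
df2 <- data.frame(chrB    = c(1, 1, 1, 1, 2, 2, 2, 2),
                  txstart = c(0, 501, 1001, 1501, 0, 501, 1001, 1501),
                  txstop  = c(500, 1000, 1500, 2000, 500, 1000, 1500, 2000),
                  name2   = c("F", "W", "Q", "G", "V", "S", "L", "Y"))

# Remember the original row order (row_id is a hypothetical helper column).
df1$row_id <- seq_len(nrow(df1))

# Pair every point with every interval on the same chromosome.
pairs <- merge(df1, df2, by.x = "chr", by.y = "chrB")

# Keep only the pairs where the point falls inside the interval (inclusive).
hits <- pairs[pairs$point >= pairs$txstart & pairs$point <= pairs$txstop, ]

# Write the names back in the original row order; unmatched rows would stay NA.
df1$name <- as.character(hits$name2[match(df1$row_id, hits$row_id)])
df1$row_id <- NULL
print(df1)
```

This builds all point/interval pairs per chromosome before filtering, so it is only meant for small inputs; foverlaps avoids that intermediate table.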
Old answer:
If you want to check whether the midpoint lies between the start and end points in df2, you can use:
# Compares row i of df1 with row i of df2, so both data frames must be aligned row by row
df1$name <- df2$name2[match(df1$chr, df2$chrB) &
                      df1$midpoint > df2$txstart &
                      df1$midpoint < df2$txstop]
which gives:
> df1
  chr midpoint name
1   1      250    F
2   1      750    W
3   1     1250    Q
4   1     1750    G
5   2      250    V
6   2      750    S
7   2     1250    L
8   2     1750    Y
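The one-liner in this old answer lines up row i of df1 with row i of df2, which is why it only holds for the original, neatly aligned data. A minimal sketch of a per-row lookup that does not depend on row alignment is shown below, run on the updated data from the question. The function lookup_name is a hypothetical helper introduced here, and the inclusive bounds are an assumption (the old answer used strict inequalities).

```r
# Updated data from the question.
df1 <- data.frame(chr   = c(1, 1, 1, 1, 2, 2, 2, 2),
                  point = c(257, 752, 135, 1650, 252, 756, 1230, 1710))
df2 <- data.frame(chrB    = c(1, 1, 1, 1, 2, 2, 2, 2),
                  txstart = c(0, 501, 1001, 1501, 0, 501, 1001, 1501),
                  txstop  = c(500, 1000, 1500, 2000, 500, 1000, 1500, 2000),
                  name2   = c("F", "W", "Q", "G", "V", "S", "L", "Y"))

# Hypothetical helper: name of the first interval in `ref` that is on the
# same chromosome and contains `point`, or NA if there is none.
lookup_name <- function(chr, point, ref) {
  hit <- ref$name2[ref$chrB == chr &
                   ref$txstart <= point &
                   point <= ref$txstop]
  if (length(hit) == 0) NA_character_ else as.character(hit[1])
}

# Apply the lookup to every row of df1, keeping df1's own row order.
df1$name <- mapply(lookup_name, df1$chr, df1$point,
                   MoreArgs = list(ref = df2))
print(df1)
```

Each row scans all of df2, so this is fine for small reference tables but will be slow on large ones.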
As an alternative, you can use the foverlaps function from the data.table package:
library(data.table)
dt1 <- data.table(chr, mp1 = midpoint, mp2 = midpoint,
                  key = c("chr", "mp1", "mp2"))
dt2 <- data.table(chrB, txstart, txstop, name2,
                  key = c("chrB", "txstart", "txstop"))
foverlaps(dt1, dt2, type = "within", nomatch = 0L)[, .(chr, midpoint = mp1, name = name2)][]
which gives the same result:
   chr midpoint name
1:   1      250    F
2:   1      750    W
3:   1     1250    Q
4:   1     1750    G
5:   2      250    V
6:   2      750    S
7:   2     1250    L
8:   2     1750    Y
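The old call passes nomatch = 0L while the updated call at the top of the answer leaves it at its default. With the data shown every point falls inside some interval, so the two behave the same. The small sketch below, which assumes the data.table package is installed and uses a made-up point (2500) that lies outside every interval, shows where they differ. The table names ref and pts are placeholders for this illustration.

```r
library(data.table)

# Two reference intervals on chromosome 1.
ref <- data.table(chrB = c(1, 1), txstart = c(0, 501), txstop = c(500, 1000),
                  name2 = c("F", "W"),
                  key = c("chrB", "txstart", "txstop"))

# One point inside an interval (135) and one hypothetical point outside all of them (2500).
pts <- data.table(chr = c(1, 1), mp1 = c(135, 2500), mp2 = c(135, 2500),
                  key = c("chr", "mp1", "mp2"))

# Default nomatch: the unmatched point is kept, with NA in the columns taken from ref.
keep_all <- foverlaps(pts, ref, type = "within")

# nomatch = 0L: the unmatched point is dropped from the result.
matched <- foverlaps(pts, ref, type = "within", nomatch = 0L)

print(keep_all)
print(matched)
```

Leaving nomatch at its default is the safer choice when every row of df1 has to appear in the output, since points without a matching interval then show up with a missing name instead of disappearing.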