Add a character vector based on parameters from a reference data frame

Time: 2016-03-04 11:31:50

Tags: r dataframe

I have a data frame df1:

chr = c(1, 1, 1, 1, 2, 2, 2, 2)
point = c(257, 752, 135, 1650, 252, 756, 1230, 1710)
df1 = data.frame(chr, point)

  chr point
1   1   257
2   1   752
3   1   135
4   1  1650
5   2   252
6   2   756
7   2  1230
8   2  1710

I want to add a new column, name, to it. The names to assign come from a reference data frame df2:

chrB = c(1, 1, 1, 1, 2, 2, 2, 2)
txstart = c(0, 501, 1001, 1501, 0, 501, 1001, 1501)
txstop = c(500, 1000, 1500, 2000, 500, 1000, 1500, 2000)
name2 = c("F", "W", "Q", "G", "V", "S", "L", "Y")
df2 = data.frame(chrB, txstart, txstop, name2)

  chrB txstart txstop name2
1    1       0    500     F
2    1     501   1000     W
3    1    1001   1500     Q
4    1    1501   2000     G
5    2       0    500     V
6    2     501   1000     S
7    2    1001   1500     L
8    2    1501   2000     Y

If chr in df1 is the same as chrB in df2, and point in df1 lies between the values txstart and txstop in df2, then name2 should be added to df1. The result I want is below:

df1

Any help is much appreciated!

1 answer:

Answer 0: (score: 1)

With the updated dataset, only the foverlaps approach works:

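library(data.table)

# Turn each point into a zero-width interval (mp1 == mp2) so that it can be
# matched against the txstart/txstop intervals in dt2
dt1 <- data.table(chr, mp1 = point, mp2 = point, 
                  key = c("chr","mp1", "mp2"))
dt2 <- data.table(chrB, txstart, txstop, name2, 
                  key = c("chrB","txstart", "txstop"))

foverlaps(dt1, dt2, type="within")[, .(chr, midpoint=mp1, name=name2)][]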

which gives:

   chr midpoint name
1:   1      135    F
2:   1      257    F
3:   1      752    W
4:   1     1650    G
5:   2      252    V
6:   2      756    S
7:   2     1230    L
8:   2     1710    Y
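
For comparison, the same matching rule can be sketched in base R without data.table. This is only an illustrative sketch and not part of the original answer: the names candidates and hits are made up here, and it assumes the interval bounds are inclusive, as they are in foverlaps. It pairs every point with every interval on the same chromosome and then keeps the pairs where the point falls inside the interval.

```r
# Data from the question
chr   <- c(1, 1, 1, 1, 2, 2, 2, 2)
point <- c(257, 752, 135, 1650, 252, 756, 1230, 1710)
df1   <- data.frame(chr, point)

df2 <- data.frame(
  chrB    = c(1, 1, 1, 1, 2, 2, 2, 2),
  txstart = c(0, 501, 1001, 1501, 0, 501, 1001, 1501),
  txstop  = c(500, 1000, 1500, 2000, 500, 1000, 1500, 2000),
  name2   = c("F", "W", "Q", "G", "V", "S", "L", "Y")
)

# Pair every point with every interval on the same chromosome
# (candidates is a hypothetical helper name)
candidates <- merge(df1, df2, by.x = "chr", by.y = "chrB")

# Keep only the pairs where the point lies inside the interval
# (bounds assumed inclusive)
hits <- candidates[candidates$point >= candidates$txstart &
                   candidates$point <= candidates$txstop, ]

# Look up the name for each original row of df1; rows without a
# matching interval would get NA
df1$name <- hits$name2[match(paste(df1$chr, df1$point),
                             paste(hits$chr, hits$point))]
df1
```

Unlike the foverlaps result, this keeps df1 in its original row order. The intermediate merge grows with the number of points times the number of intervals per chromosome, so foverlaps is likely the better choice for large inputs.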

Old answer:

If you want to check whether the midpoint lies between the start and end points in df2, you can use:

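# Note: the comparisons are made row by row, so this only gives the right
# result when row i of df1 falls inside the interval in row i of df2.
df1$name <- df2$name2[match(df1$chr,df2$chrB) & 
                        df1$midpoint > df2$txstart & 
                        df1$midpoint < df2$txstop]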

which gives:

> df1
  chr midpoint name
1   1      250    F
2   1      750    W
3   1     1250    Q
4   1     1750    G
5   2      250    V
6   2      750    S
7   2     1250    L
8   2     1750    Y

As an alternative, you can use the foverlaps function from the data.table package:

library(data.table)

dt1 <- data.table(chr, mp1 = midpoint, mp2 = midpoint, key = c("chr","mp1", "mp2"))
dt2 <- data.table(chrB, txstart, txstop, name2, key = c("chrB","txstart", "txstop"))

foverlaps(dt1, dt2, type="within", nomatch=0L)[, .(chr, midpoint=mp1, name=name2)][]

which gives the same result:

   chr midpoint name
1:   1      250    F
2:   1      750    W
3:   1     1250    Q
4:   1     1750    G
5:   2      250    V
6:   2      750    S
7:   2     1250    L
8:   2     1750    Y
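
As a further option that is not part of the original answer, data.table versions 1.9.8 and later support non-equi joins, which can express the "same chromosome and point between txstart and txstop" rule directly in the on argument. The sketch below uses the updated data from the question rather than the old midpoint data, and assumes inclusive interval bounds.

```r
library(data.table)

# Updated data from the question
dt1 <- data.table(chr   = c(1, 1, 1, 1, 2, 2, 2, 2),
                  point = c(257, 752, 135, 1650, 252, 756, 1230, 1710))
dt2 <- data.table(chrB    = c(1, 1, 1, 1, 2, 2, 2, 2),
                  txstart = c(0, 501, 1001, 1501, 0, 501, 1001, 1501),
                  txstop  = c(500, 1000, 1500, 2000, 500, 1000, 1500, 2000),
                  name2   = c("F", "W", "Q", "G", "V", "S", "L", "Y"))

# Update join: for each interval in dt2, find the rows of dt1 on the same
# chromosome whose point lies inside it, and write name2 into a new column.
# The i. prefix refers to columns of dt2 (the table in the i position).
dt1[dt2, on = .(chr == chrB, point >= txstart, point <= txstop),
    name := i.name2]
dt1
```

Because name is added by reference, dt1 keeps its original row order and no keys or duplicated interval columns are needed.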