Question

I have two data frames.

df1 looks like this:

chr <- c("1","1","2")
pos <- c("1000","2000","2000")
df1=data.frame(cbind(chr,pos))
df1

chr    pos
1      1000
1      2000
2      2000

df2 looks like this:

chr <- c("1","1","1","2","2")
start <- c("500","1500","2500","500","1500")
end <- c("1499","2499","3499","1499","2499")
state <- c("state1", "state2", "state1", "state3", "state4")
df2=data.frame(cbind(chr,start,end,state))
df2

chr start  end  state
1   500    1499 state1
1   1500   2499 state2
1   2500   3499 state1
2   500    1499 state3
2   1500   2499 state4

I would like to add a state column to the first data frame: for each row, the value in df1$chr must equal df2$chr, and the value in df1$pos must lie between df2$start and df2$end. The expected final result looks like this:

chr pos  state
1   1000 state1
1   2000 state2
2   2000 state4

If the value in df1$pos were the same as the value in df2$start, I would know how to do this, but it is the range match that I am struggling with.

Any tips would be very helpful.

Answer 1

As someone who leans toward SQL, I would probably go with the sqldf option:

library(sqldf)
query <- "select df1.chr, df1.pos, df2.state
          from df1
          left join df2
              on df1.chr = df2.chr and
                 df1.pos between df2.start and df2.end"
df1 <- sqldf(query, stringsAsFactors=FALSE)

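Edit

I think your pos, start, and end columns should be numeric, because the comparison you need is between numbers, not text. Convert them all to numeric and the solution above should work: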

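# as.character() first, so that factor columns (the default in R < 4.0) convert by label rather than by level code
df1$pos <- as.numeric(as.character(df1$pos))
df2$start <- as.numeric(as.character(df2$start))
df2$end <- as.numeric(as.character(df2$end))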
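To see why the conversion matters, here is a minimal sketch of how the same comparison behaves on text and on numbers. The variable names text_cmp and num_cmp are only illustrative and do not come from the question.

```r
# Character values are compared character by character, so "1000" sorts
# before "500" because "1" comes before "5".
text_cmp <- "1000" >= "500"

# After conversion the comparison is numeric.
num_cmp <- as.numeric("1000") >= as.numeric("500")

print(c(text = text_cmp, numeric = num_cmp))
```

With text columns, a position such as "1000" would therefore fail the lower-bound check against "500" even though it lies inside the interval numerically.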

Answer 2

We can use a non-equi join from data.table:

library(data.table)
# >= and <= so that a pos exactly equal to start or end is also matched
setDT(df1)[df2, state := state, on = .(chr, pos >= start, pos <= end)]
df1
#   chr  pos  state
#1:   1 1000 state1
#2:   1 2000 state2
#3:   2 2000 state4

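Note: when constructing a data.frame, avoid data.frame(cbind(...)), because cbind converts its arguments to a matrix, and a matrix can hold only one class. Use data.frame directly. Another problem with the example data is that 'pos', 'start', and 'end' are stored as strings. They should be of numeric class.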
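The effect of cbind can be checked directly. The following is a small sketch using the question's df1 columns; the names via_matrix and direct are made up for illustration.

```r
chr <- c("1", "1", "2")
pos <- c(1000, 2000, 2000)

# cbind() builds a character matrix first, so pos loses its numeric class
via_matrix <- data.frame(cbind(chr, pos))

# data.frame() keeps each column's own class
direct <- data.frame(chr, pos)

is.numeric(via_matrix$pos)  # FALSE
is.numeric(direct$pos)      # TRUE
```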

Data

chr <- c("1","1","2")
pos <- c(1000,2000,2000)
df1 <- data.frame(chr, pos)
chr <- c("1","1","1","2","2")
start <- c(500,1500,2500,500,1500)
end <- c(1499,2499,3499,1499,2499)
state <- c("state1", "state2", "state1", "state3", "state4")
df2 <- data.frame(chr, start, end, state)
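For comparison, the same range match can be sketched in base R with merge followed by a filter. This is not one of the approaches given in the answers, only an illustration under two assumptions: the tables are small enough that pairing every position with every interval on the same chromosome is affordable, and df1 rows with no matching interval may be dropped (unlike the left join in Answer 1). The names candidates, inside, and result are hypothetical.

```r
df1 <- data.frame(chr = c("1", "1", "2"), pos = c(1000, 2000, 2000))
df2 <- data.frame(
  chr   = c("1", "1", "1", "2", "2"),
  start = c(500, 1500, 2500, 500, 1500),
  end   = c(1499, 2499, 3499, 1499, 2499),
  state = c("state1", "state2", "state1", "state3", "state4")
)

# Pair every df1 row with every df2 interval on the same chromosome
candidates <- merge(df1, df2, by = "chr")

# Keep only the pairs where pos lies inside [start, end]
inside <- candidates$pos >= candidates$start & candidates$pos <= candidates$end
result <- candidates[inside, c("chr", "pos", "state")]
rownames(result) <- NULL
result
```

For large inputs the intermediate candidates table can grow quickly, which is one reason to prefer the non-equi join shown in Answer 2.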

How to populate a column using values from another data frame in R
