I'm trying to figure out how to look up a value using multiple columns, but I can't seem to get it to work. Here is an example:
df1 <- data.frame(g1 = c("a", "b", "c", "c"), g2 = c(1, 2, 3, 4))
df2 <- data.frame(g.1 = c("a", "b", "c"), g.2 = c(1, 2, 4), val = c(100, 200, 300))
So I tried:
df1$value <- df2[match(df1$g1, df2$g.1) & match(df1$g2, df2$g.2),]$val
But this doesn't work for the last value, and I suspect it only works for the first two by accident. I'd like df1 to look like:
g1 g2 value
1 a 1 100
2 b 2 200
3 c 3 NA
4 c 4 300
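To see why the attempt above goes wrong, it may help to look at the intermediate values. The sketch below is only an illustration: the names m1, m2, idx and res are made up here, and it indexes the val column directly rather than the whole data frame to keep the output simple.

```r
df1 <- data.frame(g1 = c("a", "b", "c", "c"), g2 = c(1, 2, 3, 4))
df2 <- data.frame(g.1 = c("a", "b", "c"), g.2 = c(1, 2, 4), val = c(100, 200, 300))

# Each match() returns positions in df2 (or NA when there is no match).
m1 <- match(df1$g1, df2$g.1)   # 1 2 3 3
m2 <- match(df1$g2, df2$g.2)   # 1 2 NA 3

# `&` coerces those positions to logical, so the row numbers are lost.
idx <- m1 & m2                 # TRUE TRUE NA TRUE

# Indexing with that logical vector picks elements 1 and 2, then an NA,
# then a fourth element that does not exist in a length-3 vector.
res <- df2$val[idx]            # 100 200 NA NA
```

So the first two rows come out right only because rows 1 and 2 of df1 happen to line up with rows 1 and 2 of df2. The two columns are never compared as a pair.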
Answer 0 (score: 2)
Try using merge to do a left join:
merge(df1, df2, by = 1:2, all.x = TRUE)
giving:
g1 g2 val
1 a 1 100
2 b 2 200
3 c 3 NA
4 c 4 300
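Here by = 1:2 refers to the key columns by position, which relies on the keys being the first two columns of both data frames. If that is not guaranteed, the same left join can be written with the column names spelled out through merge's by.x and by.y arguments. A minimal sketch (res is just an illustrative name):

```r
df1 <- data.frame(g1 = c("a", "b", "c", "c"), g2 = c(1, 2, 3, 4))
df2 <- data.frame(g.1 = c("a", "b", "c"), g.2 = c(1, 2, 4), val = c(100, 200, 300))

# Left join on (g1, g2) = (g.1, g.2); all.x = TRUE keeps unmatched df1 rows.
res <- merge(df1, df2,
             by.x = c("g1", "g2"),
             by.y = c("g.1", "g.2"),
             all.x = TRUE)
res
```

Note that merge sorts the result by the key columns by default, so the row order only matches df1 here because df1 is already sorted.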
Some alternatives are:
transform(df1, val = df2$val[match(paste(g1, g2), paste(df2$g.1, df2$g.2))])
library(sqldf)
sqldf("select df1.*, df2.val
from df1 left join df2 on g1 = [g.1] and g2 = [g.2]")
library(dplyr)
df1 %>% left_join(df2, by = c(g1 = "g.1", g2 = "g.2"))
Answer 1 (score: 1)
A join would be better, and with data.table the join is more efficient because df1 is updated by reference:
library(data.table)
setDT(df1)[df2, value := val, on = .(g1 = g.1, g2 = g.2)]
df1
# g1 g2 value
#1: a 1 100
#2: b 2 200
#3: c 3 NA
#4: c 4 300
Using match, one approach is to paste the columns of interest together and then create an index to update the values:
p1 <- do.call(paste, df1)
p2 <- do.call(paste, df2[1:2])
i1 <- match(p1, p2, nomatch = 0)
i2 <- match(p2, p1, nomatch = 0)
df1$value[i2] <- df2$val[i1]
df1
# g1 g2 value
#1 a 1 100
#2 b 2 200
#3 c 3 NA
#4 c 4 300
Answer 2 (score: 0)
Based on @G. Grothendieck's answer, I figured out what I was doing wrong. All I had to do was:
df1$value <- df2[match(paste0(df1$g1,df1$g2), paste0(df2$g.1,df2$g.2)),]$val
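One caveat with paste0 is that it joins the columns without a separator, so different pairs can produce the same key. A hedged variant that adds a separator is sketched below. The "|" separator and the names key1 and key2 are choices made for this illustration, and it assumes "|" never appears in the key columns.

```r
df1 <- data.frame(g1 = c("a", "b", "c", "c"), g2 = c(1, 2, 3, 4))
df2 <- data.frame(g.1 = c("a", "b", "c"), g.2 = c(1, 2, 4), val = c(100, 200, 300))

# Without a separator, ("a", 11) and ("a1", 1) collapse to the same key.
collision <- paste0("a", 11) == paste0("a1", 1)   # TRUE

# Build one composite key per row, with a separator between the columns.
key1 <- paste(df1$g1, df1$g2, sep = "|")
key2 <- paste(df2$g.1, df2$g.2, sep = "|")

# match() returns the df2 row for each df1 row, or NA when there is none.
df1$value <- df2$val[match(key1, key2)]
df1
```

With the example data both versions give the same result; the separator only matters when concatenated keys could be ambiguous.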