How do I merge two data frames using multiple columns as keys?

Date: 2015-03-23 14:22:27

Tags: r merge dataframe compound-key

Say I have the following data frames:

```r
DF1 <- data.frame("A" = rep(c("A","B"), 18),
                  "B" = rep(c("C","D","E"), 12),
                  "NUM" = rnorm(36, 10, 1),
                  "TEST" = rep(NA, 36))

DF2 <- data.frame("A" = rep("A", 6),
                  "B" = rep(c("C","D"), 6),
                  "VAL" = rep(c(1,3), 3))
```

*Note: each unique combination of the variables A and B in DF2 should have a unique VAL.

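For each row of DF1, I want to replace the NA in column TEST with the corresponding VAL from DF2 when that row's values in columns A and B match the A and B values of a row in DF2. Otherwise, I leave TEST as NA. How can I do this without looping over every combination with match()?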

Ideally, the answer would scale to two data frames with many columns to match on.

2 answers:

Answer 0 (score: 8)

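```r
# this is your DF1
DF1 <- data.frame("A" = rep(c("A","B"), 18),
                  "B" = rep(c("C","D","E"), 12),
                  "NUM" = rnorm(36, 10, 1),
                  "TEST" = rep(NA, 36))

# this is a DF2 I created, with unique A/B combinations and a VAL for each
DF2 <- data.frame("A" = rep(c("A","B"), 3),
                  "B" = rep(c("C","D","E"), 2),
                  "VAL" = 1:6)

# and this is the answer of what I assume you want
DF1$row_id <- seq_len(nrow(DF1))  # remember DF1's original row order
tmp <- merge(DF1, DF2, by = c("A","B"), all.x = TRUE, all.y = FALSE)
tmp <- tmp[order(tmp$row_id), ]   # merge() sorts rows by A and B; restore DF1's order
DF1$TEST <- tmp$VAL
DF1$row_id <- NULL                # drop the helper column again
```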
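One caveat when copying a merged column back into DF1 by position: `merge()` returns its rows sorted by the `by` columns (its `sort` argument defaults to `TRUE`) rather than in DF1's original order. A quick check with the data from this answer; the `set.seed()` call is an arbitrary addition so the random NUM values are reproducible:

```r
set.seed(42)  # arbitrary seed, only to make NUM reproducible

DF1 <- data.frame(A = rep(c("A", "B"), 18),
                  B = rep(c("C", "D", "E"), 12),
                  NUM = rnorm(36, 10, 1),
                  TEST = NA)
DF2 <- data.frame(A = rep(c("A", "B"), 3),
                  B = rep(c("C", "D", "E"), 2),
                  VAL = 1:6)

tmp <- merge(DF1, DF2, by = c("A", "B"), all.x = TRUE)

# merge() sorted by A then B: all 18 "A" rows come first, unlike DF1's alternating A/B
print(head(tmp$A))
# so the NUM column is no longer in DF1's row order
print(identical(tmp$NUM, DF1$NUM))  # FALSE
```

Carrying a row index through the join and sorting on it afterwards, or looking values up with `match()`, keeps the result aligned with DF1.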

Answer 1 (score: 6)

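As Akrun mentioned in the comments, your lookup table (DF2) needs to be reduced to its unique A/B combinations. With your current data this isn't a problem, but if the same combination has more than one possible value, you need an additional rule to choose between them. From there, the solution is simple: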

```r
DF2.u <- unique(DF2)
DF3 <- merge(DF1, DF2.u, all = TRUE)
```

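Note that this produces a new data frame containing the empty TEST column (all values NA) plus a VAL column taken from DF2. To do what you want (replace TEST with VAL wherever there is a match), here is some slightly clunky code: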

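```r
m <- merge(data.frame(DF1, row_id = seq_len(nrow(DF1))), DF2.u, all.x = TRUE)  # row_id tracks DF1's order
DF1$TEST <- m$VAL[order(m$row_id)]  # merge() sorts rows by A and B, so put them back in DF1's order
```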
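Before merging, it can help to check the caveat above about combinations with more than one value. `unique()` only removes fully identical rows, so if an A/B pair still appears more than once in DF2.u, that pair maps to several VALs. A small sketch using the question's DF2; `has_conflicts` is a hypothetical helper name and the extra row in `bad` is made-up data for illustration:

```r
# The question's DF2: A and VAL have length 6 and B has length 12,
# so data.frame() recycles A and VAL, giving 12 rows with repeated combinations
DF2 <- data.frame(A = rep("A", 6),
                  B = rep(c("C", "D"), 6),
                  VAL = rep(c(1, 3), 3))
DF2.u <- unique(DF2)

print(nrow(DF2))    # 12
print(nrow(DF2.u))  # 2: one row per A/B combination

# Hypothetical helper: TRUE if some key combination still occurs in more than one row
has_conflicts <- function(lookup, keys) anyDuplicated(lookup[keys]) > 0

print(has_conflicts(DF2.u, c("A", "B")))  # FALSE

# Made-up extra row that gives the A/C pair a second VAL
bad <- unique(rbind(DF2.u, data.frame(A = "A", B = "C", VAL = 99)))
print(has_conflicts(bad, c("A", "B")))    # TRUE
```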

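Edit: In response to your question, if DF2 has extra columns that make otherwise identical rows distinct, you can still reduce it quite simply by applying unique() to just the key and value columns: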

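```r
DF2$C <- 1:12                           # an extra column makes every row distinct, so unique(DF2) no longer helps
DF2.u <- unique(DF2[c("A", "B", "VAL")]) # deduplicate on the key and value columns only
DF2.u
#>   A B VAL
#> 1 A C   1
#> 2 A D   3
```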
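The question also asks for something that scales to many key columns. One loop-free alternative to `merge()`, which also keeps DF1's row order, is to collapse each row's key columns into a single string with `do.call(paste, ...)` and look it up with `match()`. This sketch is not taken from either answer: `fill_by_keys` and its arguments are hypothetical names, and the `"\u001f"` separator is an assumption that only works if that character never appears in the key values.

```r
# Hypothetical helper (not from either answer): for each row of x, copy lookup[[value]]
# into x[[target]] when every column in `keys` matches; unmatched rows keep their value.
fill_by_keys <- function(x, lookup, keys, value, target = value, sep = "\u001f") {
  # Collapse each row's key columns into one string; assumes `sep` never occurs in the keys
  key_x      <- do.call(paste, c(unname(as.list(x[keys])), sep = sep))
  key_lookup <- do.call(paste, c(unname(as.list(lookup[keys])), sep = sep))
  if (anyDuplicated(key_lookup) > 0) stop("lookup has repeated key combinations")
  if (is.null(x[[target]])) x[[target]] <- NA  # create the target column if it is missing
  idx <- match(key_x, key_lookup)               # row of lookup for each row of x, NA if none
  hit <- !is.na(idx)
  x[[target]][hit] <- lookup[[value]][idx[hit]]
  x
}

DF1 <- data.frame(A = rep(c("A", "B"), 18),
                  B = rep(c("C", "D", "E"), 12),
                  NUM = rnorm(36, 10, 1),
                  TEST = NA)
# The lookup must have one row per key combination, hence unique()
DF2 <- unique(data.frame(A = rep("A", 6),
                         B = rep(c("C", "D"), 6),
                         VAL = rep(c(1, 3), 3)))

DF1 <- fill_by_keys(DF1, DF2, keys = c("A", "B"), value = "VAL", target = "TEST")
head(DF1$TEST)  # 1 NA NA NA 3 NA
```

Matching on more columns only means extending `keys`, e.g. `keys = c("A", "B", "C")`, provided both data frames have those columns.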