我有df1
df1 <- data.frame(states = c("wash", "mont", "oreg", "cali", "michi"), key1 = c(1,3,5,7,9), key2 = c(2,4,6,8,10))
看起来像这样(key1和key2是一个键数组):
states key1 key2
1 wash 1 2
2 mont 3 4
3 oreg 5 6
4 cali 7 8
5 michi 9 10
df2有其他信息
df2 <- data.frame(sample = c(9,8,5,4,1), value = c("steel", "gold", "blue", "grey", "green"))
看起来像这样:
sample value
1 9 steel
2 8 gold
3 5 blue
4 4 grey
5 1 green
df2中的样本需要与df1中的EITHER key1或key2匹配才能生成df3
states key1 key2 sample value
1 wash 1 2 1 green
2 mont 3 4 4 grey
3 oreg 5 6 5 blue
4 cali 7 8 8 gold
5 michi 9 10 9 steel
然后我可以删除样本列...不是问题。如果样本的值可以在key1或key2中,如何将df2扩展到df3?
谢谢!
答案 0 :(得分:4)
您可以使用几个merge
来调用并绑定其输出:
rbind(transform(merge(df1, df2, by.x = "key1", by.y = "sample"), sample = key1),
transform(merge(df1, df2, by.x = "key2", by.y = "sample"), sample = key2))
# key1 states key2 value sample
# 1 1 wash 2 green 1
# 2 5 oreg 6 blue 5
# 3 9 michi 10 steel 9
# 4 3 mont 4 grey 4
# 5 7 cali 8 gold 8
另一种方法:
match.idx <- pmax(match(df1$key1, df2$sample),
match(df1$key2, df2$sample), na.rm = TRUE)
cbind(df1, df2[match.idx, ])
# states key1 key2 sample value
# 5 wash 1 2 1 green
# 4 mont 3 4 4 grey
# 3 oreg 5 6 5 blue
# 2 cali 7 8 8 gold
# 1 michi 9 10 9 steel
答案 1 :(得分:3)
一种方法是通过将key1
与df2$sample
和key2
与df2$sample
匹配来创建新键列,然后您就可以直接加入。我将使用data.table
来说明这一点。
require(data.table) ## >= 1.9.0
setDT(df1) ## convert data.frame to data.table by reference
setDT(df2) ## idem
# get the key as a common column
df1[(key1 %in% df2$sample), the_key := key1]
df1[(key2 %in% df2$sample), the_key := key2]
此处:=
再次通过引用分配新列(不进行复制)。现在剩下的只是setkey
和join
。
# setkey and join
setkey(df1, the_key)
setkey(df2, sample)
df1[df2]
# states key1 key2 the_key value
# 1: wash 1 2 1 green
# 2: mont 3 4 4 grey
# 3: oreg 5 6 5 blue
# 4: cali 7 8 8 gold
# 5: michi 9 10 9 steel