Merging two different data frames (like VLOOKUP)

Asked: 2016-06-21 02:12:32

Tags: r dataframe merge lookup vlookup

I have two data frames:

> head(df_Edges)  
  Source         Target     Type Weight
@kuabt        @_chuad Directed      1
@kuabt @arifsetia2013 Directed      1
@kuabt         @kuabt Directed      1
@kuabt     @chongbeng Directed      1
@kuabt     @billtay25 Directed      1
@kuabt        @gst183 Directed      1

> head(df_Nodes)
   Id          Label
73         @kuabt
148     @billtay25
168     @chongbeng
187 @nonvitaltooth
216        @gst183
244 @arifsetia2013

I want to replace the labels in df_Edges with their Id numbers, so that the result looks like this:

  Source         Target         Type Weight
   73            298     Directed      1
   73            244     Directed      1
   73             73     Directed      1
   73            168     Directed      1
   73            148     Directed      1
   73            216     Directed      1

This is what I tried:

df<-merge(df_Nodes, df_Edges, by.x = "Label", by.y = "Source")

But the result is still the same as before. How can I do this? Thanks.

2 answers:

Answer 0: (score: 2)

You don't need merge here, because you can do this directly with two applications of match:

df_Edges$Source <- df_Nodes$Id[match(df_Edges$Source, df_Nodes$Label)]
df_Edges$Target <- df_Nodes$Id[match(df_Edges$Target, df_Nodes$Label)]
df_Edges
##   Source Target     Type Weight
## 1     73     NA Directed      1
## 2     73    244 Directed      1
## 3     73     73 Directed      1
## 4     73    168 Directed      1
## 5     73    148 Directed      1
## 6     73    216 Directed      1

The NA value appears because the row for that label (@_chuad) is missing from df_Nodes in your example.
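To make the lookup step easier to follow, here is a minimal reproducible sketch. It rebuilds the sample data by hand (the use of stringsAsFactors = FALSE and the variable names pos, target_ids and unmatched are assumptions for illustration, not part of the question) and shows the intermediate positions that match returns before they are used to index Id.

```r
# Sample data copied from the question; character columns are an assumption
df_Edges <- data.frame(
  Source = rep("@kuabt", 6),
  Target = c("@_chuad", "@arifsetia2013", "@kuabt",
             "@chongbeng", "@billtay25", "@gst183"),
  Type   = "Directed",
  Weight = 1,
  stringsAsFactors = FALSE
)
df_Nodes <- data.frame(
  Id    = c(73, 148, 168, 187, 216, 244),
  Label = c("@kuabt", "@billtay25", "@chongbeng",
            "@nonvitaltooth", "@gst183", "@arifsetia2013"),
  stringsAsFactors = FALSE
)

# match() returns, for each Target, the row position of that label
# in df_Nodes$Label, or NA when the label is not present
pos <- match(df_Edges$Target, df_Nodes$Label)

# Indexing Id by those positions performs the actual lookup
target_ids <- df_Nodes$Id[pos]

# Labels that have no entry in df_Nodes (these become NA after the lookup)
unmatched <- unique(df_Edges$Target[is.na(pos)])
```

Checking unmatched before overwriting the columns is one way to see which labels would turn into NA.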

Answer 1: (score: 0)

We can use match together with Map to replace the columns in 'df_Edges'.

df_Edges[c("Source", "Target")] <- Map(function(x,y) df_Nodes$Id[match(x,y)],
                      df_Edges[c("Source", "Target")], list(df_Nodes$Label))
df_Edges
##   Source Target     Type Weight
## 1     73     NA Directed      1
## 2     73    244 Directed      1
## 3     73     73 Directed      1
## 4     73    168 Directed      1
## 5     73    148 Directed      1
## 6     73    216 Directed      1

Or we can use dplyr:

library(dplyr)
left_join(df_Edges, df_Nodes, by = c(Target = "Label")) %>% 
          mutate(Target  = Id) %>%
          left_join(., df_Nodes, by = c(Source = "Label")) %>% 
          mutate(Source = Id.y) %>% 
          select(-matches("Id"))
#   Source Target     Type Weight
#1     73     NA Directed      1
#2     73    244 Directed      1
#3     73     73 Directed      1
#4     73    168 Directed      1
#5     73    148 Directed      1
#6     73    216 Directed      1
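
The match-based lookup used in both answers can also be wrapped in a small helper and applied to both columns with lapply. This is only a sketch: label_to_id and cols are hypothetical names introduced here, and the data is rebuilt by hand from the question with character columns assumed.

```r
# Sample data copied from the question; character columns are an assumption
df_Edges <- data.frame(
  Source = rep("@kuabt", 6),
  Target = c("@_chuad", "@arifsetia2013", "@kuabt",
             "@chongbeng", "@billtay25", "@gst183"),
  Type   = "Directed",
  Weight = 1,
  stringsAsFactors = FALSE
)
df_Nodes <- data.frame(
  Id    = c(73, 148, 168, 187, 216, 244),
  Label = c("@kuabt", "@billtay25", "@chongbeng",
            "@nonvitaltooth", "@gst183", "@arifsetia2013"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate a vector of labels into node Ids,
# returning NA for any label that is not listed in nodes$Label
label_to_id <- function(labels, nodes) {
  nodes$Id[match(labels, nodes$Label)]
}

# Apply the same lookup to every column that holds labels
cols <- c("Source", "Target")
df_Edges[cols] <- lapply(df_Edges[cols], label_to_id, nodes = df_Nodes)
df_Edges
```

Keeping the lookup in one function means a further label column only needs to be added to cols.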