Best way to merge three data frames

Date: 2019-07-17 12:15:39

Tags: r dataframe join

I'm having trouble joining three data frames. My first data frame looks like this:

id <- c('123','456','789','433','234')
article1 <- c('111', '222', '333','345','443')
article2 <- c('111', '333', '223','987','230')
article3 <- c('234', '552', '897','543','098')
article4 <- c('231', '322', '341','313','099')
article5 <- c('242', '222', '222','987','443')

df1 <- data.frame(id, article1,article2,article3,article4,article5)

df1

   id article1 article2 article3 article4 article5
1 123      111      111      234      231      242
2 456      222      333      552      322      222
3 789      333      223      897      341      222
4 433      345      987      543      313      987
5 234      443      230      098      099      443

Now I have a second df that holds additional information for each id. This df can have several rows for the same id. For example:

id <- c('123','123','789','433','789')
firstname <-c('Paul','Peter', 'Andi', 'Tim', 'Claire')
lastname <-c('P','D', 'A', 'T', 'C')
features <-c('AAB', 'AAC','BBD', 'CCD', 'CDC')

df2 <- data.frame(id, firstname, lastname, features)

df2

   id firstname lastname features
1 123      Paul        P      AAB
2 123     Peter        D      AAC
3 789      Andi        A      BBD
4 433       Tim        T      CCD
5 789    Claire        C      CDC

The third data frame looks like this and provides information about the articles:

articlenumber <- c('111', '222', '333','443','345','223','234','552')
info <- c('ABC', 'CEF', 'DEF', 'FFF', 'FFD','CCF','LLK','LKO')

df3 <- data.frame(articlenumber, info)

df3

  articlenumber info
1           111  ABC
2           222  CEF
3           333  DEF
4           443  FFF
5           345  FFD
6           223  CCF
7           234  LLK
8           552  LKO

The final result should look like this:

   id article1 info article2 info article3 info article4 info article5 info firstname lastname features
1 123      111  ABC      111  ABC      234  LLK      333  DEF      222  CEF      Paul        P      AAB
2 123      111  ABC      111  ABC      234  LLK      333  DEF      222  CEF     Peter        D      AAC
3 456      222  CEF      333  DEF      552  LKO      111  ABC      222  CEF      Andi        A      BBD
4 789      333  DEF      223  CCF      552  LKO      333  DEF      222  CEF    Claire        C      CDK

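Sorry if the table formatting is off; I hope it's clear what I want. If an id has more than one person in df2, its row should appear once per person. I have already tried merge and the join functions, but didn't get the result I want.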
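In join terms, the requirement above is a left join of df1 on id against df2, where one id can match several people. A minimal base R sketch with hypothetical toy data (articles_toy and people_toy are illustrative names, not the full data above) shows that merge() with all.x = TRUE repeats a df1 row once per matching person and keeps ids that have no match:

```r
# Hypothetical toy data: id 123 has two people, id 456 has none
articles_toy <- data.frame(id = c("123", "456"), article1 = c("111", "222"),
                           stringsAsFactors = FALSE)
people_toy <- data.frame(id = c("123", "123"), firstname = c("Paul", "Peter"),
                         stringsAsFactors = FALSE)

# all.x = TRUE makes merge() a left join: every row of articles_toy is kept,
# the row for id 123 appears once per matching row of people_toy,
# and id 456 gets NA in the columns that come from people_toy
joined <- merge(articles_toy, people_toy, by = "id", all.x = TRUE)
print(joined)
```

With all = TRUE, as in the edit below, ids that appear only in df2 would be kept as well; with this data that makes no difference.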

Edit:

Using Reduce, I can merge df1 and df2:

Reduce(function(x,y) merge(x,y,by="id",all=TRUE) ,list(df1,df2))
   id article1 article2 article3 article4 article5 firstname lastname features
1 123      111      111      234      231      242      Paul        P      AAB
2 123      111      111      234      231      242     Peter        D      AAC
3 234      443      230      098      099      443      <NA>     <NA>     <NA>
4 433      345      987      543      313      987       Tim        T      CCD
5 456      222      333      552      322      222      <NA>     <NA>     <NA>
6 789      333      223      897      341      222      Andi        A      BBD
7 789      333      223      897      341      222    Claire        C      CDC

So how do I get the article info from df3 into this data frame?

1 answer:

Answer 0 (score: 1)

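You can use left_join from the dplyr package like this. Note that I first defined the data.frames with stringsAsFactors = FALSE; otherwise the id and article columns would be factors (the default before R 4.0.0) and could not be joined cleanly like this.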

library(dplyr)

# article1..article5, firstname, lastname, features, articlenumber and info
# are the vectors defined in the question
df1 <- data.frame(id = c('123','456','789','433','234'), article1, article2, article3, article4, article5, stringsAsFactors = FALSE)
df2 <- data.frame(id = c('123','123','789','433','789'), firstname, lastname, features, stringsAsFactors = FALSE)
df3 <- data.frame(articlenumber, info, stringsAsFactors = FALSE)

# one left_join per article column; renaming info keeps the five info columns apart
df1 %>% left_join(df2, by = "id") %>%
  left_join(df3 %>% rename(info1 = info), by = c("article1" = "articlenumber")) %>% 
  left_join(df3 %>% rename(info2 = info), by = c("article2" = "articlenumber")) %>% 
  left_join(df3 %>% rename(info3 = info), by = c("article3" = "articlenumber")) %>% 
  left_join(df3 %>% rename(info4 = info), by = c("article4" = "articlenumber")) %>% 
  left_join(df3 %>% rename(info5 = info), by = c("article5" = "articlenumber")) %>%
  select(id, article1, info1, article2, info2, article3, info3, article4, info4, 
         article5, info5, everything())

   id article1 info1 article2 info2 article3 info3 article4 info4 article5 info5 firstname lastname features
1 123      111   ABC      111   ABC      234   LLK      231  <NA>      242  <NA>      Paul        P      AAB
2 123      111   ABC      111   ABC      234   LLK      231  <NA>      242  <NA>     Peter        D      AAC
3 456      222   CEF      333   DEF      552   LKO      322  <NA>      222   CEF      <NA>     <NA>     <NA>
4 789      333   DEF      223   CCF      897  <NA>      341  <NA>      222   CEF      Andi        A      BBD
5 789      333   DEF      223   CCF      897  <NA>      341  <NA>      222   CEF    Claire        C      CDC
6 433      345   FFD      987  <NA>      543  <NA>      313  <NA>      987  <NA>       Tim        T      CCD
7 234      443   FFF      230  <NA>      098  <NA>      099  <NA>      443   FFF      <NA>     <NA>     <NA>
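The five left_join() calls differ only in the column names, so the lookup can also be written as a loop. Below is a base R sketch (an alternative to the answer above, not part of it; it needs no packages) that left-joins the person details with merge() and then looks up each article column in df3 with match(). The names article_cols and info_cols are illustrative:

```r
# Data from the question, built with character columns
df1 <- data.frame(
  id       = c("123", "456", "789", "433", "234"),
  article1 = c("111", "222", "333", "345", "443"),
  article2 = c("111", "333", "223", "987", "230"),
  article3 = c("234", "552", "897", "543", "098"),
  article4 = c("231", "322", "341", "313", "099"),
  article5 = c("242", "222", "222", "987", "443"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id        = c("123", "123", "789", "433", "789"),
  firstname = c("Paul", "Peter", "Andi", "Tim", "Claire"),
  lastname  = c("P", "D", "A", "T", "C"),
  features  = c("AAB", "AAC", "BBD", "CCD", "CDC"),
  stringsAsFactors = FALSE
)
df3 <- data.frame(
  articlenumber = c("111", "222", "333", "443", "345", "223", "234", "552"),
  info          = c("ABC", "CEF", "DEF", "FFF", "FFD", "CCF", "LLK", "LKO"),
  stringsAsFactors = FALSE
)

# Step 1: left join the person details; ids with several people are repeated
res <- merge(df1, df2, by = "id", all.x = TRUE)

# Step 2: for each article column, look up its info in df3;
# article numbers that are missing from df3 give NA
article_cols <- grep("^article", names(df1), value = TRUE)
info_cols <- sub("^article", "info", article_cols)
for (i in seq_along(article_cols)) {
  res[[info_cols[i]]] <- df3$info[match(res[[article_cols[i]]], df3$articlenumber)]
}

# Step 3: place each info column right after its article column
res <- res[, c("id", as.vector(rbind(article_cols, info_cols)),
               "firstname", "lastname", "features")]
res
```

Because merge() sorts on the by column by default, the rows come out ordered by id, as in the Reduce/merge output in the question, rather than in df1's original order.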