How to apply multiple fuzzy joins to the same data frame

Time: 2017-08-17 13:16:14

Tags: r join dataframe fuzzyjoin

I have the following problem with matching two data frames.

First, I have this table:

table<-data.frame(brand=c('duna','cars','cars','sea','sea','sea','mega','moon','moon'),model=c('mm','mm','mm','ll','ll','ll','tr','tr','tr'),version=c("2.8 sr cab. dupla 4x4 tdi","2.0 lsdakar 16v 4x4 hi-flex 5-p","2.4 ls cab. simples 4x2 flex 2-p","2.3 xl cab. simples 4x2  2-p","1.8 sx  5-p","1.0 mpfi joy 8v","hatch ls 1.0 8v","2.3 xlt cab. dupla 4x2 limited 4-p","1.4 fire ce xlt flex 2-p"),year=c(2014,2015,2016,2014,2012,2011,2013,2013,2012))

table$id<-paste0(table$brand,table$model)

table$id2<-paste0(table$brand,table$model,table$year)

  brand model                            version year     id        id2
1  duna    mm          2.8 sr cab. dupla 4x4 tdi 2014 dunamm dunamm2014
2  cars    mm    2.0 lsdakar 16v 4x4 hi-flex 5-p 2015 carsmm carsmm2015
3  cars    mm   2.4 ls cab. simples 4x2 flex 2-p 2016 carsmm carsmm2016
4   sea    ll       2.3 xl cab. simples 4x2  2-p 2014  seall  seall2014
5   sea    ll                        1.8 sx  5-p 2012  seall  seall2012
6   sea    ll                    1.0 mpfi joy 8v 2011  seall  seall2011
7  mega    tr                    hatch ls 1.0 8v 2013 megatr megatr2013
8  moon    tr 2.3 xlt cab. dupla 4x2 limited 4-p 2013 moontr moontr2013
9  moon    tr           1.4 fire ce xlt flex 2-p 2012 moontr moontr2012

table_match<-data.frame(brand=c('duna','cars','sea','mega','moon'),model=c('mm','mm','ll','tr','tr'),version=c('tdi','ls','xl','ls','xlt'),year=c(2014,2015,2014,2014,2015))

table_match$id<-paste0(table_match$brand,table_match$model)
table_match$id2<-paste0(table_match$brand,table_match$model,table_match$year)

  brand model version year     id        id2
1  duna    mm     tdi 2014 dunamm dunamm2014
2  cars    mm      ls 2015 carsmm carsmm2015
3   sea    ll      xl 2014  seall  seall2014
4  mega    tr      ls 2014 megatr megatr2014
5  moon    tr     xlt 2015 moontr moontr2015

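How can I apply a single fuzzy join to the same data (table and table_match) instead of calling the function twice? The idea is to end up with one data frame that combines the results of each individual fuzzy join, for example unique(rbind(match, match_1)).

Keep in mind that in my real case I would have to repeat the fuzzy join at least five times (I have different levels of ID).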

library(fuzzyjoin)
library(stringr)
library(dplyr)

match<-fuzzy_join(table, table_match, 
                            by = c("id", "version"), 
                            match_fun = c(`==`, function(x,y) { str_detect(x, paste0("\\b", y, "\\b" ))})) %>%
 unique()


  brand.x model.x                          version.x year.x   id.x      id2.x brand.y model.y version.y year.y   id.y      id2.y
1    duna      mm          2.8 sr cab. dupla 4x4 tdi   2014 dunamm dunamm2014    duna      mm       tdi   2014 dunamm dunamm2014
2    cars      mm   2.4 ls cab. simples 4x2 flex 2-p   2016 carsmm carsmm2016    cars      mm        ls   2015 carsmm carsmm2015
3     sea      ll       2.3 xl cab. simples 4x2  2-p   2014  seall  seall2014     sea      ll        xl   2014  seall  seall2014
4    mega      tr                    hatch ls 1.0 8v   2013 megatr megatr2013    mega      tr        ls   2014 megatr megatr2014
5    moon      tr 2.3 xlt cab. dupla 4x2 limited 4-p   2013 moontr moontr2013    moon      tr       xlt   2015 moontr moontr2015
6    moon      tr           1.4 fire ce xlt flex 2-p   2012 moontr moontr2012    moon      tr       xlt   2015 moontr moontr2015

match_1<-fuzzy_join(table, table_match, 
                                by = c("id2", "version"), 
                                match_fun = c(`==`, function(x,y) { str_detect(x, paste0("\\b", y, "\\b" ))})) %>%
     unique()

  brand.x model.x                    version.x year.x   id.x      id2.x brand.y model.y version.y year.y   id.y      id2.y
1    duna      mm    2.8 sr cab. dupla 4x4 tdi   2014 dunamm dunamm2014    duna      mm       tdi   2014 dunamm dunamm2014
2     sea      ll 2.3 xl cab. simples 4x2  2-p   2014  seall  seall2014     sea      ll        xl   2014  seall  seall2014
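
One possible way to avoid writing each join by hand is to loop over the ID columns and stack the results. The following is only a sketch under some assumptions: it uses a reduced copy of the data above so that it runs on its own, and `version_match`, `join_on` and `id_levels` are hypothetical names introduced for illustration, not part of fuzzyjoin.

```r
library(fuzzyjoin)
library(stringr)
library(dplyr)

# Reduced copy of the question's data, so the sketch is self-contained.
table <- data.frame(
  brand   = c("duna", "cars", "sea", "moon"),
  model   = c("mm", "mm", "ll", "tr"),
  version = c("2.8 sr cab. dupla 4x4 tdi", "2.4 ls cab. simples 4x2 flex 2-p",
              "2.3 xl cab. simples 4x2  2-p", "1.4 fire ce xlt flex 2-p"),
  year    = c(2014, 2016, 2014, 2012),
  stringsAsFactors = FALSE
)
table_match <- data.frame(
  brand   = c("duna", "cars", "sea", "moon"),
  model   = c("mm", "mm", "ll", "tr"),
  version = c("tdi", "ls", "xl", "xlt"),
  year    = c(2014, 2015, 2014, 2015),
  stringsAsFactors = FALSE
)
table$id        <- paste0(table$brand, table$model)
table$id2       <- paste0(table$brand, table$model, table$year)
table_match$id  <- paste0(table_match$brand, table_match$model)
table_match$id2 <- paste0(table_match$brand, table_match$model, table_match$year)

# Whole-word match of the short version label inside the long version string.
version_match <- function(x, y) str_detect(x, paste0("\\b", y, "\\b"))

# Hypothetical helper: one fuzzy join keyed on the given ID column plus version.
join_on <- function(id_col) {
  fuzzy_join(table, table_match,
             by = c(id_col, "version"),
             match_fun = list(`==`, version_match))
}

# Every ID level to try; extend this vector with further ID columns as needed.
id_levels <- c("id", "id2")

# Run one join per ID level, stack the results, and drop duplicate rows.
combined <- id_levels %>%
  lapply(join_on) %>%
  bind_rows() %>%
  distinct()
```

With this layout, adding another ID level should only mean adding its column name to `id_levels`, as long as that column exists in both data frames. `bind_rows()` followed by `distinct()` plays the same role here as `unique(rbind(match, match_1))`.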

0 answers:

No answers