I have the following question about matching rows across different data frames.
First, I have this table:
table <- data.frame(
  brand = c('duna', 'cars', 'cars', 'sea', 'sea', 'sea', 'mega', 'moon', 'moon'),
  model = c('mm', 'mm', 'mm', 'll', 'll', 'll', 'tr', 'tr', 'tr'),
  version = c("2.8 sr cab. dupla 4x4 tdi", "2.0 lsdakar 16v 4x4 hi-flex 5-p",
              "2.4 ls cab. simples 4x2 flex 2-p", "2.3 xl cab. simples 4x2 2-p",
              "1.8 sx 5-p", "1.0 mpfi joy 8v", "hatch ls 1.0 8v",
              "2.3 xlt cab. dupla 4x2 limited 4-p", "1.4 fire ce xlt flex 2-p"),
  year = c(2014, 2015, 2016, 2014, 2012, 2011, 2013, 2013, 2012)
)
table$id <- paste0(table$brand, table$model)
table$id2 <- paste0(table$brand, table$model, table$year)
  brand model                            version year     id        id2
1  duna    mm          2.8 sr cab. dupla 4x4 tdi 2014 dunamm dunamm2014
2  cars    mm    2.0 lsdakar 16v 4x4 hi-flex 5-p 2015 carsmm carsmm2015
3  cars    mm   2.4 ls cab. simples 4x2 flex 2-p 2016 carsmm carsmm2016
4   sea    ll        2.3 xl cab. simples 4x2 2-p 2014  seall  seall2014
5   sea    ll                         1.8 sx 5-p 2012  seall  seall2012
6   sea    ll                    1.0 mpfi joy 8v 2011  seall  seall2011
7  mega    tr                    hatch ls 1.0 8v 2013 megatr megatr2013
8  moon    tr 2.3 xlt cab. dupla 4x2 limited 4-p 2013 moontr moontr2013
9  moon    tr           1.4 fire ce xlt flex 2-p 2012 moontr moontr2012
table_match <- data.frame(
  brand = c('duna', 'cars', 'sea', 'mega', 'moon'),
  model = c('mm', 'mm', 'll', 'tr', 'tr'),
  version = c('tdi', 'ls', 'xl', 'ls', 'xlt'),
  year = c(2014, 2015, 2014, 2014, 2015)
)
table_match$id <- paste0(table_match$brand, table_match$model)
table_match$id2 <- paste0(table_match$brand, table_match$model, table_match$year)
  brand model version year     id        id2
1  duna    mm     tdi 2014 dunamm dunamm2014
2  cars    mm      ls 2015 carsmm carsmm2015
3   sea    ll      xl 2014  seall  seall2014
4  mega    tr      ls 2014 megatr megatr2014
5  moon    tr     xlt 2015 moontr moontr2015
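How can I apply a single fuzzy join to the same data (table and table_match) instead of calling the function twice?
The idea is to end up with one data frame that combines the results of each individual fuzzy join.
For example: unique(rbind(match, match_1))
Keep in mind that in my real case I would have to repeat the fuzzy join at least five times (I have several levels of ID).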
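library(dplyr)
library(stringr)
library(fuzzyjoin)
# First level: exact match on id, whole-word match of the short version label
match <- fuzzy_join(table, table_match,
                    by = c("id", "version"),
                    match_fun = list(`==`, function(x, y) { str_detect(x, paste0("\\b", y, "\\b")) })) %>%
  unique()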
  brand.x model.x                          version.x year.x   id.x      id2.x brand.y model.y version.y year.y   id.y      id2.y
1    duna      mm          2.8 sr cab. dupla 4x4 tdi   2014 dunamm dunamm2014    duna      mm       tdi   2014 dunamm dunamm2014
2    cars      mm   2.4 ls cab. simples 4x2 flex 2-p   2016 carsmm carsmm2016    cars      mm        ls   2015 carsmm carsmm2015
3     sea      ll        2.3 xl cab. simples 4x2 2-p   2014  seall  seall2014     sea      ll        xl   2014  seall  seall2014
4    mega      tr                    hatch ls 1.0 8v   2013 megatr megatr2013    mega      tr        ls   2014 megatr megatr2014
5    moon      tr 2.3 xlt cab. dupla 4x2 limited 4-p   2013 moontr moontr2013    moon      tr       xlt   2015 moontr moontr2015
6    moon      tr           1.4 fire ce xlt flex 2-p   2012 moontr moontr2012    moon      tr       xlt   2015 moontr moontr2015
# Second level: same join, but with the stricter id2 key (brand + model + year)
match_1 <- fuzzy_join(table, table_match,
                      by = c("id2", "version"),
                      match_fun = list(`==`, function(x, y) { str_detect(x, paste0("\\b", y, "\\b")) })) %>%
  unique()
  brand.x model.x                   version.x year.x   id.x      id2.x brand.y model.y version.y year.y   id.y      id2.y
1    duna      mm   2.8 sr cab. dupla 4x4 tdi   2014 dunamm dunamm2014    duna      mm       tdi   2014 dunamm dunamm2014
2     sea      ll 2.3 xl cab. simples 4x2 2-p   2014  seall  seall2014     sea      ll        xl   2014  seall  seall2014
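
To make the goal concrete, here is a rough sketch of the kind of thing I am after: run the same fuzzy join once per ID column and stack the de-duplicated results. The names `word_match` and `combine_fuzzy_joins` are hypothetical helpers I made up for illustration, and the sketch assumes every ID level is joined together with the same "version" column. I am not sure this is the best approach.

```r
library(dplyr)
library(stringr)
library(fuzzyjoin)

# Whole-word match of the short label y inside the longer string x
word_match <- function(x, y) str_detect(x, paste0("\\b", y, "\\b"))

# Hypothetical helper: one fuzzy join per ID column,
# with the results stacked and duplicate rows dropped
combine_fuzzy_joins <- function(x, y, id_cols) {
  joins <- lapply(id_cols, function(id_col) {
    fuzzy_join(x, y,
               by = c(id_col, "version"),
               match_fun = list(`==`, word_match))
  })
  distinct(bind_rows(joins))
}
```

With the data above, combine_fuzzy_joins(table, table_match, c("id", "id2")) should return the same rows as unique(rbind(match, match_1)), and further ID levels would only need to be added to the vector.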