Partially matching two columns from different data tables by string

Time: 2019-03-18 08:32:43

Tags: r merge

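I have two large data tables. df1 has a single column (full.name):

full.name
brad pitt
shah rukh khan
salman khan
taylor swift
justin bieber
xyz abc

and df2 has two columns, name and age:

name         age
brad         10
shah         15
salman khan  20
taylor       30
justin       25

The output I want is:

full.name       name         age
brad pitt       brad         10
shah rukh khan  shah         15
salman khan     salman khan  20
taylor swift    taylor       30
justin bieber   justin       25

I only want to match the columns by partial string. So far I have been merging the two tables, but that only works for exactly matching values, so I want to match by string instead.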

2 answers:

Answer 0 (score: 3)

Sample data

library( data.table )

dt1 <- fread("full.name
brad pitt
             shah rukh khan       
             salman khan
             taylor swift
             justin bieber
             xyz abc", sep = "%")

dt2 <- fread('name,         age
brad,         10
shah,         15
salman khan,  20
taylor,       30
justin,       25')

Code

library( fuzzyjoin )
regex_left_join( dt1, dt2, by = c( full.name = "name" ) )

Output

#         full.name        name age
# 1:      brad pitt        brad  10
# 2: shah rukh khan        shah  15
# 3:    salman khan salman khan  20
# 4:   taylor swift      taylor  30
# 5:  justin bieber      justin  25
# 6:        xyz abc        <NA>  NA
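
For readers who want to see what a regex left join does without an extra package, here is a rough base R sketch. It is not fuzzyjoin's implementation, only an illustration under the assumption that each value of name is used as a regular expression and tested against every full.name. The variable names hits, idx, matched and unmatched are made up for this example.

```r
# Sample data from the question, as plain data frames
df1 <- data.frame(full.name = c("brad pitt", "shah rukh khan", "salman khan",
                                "taylor swift", "justin bieber", "xyz abc"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(name = c("brad", "shah", "salman khan", "taylor", "justin"),
                  age  = c(10, 15, 20, 30, 25),
                  stringsAsFactors = FALSE)

# hits[i, j] is TRUE when df2$name[j], used as a regex, matches df1$full.name[i]
hits <- sapply(df2$name, function(p) grepl(p, df1$full.name))

# Every (df1 row, df2 row) pair that matched
idx <- which(hits, arr.ind = TRUE)
matched <- cbind(df1[idx[, "row"], , drop = FALSE],
                 df2[idx[, "col"], , drop = FALSE])

# Left join: keep df1 rows with no match at all, padded with NA
unmatched <- df1[rowSums(hits) == 0, , drop = FALSE]
unmatched$name <- NA_character_
unmatched$age  <- NA_real_

result <- rbind(matched, unmatched)
rownames(result) <- NULL
result
```

Note that the matched rows come out ordered by df2 row rather than by df1 row; for this sample data the two orders happen to coincide.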

Answer 1 (score: 0)

For a solution that uses only data.table (df1 and df2 must be data.tables), you can try:

df2[, full := lapply(name, function(x) grep(x, df1[, full.name], value = TRUE) )]

To get the equivalent of an inner join, keeping only the rows of df2 that matched something, you can add:

df2[lengths(full) > 0]
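
One thing to watch with the grep call above: name is matched anywhere inside full.name, so a short name can also hit a longer word. The sketch below shows one possible way to restrict the match to whole words. The helper match_whole_word and the extra name "bradley cooper" are hypothetical and not part of the question, and the helper assumes that names contain no regex metacharacters.

```r
# Illustrative data, not from the question
full_names <- c("brad pitt", "bradley cooper", "shah rukh khan")

# Plain regex match: "brad" also hits "bradley cooper"
grep("brad", full_names, value = TRUE)

# Hypothetical helper: match `name` only as a whole word (or run of whole words)
# Assumes `name` has no regex metacharacters such as "." or "("
match_whole_word <- function(name, candidates) {
  pattern <- paste0("\\b", name, "\\b")
  grep(pattern, candidates, value = TRUE, perl = TRUE)
}

match_whole_word("brad", full_names)
```

In the data.table answer, the same idea would amount to swapping the bare grep(x, ...) for a word-bounded pattern inside the anonymous function.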