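I have two large data tables. df1 has a single column (full.name):
full.name
brad pitt
shah rukh khan
salman khan
taylor swift
justin bieber
xyz abc
and df2 has two columns, name and age:
name age
brad 10
shah 15
salman khan 20
taylor 30
justin 25
The output I want is:
full.name name age
brad pitt brad 10
shah rukh khan shah 15
salman khan salman khan 20
taylor swift taylor 30
justin bieber justin 25
I only want to match the columns by string, though. What I have been using so far only works for values that match exactly, so I would like to match on a partial string instead.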
Answer 0 (score: 3)
Sample data
library( data.table )
dt1 <- fread("full.name
brad pitt
shah rukh khan
salman khan
taylor swift
justin bieber
xyz abc", sep = "%")
dt2 <- fread('name, age
brad, 10
shah, 15
salman khan, 20
taylor, 30
justin, 25')
Code
library( fuzzyjoin )
regex_left_join( dt1, dt2, by = c( full.name = "name" ) )
Output
#         full.name        name age
# 1:      brad pitt        brad  10
# 2: shah rukh khan        shah  15
# 3:    salman khan salman khan  20
# 4:   taylor swift      taylor  30
# 5:  justin bieber      justin  25
# 6:        xyz abc        <NA>  NA
Answer 1 (score: 0)
For a solution that uses only data.table, you can try:
df2[, full := lapply(name, function(x) grep(x, df1[, full.name], value = TRUE) )]
To get an inner join, you can add:
df2[lapply(full, length)>0, ]
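The list column above keeps the matches nested inside df2. If a flat table in the shape the question asks for is preferred, one possible data.table-only sketch is shown below. It rebuilds the sample data under the names df1 and df2; the names matches and result are hypothetical, and it assumes each name should be treated as a literal substring (fixed = TRUE) rather than as a regular expression, which differs from the regex-based matching in the other answer.

```r
library(data.table)

# Sample data from the question, rebuilt under the names df1 and df2.
df1 <- data.table(full.name = c("brad pitt", "shah rukh khan", "salman khan",
                                "taylor swift", "justin bieber", "xyz abc"))
df2 <- data.table(name = c("brad", "shah", "salman khan", "taylor", "justin"),
                  age  = c(10, 15, 20, 30, 25))

# One row per (name, full.name) pair where the name occurs in the full name.
# Assumption: fixed = TRUE treats each name as a literal substring, not a regex.
matches <- df2[, .(full.name = grep(name, df1$full.name,
                                    value = TRUE, fixed = TRUE)),
               by = .(name, age)]

# Left join back onto df1 so that full names without a match are kept with NA.
result <- merge(df1, matches, by = "full.name", all.x = TRUE)
print(result)
```

Here result has one row for each full name in df1, with NA in name and age for xyz abc; note that merge() returns the rows sorted by full.name. Dropping all.x = TRUE would give the inner join instead.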