test.vector <- c("jdoe","John Doe","jodoe","Sarah Scarlet","sscarlet","scarlet")
test.df <- data.frame("Full.Name" = c("John Doe","Sarah Scarlet"),
                      "alias1" = c("jdoe","sscarlet"),
                      "alias2" = c("jodoe","scarlet"))
want.vector <- c("John Doe","John Doe","John Doe","Sarah Scarlet","Sarah Scarlet","Sarah Scarlet")
> test.vector
[1] "jdoe" "John Doe" "jodoe" "Sarah Scarlet" "sscarlet" "scarlet"
> test.df
      Full.Name   alias1  alias2
1      John Doe     jdoe   jodoe
2 Sarah Scarlet sscarlet scarlet
> want.vector
[1] "John Doe" "John Doe" "John Doe" "Sarah Scarlet" "Sarah Scarlet" "Sarah Scarlet"
Every search result I've found, like this one, involves only a single column to match against and is solved with merge() or join().
In this case, though, each name can be matched in several columns, and I'm not sure how to handle that.
A few things I've tried (in butchered syntax):
str_replace(test.vector,test.df[,-1],test.df[,1])
recode(test.vector,test.df)
by = c(test.df[,-1], test.vector)
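One thing to note: the actual test.df I have for my project has many very sparse columns (because each alias is tied to a specific location/position). I'm not sure whether that makes a significant difference compared with the example above.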
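A possible alternative, sketched here under the assumption that each alias belongs to only one person: stack every column of test.df (including Full.Name, so that full names map to themselves) into a long two-column lookup table, drop the empty cells that sparse columns produce, and then do an ordinary single-key lookup with match(). The names lookup and result are illustrative, and stringsAsFactors = FALSE is added so the code behaves the same on R versions before 4.0.

```r
test.vector <- c("jdoe","John Doe","jodoe","Sarah Scarlet","sscarlet","scarlet")
test.df <- data.frame("Full.Name" = c("John Doe","Sarah Scarlet"),
                      "alias1" = c("jdoe","sscarlet"),
                      "alias2" = c("jodoe","scarlet"),
                      stringsAsFactors = FALSE)

# Long lookup table: one row per (key, Full.Name) pair.
# unlist() concatenates the columns in order, so repeating Full.Name
# once per column keeps every key aligned with its owner.
lookup <- data.frame(
  key       = unlist(test.df, use.names = FALSE),
  Full.Name = rep(test.df$Full.Name, times = ncol(test.df)),
  stringsAsFactors = FALSE
)

# Sparse columns: drop cells that are NA or empty strings
lookup <- lookup[!is.na(lookup$key) & lookup$key != "", ]

# match() gives the first matching row for each element (NA if there is none)
result <- lookup$Full.Name[match(test.vector, lookup$key)]
result
```

If the same alias appears for two different people, match() silently returns the first one, so it may be worth checking anyDuplicated(lookup$key) on the real data.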
Answer:
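You can create an array with the same dim as the data frame, filled by recycling its first column (Full.Name), and then loop over the test vector with sapply, subsetting that array with the logical matrix you get from comparing each element to the data frame.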
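# Same dimensions as test.df; every column is a copy of Full.Name (recycled)
test.a <- array(test.df[, 1], dim=dim(test.df))
# For each element, keep the Full.Name entries at the cells where it matches test.df
sapply(test.vector, function(x) test.a[x == test.df], USE.NAMES=FALSE)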
# [1] "John Doe" "John Doe" "John Doe" "Sarah Scarlet" "Sarah Scarlet"
# [6] "Sarah Scarlet"
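A caveat on the sapply approach, offered as a sketch rather than a tested fix: if an element matches no cell, or if sparse columns contain NA (comparing with NA gives NA, and indexing with NA returns an extra NA), sapply can return a list or matrix instead of a plain character vector. The variant below uses a hypothetical sparse test.df and an extra unmatched name "nobody". It calls which() to drop the NA comparisons, keeps only the first match, returns NA_character_ when nothing matches, and uses vapply so the result is always a character vector.

```r
test.vector <- c("jdoe","John Doe","jodoe","Sarah Scarlet","sscarlet","scarlet","nobody")
# Hypothetical sparse version of test.df: NA marks an empty alias cell
test.df <- data.frame("Full.Name" = c("John Doe","Sarah Scarlet"),
                      "alias1" = c("jdoe", NA),
                      "alias2" = c("jodoe","scarlet"),
                      "alias3" = c(NA, "sscarlet"),
                      stringsAsFactors = FALSE)

# Same trick as above: every column of test.a is a copy of Full.Name
test.a <- array(test.df[, 1], dim = dim(test.df))

res <- vapply(test.vector, function(x) {
  # which() returns linear indices of TRUE cells and skips NA comparisons
  hits <- which(x == test.df)
  # first match wins; unmatched names become NA
  if (length(hits) == 0) NA_character_ else test.a[hits[1]]
}, character(1), USE.NAMES = FALSE)
res
```

Because test.a and the comparison matrix share the same dimensions, a linear index from which() points at the same row in both, which is what makes the first-match lookup valid.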