Replace character values using another data frame with multiple matches

Time: 2021-03-01 16:04:41

Tags: r join replace stringr

test.vector <- c("jdoe","John Doe","jodoe","Sarah Scarlet","sscarlet","scarlet")
test.df <- data.frame("Full.Name" = c("John Doe","Sarah Scarlet"),
                      "alias1" = c("jdoe","sscarlet"),
                      "alias2" = c("jodoe","scarlet"))
want.vector <- c("John Doe","John Doe","John Doe","Sarah Scarlet","Sarah Scarlet","Sarah Scarlet")

> test.vector
[1] "jdoe"          "John Doe"      "jodoe"         "Sarah Scarlet" "sscarlet"      "scarlet" 

> test.df
      Full.Name   alias1  alias2
1      John Doe     jdoe   jodoe
2 Sarah Scarlet sscarlet scarlet     

> want.vector
[1] "John Doe"      "John Doe"      "John Doe"      "Sarah Scarlet" "Sarah Scarlet" "Sarah Scarlet"

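All the search results I found, like this one, involve only a single match and use merge() or join(). In this case, however, each name has several possible matches, and I'm not sure how to handle that. A few things I have tried (with butchered syntax):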

  1. str_replace(test.vector,test.df[,-1],test.df[,1])
  2. recode(test.vector,test.df)
  3. Converting test.vector to a data frame and then joining with by = c(test.df[,-1], test.vector)
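The "reshape, then look up" idea behind attempt 3 can be sketched by flattening the whole table into one key/value lookup. Every alias, and the full name itself, maps to its row's Full.Name, and base R's match() stands in for the join. This is not an approach from the thread. The sketch assumes stringsAsFactors = FALSE, which only matters before R 4.0. It also assumes NA or empty cells mean "no alias".

```r
test.vector <- c("jdoe", "John Doe", "jodoe", "Sarah Scarlet", "sscarlet", "scarlet")
test.df <- data.frame(Full.Name = c("John Doe", "Sarah Scarlet"),
                      alias1 = c("jdoe", "sscarlet"),
                      alias2 = c("jodoe", "scarlet"),
                      stringsAsFactors = FALSE)  # assumption: keep columns as character

# Flatten every column (Full.Name included) into one long key vector.
# unlist() walks column by column, so the matching values are simply
# Full.Name repeated once per column.
keys <- unlist(test.df, use.names = FALSE)
vals <- rep(test.df$Full.Name, times = ncol(test.df))

# Drop empty cells so sparse alias columns do not create bogus keys
keep <- !is.na(keys) & keys != ""
keys <- keys[keep]
vals <- vals[keep]

# match() gives the position of the first key equal to each value (NA if none)
result <- vals[match(test.vector, keys)]
result
```

Because match() returns the first hit, a value that appears in more than one row resolves to the earliest row. Values with no match come back as NA rather than being dropped.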

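One thing to note: the actual test.df for my project has many very sparse columns, because each alias is tied to a specific location/site. I'm not sure whether that makes a significant difference compared with the example above.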

1 answer:

Answer 0 (score: 1)

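You can create an array with the same dim as the data frame, filled by recycling the first column. Then loop over the test vector with sapply and subset the array by comparing each value against the data frame.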
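The recycling is easier to see in the intermediate objects. This small sketch prints them for a single lookup value, "sscarlet", which is chosen only for illustration. It uses the question's test.df with stringsAsFactors = FALSE added as an assumption.

```r
test.df <- data.frame(Full.Name = c("John Doe", "Sarah Scarlet"),
                      alias1 = c("jdoe", "sscarlet"),
                      alias2 = c("jodoe", "scarlet"),
                      stringsAsFactors = FALSE)

# A 2 x 3 character array: the 2-element Full.Name column is recycled
# column by column, so every cell in row i holds the full name of row i.
test.a <- array(test.df[, 1], dim = dim(test.df))
print(test.a)

# Comparing one string with a data frame gives a logical matrix of the
# same shape, TRUE where the cell equals the string.
hits <- "sscarlet" == test.df
print(hits)

# Used as an index, that matrix picks the full name from the matching row.
test.a[hits]
```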

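# Array shaped like test.df; each row repeats that row's Full.Name
test.a <- array(test.df[, 1], dim=dim(test.df))
# For each value, index the array with the logical matrix (x == test.df)
sapply(test.vector, function(x) test.a[x == test.df], USE.NAMES=FALSE)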
# [1] "John Doe"      "John Doe"      "John Doe"      "Sarah Scarlet" "Sarah Scarlet"
# [6] "Sarah Scarlet"
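There is a caveat, given the sparse columns mentioned in the question. If the empty cells are NA, then x == test.df is NA in those cells, and indexing with NA adds an NA to the result. The sapply call above can then return extra elements, or a list instead of a vector. A possible hardening wraps the same array trick in which() and vapply() so that each value yields exactly one string. The sketch below uses a hypothetical sparse column alias3 and a hypothetical unmatched value "unknown".

```r
test.vector <- c("jdoe", "John Doe", "jodoe", "Sarah Scarlet", "sscarlet", "scarlet", "unknown")
test.df <- data.frame(Full.Name = c("John Doe", "Sarah Scarlet"),
                      alias1 = c("jdoe", "sscarlet"),
                      alias2 = c("jodoe", "scarlet"),
                      alias3 = c(NA, "sarah.s"),  # hypothetical sparse column
                      stringsAsFactors = FALSE)

test.a <- array(test.df[, 1], dim = dim(test.df))

lookup_full_name <- function(x) {
  # which() keeps only TRUE positions, discarding the NAs from empty cells
  hit <- which(x == test.df)
  # NA for unmatched values; first hit if a value appears more than once
  if (length(hit) == 0) NA_character_ else test.a[hit[1]]
}

result <- vapply(test.vector, lookup_full_name, character(1), USE.NAMES = FALSE)
result
```

With vapply(), the return type is fixed to one string per input, so a bad match fails loudly instead of silently turning the result into a list.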