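I am trying to replace multiple patterns in a character vector with their corresponding replacement strings. After some research I found the gsubfn package, which I believe can do what I want, but when I run the code below I do not get the output I expect (see the end of the question for the actual versus expected results).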
library(gsubfn)
# Our test data that we want to search through (while ignoring case)
test.data<- c("1700 Happy Pl","155 Sad BLVD","82 Lolly ln", "4132 Avent aVe")
# A data frame which contains the patterns we want to search for
# (again ignoring case) and the associated replacement strings we want to
# substitute for any matches we come across.
frame <- data.frame(pattern = c(" Pl", " blvd", " LN", " ave"),
                    replace = c(" Place", " Boulevard", " Lane", " Avenue"),
                    stringsAsFactors = FALSE)
# NOTE: I added a space in front of each pattern and replacement term to make
# sure we only grab matches that are their own word (for instance, if an
# address was 45 Splash Way we would not want to replace "pl" inside of
# "Splash" with "Place").
# The following paste call appends "($|[^a-zA-Z])" to every pattern so that a
# match must be followed by the end of the string or a non-letter. This is
# meant to stop the " Ave" found directly after "4132" inside "4132 Avent aVe"
# from being converted to " Avenue".
pat <- paste(paste(frame$pattern,collapse = "($|[^a-zA-Z])|"),"($|[^a-zA-Z])", sep = "")
# Here is the gsubfn call I am making
gsubfn(x = test.data, pattern = pat, replacement = setNames(as.list(frame$replace), frame$pattern), ignore.case = TRUE)
Output received:
[1] "1700 Happy" "155 Sad" "82 Lolly" "4132 Avent"
Expected output:
[1] "1700 Happy Place" "155 Sad Boulevard" "82 Lolly Lane" "4132 Avent Avenue"
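My working theory for why this does not work is that, because of case differences, the matches do not equal the names of the list I pass as gsubfn's replacement argument (for example, the match found in "155 Sad BLVD" is not == " blvd", even though the ignore.case argument lets it count as a match). Can someone confirm that this is the problem, or point out anything else that might be going wrong, and ideally suggest a fix that does not require expanding my pattern vector to include every case permutation?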
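A quick base R check of the premise behind this theory. It only shows how name lookup behaves in a plain R list; whether gsubfn performs its lookup in exactly this way is my assumption, and the `lookup` name is just for illustration:

```r
# Same pattern/replacement pairs as in the question
frame <- data.frame(pattern = c(" Pl", " blvd", " LN", " ave"),
                    replace = c(" Place", " Boulevard", " Lane", " Avenue"),
                    stringsAsFactors = FALSE)

# The list that is passed as the replacement argument
lookup <- setNames(as.list(frame$replace), frame$pattern)

# Looking up a list element by name is exact and case-sensitive
is.null(lookup[[" BLVD"]])   # TRUE: no element is named " BLVD"
lookup[[" blvd"]]            # " Boulevard"

# Lower-casing the matched text alone would not be enough either,
# because the names themselves are in mixed case (" Pl", " LN")
is.null(lookup[[tolower(" LN")]])   # TRUE: the name is " LN", not " ln"
```

So if the lookup is by exact name, both the matched text and the list names would need to be normalised to the same case before comparing.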
Answer 0 (score: 0)
There seems to be a simple solution with stringr:
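library(stringr)
# A named vector (regex = replacement) applies every rule to every string;
# passing two plain vectors would only pair pattern i with string i.
# (?i) makes each pattern case-insensitive.
rules <- setNames(frame$replace, paste0("(?i)\\b", frame$pattern, "$"))
str_replace_all(test.data, rules)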
#[1] "1700 Happy Place" "155 Sad Boulevard" "82 Lolly Lane" "4132 Avent Avenue"
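Note that, because of the tricky 'Avent aVe', I had to change the regex so that it only looks for the abbreviation at the end of the string. There are of course other ways to handle this.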
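One of those other ways, as a rough base R sketch rather than a definitive fix: loop over the rules with gsub() and put a word boundary on both sides of each abbreviation, so a match does not have to sit at the end of the string. This version assumes the abbreviations are stored without the leading space used in the question, and the `rules` and `result` names are only illustrative:

```r
test.data <- c("1700 Happy Pl", "155 Sad BLVD", "82 Lolly ln", "4132 Avent aVe")

# Assumption: abbreviations and replacements stored without a leading space
rules <- data.frame(pattern = c("Pl", "blvd", "LN", "ave"),
                    replace = c("Place", "Boulevard", "Lane", "Avenue"),
                    stringsAsFactors = FALSE)

result <- test.data
for (i in seq_len(nrow(rules))) {
  # \\b on both sides requires the abbreviation to be a whole word,
  # so the "Ave" inside "Avent" is left alone
  pat <- paste0("\\b", rules$pattern[i], "\\b")
  result <- gsub(pat, rules$replace[i], result, ignore.case = TRUE, perl = TRUE)
}
result
#[1] "1700 Happy Place" "155 Sad Boulevard" "82 Lolly Lane" "4132 Avent Avenue"
```

Because the rules are applied one after another, a replacement produced by an earlier rule could in principle be matched by a later one, so the order of the rows can matter with other data.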