gsubfn does not give the desired output when ignore.case = TRUE

Asked: 2017-10-03 18:09:35

Tags: r replace gsub case-sensitive

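I am trying to replace multiple patterns in a character vector with their corresponding replacement strings. After some research I found the gsubfn package, which I thought could do what I want, but when I run the code below I don't get the output I expect (see the received versus expected output at the end of the question).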

library(gsubfn)

# Our test data that we want to search through (while ignoring case)

test.data <- c("1700 Happy Pl", "155 Sad BLVD", "82 Lolly ln", "4132 Avent aVe")

#     A data frame holding the patterns we want to search for (again
#     ignoring case) and the replacement string to substitute for each
#     match we come across.

frame <- data.frame(pattern = c(" Pl", " blvd", " LN", " ave"),
                    replace = c(" Place", " Boulevard", " Lane", " Avenue"),
                    stringsAsFactors = FALSE)

# NOTE: I added a space in front of each search pattern (and each replacement)
#       to make sure we only grab matches that are their own word (for
#       instance, if an address was "45 Splash Way" we would not want to
#       replace the "pl" inside "Splash" with "Place").

#     The following paste() calls are meant to stop the substitution from
#     grabbing cases like the " Ave" that directly follows "4132" in
#     "4132 Avent aVe", which we don't want converted to " Avenue".

pat <- paste(paste(frame$pattern,collapse = "($|[^a-zA-Z])|"),"($|[^a-zA-Z])", sep = "")

#     Here is the gsubfn call I am making
gsubfn(x = test.data, pattern = pat, replacement = setNames(as.list(frame$replace), frame$pattern), ignore.case = TRUE)

Output received:

[1] "1700 Happy" "155 Sad"    "82 Lolly"   "4132 Avent"

Expected output:

[1] "1700 Happy Place" "155 Sad Boulevard" "82 Lolly Lane" "4132 Avent Avenue"

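My working theory for why this isn't working is that, because of case differences, the matches don't line up with the names in the list I pass to gsubfn's replacement argument (e.g. the match found in "155 Sad BLVD" is not == " blvd", even though it counts as a match thanks to the ignore.case argument). Can someone confirm that this is the problem, or point out what else might be going wrong, and perhaps suggest a way to fix it that, if possible, doesn't require expanding my pattern vector to include every case permutation?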
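The theory can be checked outside gsubfn, since looking up an element by name in R is an exact, case-sensitive comparison. The following is only a minimal base-R sketch of that idea, not a statement about gsubfn's internals: it lower-cases each match before looking it up, so the table needs one key per abbreviation rather than every case permutation. The names `addresses`, `lookup`, `word_pat` and `expanded` are made up for illustration, and the leading spaces are replaced by `\\b` word boundaries.

```r
# Illustrative sketch; `addresses`, `lookup`, `word_pat` and `expanded`
# are hypothetical names, not part of the original code.
addresses <- c("1700 Happy Pl", "155 Sad BLVD", "82 Lolly ln", "4132 Avent aVe")

# Replacement table keyed by lower-cased abbreviations.
lookup <- c(pl = "Place", blvd = "Boulevard", ln = "Lane", ave = "Avenue")

# Name lookup is exact, so a differently cased key is not found ...
unname(lookup["BLVD"])            # NA
# ... unless the key is normalised first.
unname(lookup[tolower("BLVD")])   # "Boulevard"

# Match any abbreviation as a whole word, ignoring case.
word_pat <- paste0("\\b(", paste(names(lookup), collapse = "|"), ")\\b")
hits <- gregexpr(word_pat, addresses, ignore.case = TRUE, perl = TRUE)

# Replace each match with the table entry for its lower-cased form.
expanded <- addresses
regmatches(expanded, hits) <- lapply(
  regmatches(expanded, hits),
  function(m) unname(lookup[tolower(m)])
)
expanded
```

Because the word boundaries require a non-word character (or the string edge) on both sides, the "Ave" at the start of "Avent" is left alone while the trailing "aVe" is expanded.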

1 Answer:

Answer 0: (score: 0)

It seems stringr has a simple solution:

library(stringr)

str_replace_all(test.data, 
                regex(paste0('\\b',frame$pattern,'$'),ignore_case = T),
                frame$replace)
#[1] "1700 Happy Place"  "155 Sad Boulevard" "82 Lolly Lane"     "4132 Avent Avenue"

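Note that, because of the tricky 'Avent aVe', I had to change the regex to look for the words only at the end of the string. But of course there are other ways to solve this.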
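One caveat worth keeping in mind: str_replace_all is vectorised over string, pattern and replacement, so a call with an unnamed pattern vector pairs the i-th pattern with the i-th string. It gives the expected result here because test.data happens to be in the same order as frame. As one of the "other ways", here is a hedged base-R sketch that applies every pattern to every address and uses word boundaries on both sides instead of the end-of-string anchor. The names `addresses`, `abbrev`, `whole_word` and `cleaned` are invented for this example, and the input order is shuffled on purpose.

```r
# Illustrative sketch; `addresses`, `abbrev`, `whole_word` and `cleaned`
# are hypothetical names. The addresses are deliberately not in the same
# order as the rows of `abbrev`, and one address needs no replacement.
addresses <- c("82 Lolly ln", "4132 Avent aVe", "1700 Happy Pl",
               "155 Sad BLVD", "45 Splash Way")

abbrev <- data.frame(
  pattern = c("Pl", "blvd", "LN", "ave"),
  replace = c("Place", "Boulevard", "Lane", "Avenue"),
  stringsAsFactors = FALSE
)

cleaned <- addresses
for (i in seq_len(nrow(abbrev))) {
  # \\b on both sides keeps "pl" in "Splash" and "Ave" in "Avent" untouched.
  whole_word <- paste0("\\b", abbrev$pattern[i], "\\b")
  cleaned <- gsub(whole_word, abbrev$replace[i], cleaned, ignore.case = TRUE)
}
cleaned
```

The loop makes one pass over the data per pattern, which is fine for a short abbreviation table; with stringr, passing a named vector of the form c(pattern = replacement) to str_replace_all is the documented way to apply several replacements to every string.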